reimand0 closed this 8 years ago
Because the server had trouble starting up and working normally when the count of mappings (the whole DNA>protein table) was calculated at runtime, I hardcoded it so it won't be an issue. All other statistics are also implemented and shown on the front page, so I am closing this for now.
Looks like the server is much faster today. I noticed this while changing mutation datasets within a given protein (TCGA > ClinVar > ESP).
Some tiny changes for the stats view:
Proteins: 39159
Mutations:
  All: 2559719
  ClinVar: 181457
  TCGA: 468115
  ESP 6500: 1319413
  1K Genomes: 1071901
MIMP annotated potential mutations: 24141717
Sites: 385185
Kinases: 607
Kinase groups: 148
Confirmed mutations in PTM sites: 365097
Confirmed mutations with MIMP annotations: 50678
Mutation annotations (all DNA>protein table + MIMP annotations): 97235488
Maybe the last one should be "Total number of annotations of nucleotides". What do you think?
Also, is there a script that writes these statistics to a file, which could then be included in the view as hardcoded values?
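One way such a script could work, as a minimal sketch: compute the counts once offline, dump them to a JSON file, and have the view read that file instead of querying the database on every request. The file name `stats.json` and the helpers `save_stats`/`load_stats` are hypothetical and not taken from the project's code; in practice the dictionary would be filled by real database queries.

```python
import json
from pathlib import Path

# Hypothetical cache location; the project's real file may differ.
STATS_FILE = Path("stats.json")


def save_stats(stats: dict, path: Path = STATS_FILE) -> None:
    """Write precomputed counts to disk (run offline, e.g. after a data import)."""
    path.write_text(json.dumps(stats, indent=2, sort_keys=True))


def load_stats(path: Path = STATS_FILE) -> dict:
    """Read the cached counts back; cheap enough to call from a view."""
    return json.loads(path.read_text())
```

For example, the offline script could call `save_stats({"proteins": 39159, "kinases": 607})`, and the front-page view would then use `load_stats()["proteins"]`. The expensive queries run once per data update rather than once per page load.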
The script is in the statistics.py file, and it is explained there in detail.

When it comes to proteins associated with acetylation sites:
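from app import app  # the app is imported before querying the models
from models import Site
from sqlalchemy import and_

# acetylation sites with at least one associated kinase
sites = Site.query.filter(
    and_(Site.kinases.any(), Site.type.like('%acetylation%'))
).all()
print(len(sites))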
This prints 324, so there are 324 acetylation sites with at least one associated kinase.
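The same filter can be illustrated with Python's standard-library sqlite3 module. This is a minimal sketch on a hypothetical three-table schema (`site`, `kinase`, `site_kinase`), not the project's real models: SQLAlchemy renders a relationship's `.any()` as an `EXISTS` subquery, and doing `COUNT(*)` in SQL avoids loading every matching row just to call `len()` on the list. Whether `LIKE` is case-sensitive depends on the database; SQLite's `LIKE` is case-insensitive for ASCII by default.

```python
import sqlite3

# Hypothetical minimal schema: sites, kinases and a many-to-many
# association table, roughly what a Site.kinases relationship maps to.
SCHEMA = """
CREATE TABLE site (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE kinase (id INTEGER PRIMARY KEY, name TEXT, protein_id INTEGER);
CREATE TABLE site_kinase (site_id INTEGER, kinase_id INTEGER);
"""

# Site.kinases.any() corresponds to the EXISTS subquery;
# Site.type.like('%acetylation%') corresponds to the LIKE clause.
COUNT_QUERY = """
SELECT COUNT(*) FROM site
WHERE EXISTS (
    SELECT 1 FROM site_kinase sk WHERE sk.site_id = site.id
)
AND site.type LIKE '%acetylation%'
"""


def count_acetylation_sites_with_kinases(conn: sqlite3.Connection) -> int:
    """Count matching sites inside the database instead of in Python."""
    return conn.execute(COUNT_QUERY).fetchone()[0]


def demo() -> int:
    """Build a tiny in-memory database and run the count on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO site VALUES (?, ?)", [
        (1, "acetylation"),      # has a kinase -> counted
        (2, "acetylation"),      # no kinase -> skipped
        (3, "phosphorylation"),  # wrong type -> skipped
    ])
    conn.execute("INSERT INTO kinase VALUES (1, 'EP300', 10)")
    conn.executemany("INSERT INTO site_kinase VALUES (?, ?)", [(1, 1), (3, 1)])
    count = count_acetylation_sites_with_kinases(conn)
    conn.close()
    return count
```

With the project's own models, calling `.count()` on the same `Site.query.filter(...)` instead of `.all()` followed by `len()` should push the counting into the database in a similar way.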
# collect the distinct proteins of the kinases acting on these sites
proteins_associated_with_acetylation_sites = set()
for site in sites:
    for kinase in site.kinases:
        if kinase.protein:
            proteins_associated_with_acetylation_sites.add(kinase.protein)
This gives us 27 proteins associated with acetylation sites:
{<Protein NM_012231 with seq of 1719 aa from PRDM2 gene>,
<Protein NM_005030 with seq of 604 aa from PLK1 gene>,
<Protein NM_002758 with seq of 335 aa from MAP2K6 gene>,
<Protein NM_001514 with seq of 317 aa from GTF2B gene>,
<Protein NM_145331 with seq of 607 aa from MAP3K7 gene>,
<Protein NM_003884 with seq of 833 aa from KAT2B gene>,
<Protein NM_001145415 with seq of 1292 aa from SETDB1 gene>,
<Protein NM_002392 with seq of 498 aa from MDM2 gene>,
<Protein NM_004424 with seq of 785 aa from E4F1 gene>,
<Protein NM_003491 with seq of 236 aa from NAA10 gene>,
<Protein NM_182710 with seq of 547 aa from KAT5 gene>,
<Protein NM_001429 with seq of 2415 aa from EP300 gene>,
<Protein NM_001282166 with seq of 424 aa from SUV39H1 gene>,
<Protein NM_004380 with seq of 2443 aa from CREBBP gene>,
<Protein NM_020197 with seq of 434 aa from SMYD2 gene>,
<Protein NM_003642 with seq of 420 aa from HAT1 gene>,
<Protein NM_030662 with seq of 401 aa from MAP2K2 gene>,
<Protein NM_021078 with seq of 838 aa from KAT2A gene>,
<Protein NM_000551 with seq of 214 aa from VHL gene>,
<Protein NM_006709 with seq of 1211 aa from EHMT2 gene>,
<Protein NM_001880 with seq of 506 aa from ATF2 gene>,
<Protein NM_005923 with seq of 1375 aa from MAP3K5 gene>,
<Protein NM_002613 with seq of 557 aa from PDPK1 gene>,
<Protein NM_005204 with seq of 468 aa from MAP3K8 gene>,
<Protein NM_004333 with seq of 767 aa from BRAF gene>,
<Protein NM_001278549 with seq of 457 aa from PDK1 gene>,
<Protein NM_030648 with seq of 367 aa from SETD7 gene>}
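The nested loop above can also be written as a set comprehension. Below is a minimal sketch using hypothetical stand-in dataclasses rather than the real SQLAlchemy models. The set is what keeps the result down to 27 distinct proteins even though one kinase can act on many of the 324 sites. With the real models, deduplication relies on the session's identity map returning the same `Protein` object for the same row.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Protein:
    refseq: str
    gene: str


@dataclass
class Kinase:
    name: str
    protein: Optional[Protein] = None  # a kinase may lack a mapped protein


@dataclass
class Site:
    type: str
    kinases: List[Kinase] = field(default_factory=list)


def kinase_proteins(sites: List[Site]) -> Set[Protein]:
    """Distinct proteins of the kinases acting on the given sites.

    Same logic as the nested loop: skip kinases without a protein and
    let the set drop duplicates (one kinase can act on many sites).
    """
    return {
        kinase.protein
        for site in sites
        for kinase in site.kinases
        if kinase.protein is not None
    }
```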
So if everything is correct, yes, we have some proteins associated with acetylation sites.
Hi Michal,
how about adding some numbers to the front page of the website as an overview of the data we have?
This will help viewers understand the impressive scope and amount of data we analyse.