reimandlab / ActiveDriverDB

GNU Lesser General Public License v2.1

adding stats to website #34

Closed reimand0 closed 8 years ago

reimand0 commented 8 years ago

Hi Michal,

How about adding some numbers to the front page of the website as an overview of the data we have?

This will help viewers understand the impressive scope and amount of data we analyse.

krassowski commented 8 years ago

Because the server had trouble starting up and running normally when the count of mappings (the whole DNA>protein table) was calculated at runtime, I hardcoded it so it won't be an issue. All other statistics are also implemented and shown on the front page, so I am closing this for now.

reimand0 commented 8 years ago

Looks like the server is much faster today. I noticed this while changing mutation datasets within a given protein (TCGA > ClinVar > ESP).

Some tiny changes for the stats view:

Proteins
39159

this should have two numbers - total unique human proteins, total protein isoforms

Mutations
All: 2559719

ClinVar: 181457

TCGA: 468115

ESP 6500: 1319413

1K Genomes: 1071901

let's call this 1000 Genomes

MIMP annotated potential mutations
24141717

this is not needed

Sites
385185

PTM sites

Kinases 607

kinases and other PTM enzymes (by the way, do we have proteins associated with acetylation sites?)

Kinase Groups
148

kinase families

Confirmed mutations in PTM sites 365097

remove "confirmed"

Confirmed mutations with MIMP annotations
50678

Mutations with network-rewiring effects (losses and gains of sequence motifs)

Mutation annotations (all DNA>protein table + MIMP annotations) 97235488

Total number of annotated nucleotides

reimand0 commented 8 years ago

Maybe the last one should be "Total number of annotations of nucleotides". What do you think?

reimand0 commented 8 years ago

Also, is there a script that writes these statistics to a file, so that they can then be included in the view as hardcoded values?
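A minimal sketch of what such a script could look like, assuming each statistic can be computed by a zero-argument callable. The names `write_stats` and `load_stats`, the file name, and the placeholder counters are hypothetical and not taken from the ActiveDriverDB code base. The idea is that the slow counts are computed once, offline, and the view only reads the resulting file.

```python
import json
import tempfile
from pathlib import Path


def write_stats(counters, path):
    """Run every counter once and store the results as JSON."""
    stats = {name: int(count()) for name, count in counters.items()}
    Path(path).write_text(json.dumps(stats, indent=2, sort_keys=True))
    return stats


def load_stats(path):
    """Read the precomputed statistics; cheap enough to call from a view."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    # Placeholder counters (assumption): a real script would run the
    # expensive database queries here instead of returning constants.
    counters = {
        "proteins": lambda: 0,
        "sites": lambda: 0,
    }
    with tempfile.TemporaryDirectory() as directory:
        target = Path(directory) / "stats.json"
        write_stats(counters, target)
        print(load_stats(target))
```

Re-running the script after each data import would refresh the file, so the numbers on the front page would not need to be edited by hand.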

krassowski commented 8 years ago

When it comes to proteins associated with acetylation sites:

from app import app
from models import Site
from sqlalchemy import and_

# acetylation sites that have at least one kinase assigned
sites = Site.query.filter(
    and_(Site.kinases.any(), Site.type.like('%acetylation%'))
).all()
print(len(sites))

prints 324, so there are 324 acetylation sites with at least one kinase associated.

# collect the proteins of all kinases linked to those sites
proteins_associated_with_acetylation_sites = set()
for site in sites:
    for kinase in site.kinases:
        if kinase.protein:
            proteins_associated_with_acetylation_sites.add(kinase.protein)

and it gives us 27 proteins associated with acetylation sites:

{<Protein NM_012231 with seq of 1719 aa from PRDM2 gene>,
 <Protein NM_005030 with seq of 604 aa from PLK1 gene>,
 <Protein NM_002758 with seq of 335 aa from MAP2K6 gene>,
 <Protein NM_001514 with seq of 317 aa from GTF2B gene>,
 <Protein NM_145331 with seq of 607 aa from MAP3K7 gene>,
 <Protein NM_003884 with seq of 833 aa from KAT2B gene>,
 <Protein NM_001145415 with seq of 1292 aa from SETDB1 gene>,
 <Protein NM_002392 with seq of 498 aa from MDM2 gene>,
 <Protein NM_004424 with seq of 785 aa from E4F1 gene>,
 <Protein NM_003491 with seq of 236 aa from NAA10 gene>,
 <Protein NM_182710 with seq of 547 aa from KAT5 gene>,
 <Protein NM_001429 with seq of 2415 aa from EP300 gene>,
 <Protein NM_001282166 with seq of 424 aa from SUV39H1 gene>,
 <Protein NM_004380 with seq of 2443 aa from CREBBP gene>,
 <Protein NM_020197 with seq of 434 aa from SMYD2 gene>,
 <Protein NM_003642 with seq of 420 aa from HAT1 gene>,
 <Protein NM_030662 with seq of 401 aa from MAP2K2 gene>,
 <Protein NM_021078 with seq of 838 aa from KAT2A gene>,
 <Protein NM_000551 with seq of 214 aa from VHL gene>,
 <Protein NM_006709 with seq of 1211 aa from EHMT2 gene>,
 <Protein NM_001880 with seq of 506 aa from ATF2 gene>,
 <Protein NM_005923 with seq of 1375 aa from MAP3K5 gene>,
 <Protein NM_002613 with seq of 557 aa from PDPK1 gene>,
 <Protein NM_005204 with seq of 468 aa from MAP3K8 gene>,
 <Protein NM_004333 with seq of 767 aa from BRAF gene>,
 <Protein NM_001278549 with seq of 457 aa from PDK1 gene>,
 <Protein NM_030648 with seq of 367 aa from SETD7 gene>}

So if everything is correct, yes, we have some proteins associated with acetylation sites.
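To sanity-check the logic above without a database, the same filtering and collection can be run on a few stand-in objects. This is only a sketch: the `Site` and `Kinase` classes below are simplified stand-ins (assumptions, not the real models), proteins are represented by plain accession strings, and the function name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Kinase:
    name: str
    protein: Optional[str] = None  # stand-in for the linked protein record


@dataclass
class Site:
    type: str
    kinases: List[Kinase] = field(default_factory=list)


def acetylation_enzyme_proteins(sites):
    """Return (acetylation sites with a kinase, proteins of those kinases)."""
    # Mirrors the query: at least one kinase and 'acetylation' in the type.
    # Note: this substring test is case-sensitive, unlike some SQL LIKEs.
    matching = [s for s in sites if s.kinases and "acetylation" in s.type]
    proteins = {k.protein for s in matching for k in s.kinases if k.protein}
    return matching, proteins


sites = [
    Site("acetylation", [Kinase("EP300", "NM_001429"), Kinase("unknown")]),
    Site("phosphorylation,acetylation", [Kinase("KAT5", "NM_182710")]),
    Site("acetylation"),  # no kinase assigned: excluded
    Site("phosphorylation", [Kinase("BRAF", "NM_004333")]),  # other type
]
matching, proteins = acetylation_enzyme_proteins(sites)
print(len(matching), sorted(proteins))
# prints: 2 ['NM_001429', 'NM_182710']
```

Kinases without a linked protein are skipped, which is why the number of proteins can be smaller than the number of kinases seen on the matching sites.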