monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

Add QC check to determine if grouping class is classified differently from all its leaf nodes #4920

Open cmungall opened 2 years ago

cmungall commented 2 years ago

Example: primary bone dysplasia (MONDO:0018230) should likely be classified as Mendelian.

This class has many descendants:

runoak -i db/mondo.db descendants MONDO:0018230 -p i | wc
     444    2569   23922

Most of these are in OMIM. Only this many lack an OMIM xref:

runoak -i db/mondo.db descendants MONDO:0018230 -p i -D x | grep -v -c OMIM
49
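To put "most" in numbers, assume each command prints one term per line. Note also that the count includes MONDO:0018230 itself, which appears in the listing. On that basis, the share of descendants that carry an OMIM xref is:

$$\frac{444 - 49}{444} = \frac{395}{444} \approx 0.89$$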

In fact, most of the remaining non-OMIM terms are either parents of OMIM-derived terms or terms that likely should be classified as genetic diseases:

runoak -i db/mondo.db descendants MONDO:0018230 -p i -D x | grep -v OMIM | head -20
MONDO:0000138 ! metaphyseal chondrodysplasia HP:0005871 SCTID:28681006
MONDO:0015907 ! epimetaphyseal skeletal dysplasia GARD:0002176 ICD10CM:Q77.8 Orphanet:1819
MONDO:0015985 ! bone dysplasia, Azouz type GARD:0000920 ICD10CM:Q78.4 Orphanet:1844 SCTID:720566004 UMLS:C4303993
MONDO:0018230 ! primary bone dysplasia Orphanet:364526
MONDO:0018233 ! otopalatodigital syndrome spectrum disorder DOID:0111782 Orphanet:364541 UMLS:C2748918
MONDO:0018254 ! spondyloepimetaphyseal dysplasia, Isidor type ICD10CM:Q77.8 Orphanet:370015
MONDO:0018255 ! spondylometaphyseal dysplasia, Czarny-Ratajczak type ICD10CM:Q77.8 Orphanet:370019
MONDO:0018490 ! cono-spondylar dysplasia ICD10CM:Q77.7 Orphanet:420794 SCTID:766874001 UMLS:CN237491
MONDO:0019692 ! multiple epiphyseal dysplasia and pseudoachondroplasia ICD10CM:Q78.8 Orphanet:93429
MONDO:0019693 ! multiple metaphyseal dysplasia ICD10CM:Q78.5 Orphanet:93430
MONDO:0019694 ! spondylodysplastic dysplasia Orphanet:93434
MONDO:0019695 ! acromelic dysplasia ICD10CM:Q74.8 Orphanet:93436
MONDO:0019697 ! mesomelic and rhizo-mesomelic dysplasia Orphanet:93438 UMLS:CN229208
MONDO:0019698 ! bent bone dysplasia ICD9:756.59 Orphanet:93439 SCTID:254095002 UMLS:C0432238
MONDO:0019699 ! slender bone dysplasia Orphanet:93440

We should add a QC check: if a grouping class is not classified as inherited, yet the majority of its leaf nodes are, flag it for review.

Here the "majority" threshold could be chosen by the curator (maybe 90%), or it could be replaced by a statistical test. We can refine this later.
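The check described above can be sketched in plain Python. This is a minimal sketch under several assumptions. The is-a edges live in an in-memory dict (child → direct parents) rather than being read from db/mondo.db. The toy IDs, with "X:inherited" standing in for the inherited-disease class, and the helper names (`children_index`, `ancestors`, `leaf_descendants`, `flag_grouping`) are all hypothetical. The threshold is a fixed cutoff (default 0.9), not a statistical test.

```python
from collections import defaultdict

# Hypothetical toy is-a graph: child -> set of direct parents.
# IDs are illustrative only, not real Mondo terms.
IS_A = {
    "X:inherited": {"X:disease"},
    "X:group": {"X:disease"},
    "X:subgroup": {"X:group"},
    "X:leaf1": {"X:subgroup", "X:inherited"},
    "X:leaf2": {"X:subgroup", "X:inherited"},
    "X:leaf3": {"X:group", "X:inherited"},
    "X:leaf4": {"X:group"},
    "X:leaf5": {"X:inherited"},
}


def children_index(is_a):
    """Invert child -> parents into parent -> children."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    return children


def ancestors(term, is_a):
    """All transitive is-a ancestors of term (term itself excluded)."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, ()))
    return seen


def leaf_descendants(term, children):
    """Descendants of term that have no children of their own."""
    leaves, seen = set(), set()
    stack = list(children.get(term, ()))
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        kids = children.get(t, set())
        if kids:
            stack.extend(kids)
        else:
            leaves.add(t)
    return leaves


def flag_grouping(grouping, target, is_a, threshold=0.9):
    """Return (flagged, fraction) for one grouping class.

    The grouping is flagged if it is NOT under `target` but at least
    `threshold` of its leaf descendants are. `fraction` is None when
    the check does not apply.
    """
    if target in ancestors(grouping, is_a):
        return False, None  # already classified under the target
    leaves = leaf_descendants(grouping, children_index(is_a))
    if not leaves:
        return False, None  # no leaves to generalise from
    under = sum(1 for leaf in leaves if target in ancestors(leaf, is_a))
    fraction = under / len(leaves)
    return fraction >= threshold, fraction
```

In this toy graph, 3 of the 4 leaves under "X:group" are inherited. With the default cutoff, `flag_grouping("X:group", "X:inherited", IS_A)` therefore returns `(False, 0.75)`, and passing `threshold=0.75` flags it. The sketch looks only at leaves, following the issue title. Counting all descendants would be an equally reasonable variant.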

A note on general applicability to OBO: this kind of inductive reasoning step is a good counterpart to the deductive reasoning we already do.

matentzn commented 2 years ago

The idea is good, but we still lack a consistent framework for implementing such checks. SPARQL is very cumbersome for this, especially if we want to generalise it to:

for all A in O:
    CT_SUBCLASS = SUM(B sub A)
    PARENTS_OF_CHILDREN = {C: SUM(C super (B sub A)) / CT_SUBCLASS}
    if PARENTS_OF_CHILDREN.PREVALENCE > 0.75:
        suggest A subClassOf PARENTS_OF_CHILDREN.CLASS
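One way to read the generalised pseudocode above is as the following standalone Python sketch. It rests on several assumptions of our own. "B sub A" is taken to mean all transitive subclasses of A. Candidate classes C that A is already under are skipped, because every such C trivially scores 100%, and so are candidates that sit below A, because suggesting them would create a cycle. The toy graph and the names `transitive` and `suggest_superclasses` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy is-a graph: child -> set of direct parents.
IS_A = {
    "X:inherited": {"X:disease"},
    "X:group": {"X:disease"},
    "X:subgroup": {"X:group"},
    "X:leaf1": {"X:subgroup", "X:inherited"},
    "X:leaf2": {"X:subgroup", "X:inherited"},
    "X:leaf3": {"X:group", "X:inherited"},
    "X:leaf4": {"X:group"},
    "X:leaf5": {"X:inherited"},
}


def transitive(term, edges):
    """All nodes reachable from term by following edges (term excluded)."""
    seen = set()
    stack = list(edges.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(edges.get(t, ()))
    return seen


def suggest_superclasses(is_a, threshold=0.75):
    """Return (A, C, prevalence) where more than `threshold` of A's subclasses are under C."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    suggestions = []
    for a in sorted(children):
        subs = transitive(a, children)  # every B with B sub A (CT_SUBCLASS = len(subs))
        # Skip A itself, its existing ancestors, and anything below A.
        excluded = transitive(a, is_a) | subs | {a}
        # PARENTS_OF_CHILDREN: how many of A's subclasses sit under each C.
        counts = Counter(c for b in subs for c in transitive(b, is_a))
        for c in sorted(counts):
            prevalence = counts[c] / len(subs)
            if c not in excluded and prevalence > threshold:
                suggestions.append((a, c, prevalence))
    return suggestions
```

On the toy graph, `suggest_superclasses(IS_A)` yields only `("X:subgroup", "X:inherited", 1.0)`. Lowering the threshold to 0.7 also suggests `X:inherited subClassOf X:group`, which is a spurious generalisation. That is why output like this would be better treated as candidates for curator review than as axioms to add automatically.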

Should OAK have a better interface for specifying tests like this, or should we just keep adding standalone Python scripts, especially if we want to reuse checks like this?