monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

Add a QC check to prevent OMIM SubClassOf OMIM #732

Closed cmungall closed 2 years ago

cmungall commented 5 years ago

See #730

In general, OMIM IDs should be equivalenced only to "gene-level" disease classes. This needs to be written up in the docs and a check added. We should also consistently use subset tags to denote levels in the ontology.

The gene-level metaclass consists of classes that are defined according to a single gene.

If we see a case where we infer one OMIM class to be a subclass of another, this is likely a mistake; see the docs on the "prototype" problem (in which a disease "Foo" is later split into "Foo 1" and "Foo 2", leaving the label "Foo" ambiguous with respect to whether it denotes the "classic" form "Foo 1" or a superclass).

There are cases where some OMIM IDs may still represent something above the gene level; this is also tied in with susceptibility.
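The rule above can be sketched as a simple graph check. This is a minimal Python sketch, assuming the ontology has already been loaded into two plain dictionaries: direct superclasses per class, and the OMIM IDs each class is declared equivalent to. The function names and the `MONDO:foo*` / `OMIM:10000x` identifiers are hypothetical placeholders modelled on the "Foo" prototype example. Because the concern is about *inferred* subclass relations, it walks the full ancestor closure rather than only direct parents.

```python
def ancestors(cls, parents):
    """Return all transitive superclasses of cls (excluding cls itself)."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        p = stack.pop()
        if p not in seen:  # the seen set also guards against cycles
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def omim_subclass_violations(parents, omim_equiv):
    """Return (child, child_omim, ancestor, ancestor_omim) tuples for every
    OMIM-equivalent class that has an OMIM-equivalent ancestor."""
    violations = []
    for child in sorted(omim_equiv):
        for anc in sorted(ancestors(child, parents)):
            if anc in omim_equiv:
                for c_id in sorted(omim_equiv[child]):
                    for a_id in sorted(omim_equiv[anc]):
                        violations.append((child, c_id, anc, a_id))
    return violations


# Hypothetical toy data for the "Foo" prototype problem.
parents = {
    "MONDO:foo1": ["MONDO:foo"],
    "MONDO:foo2": ["MONDO:foo"],
    "MONDO:foo": ["MONDO:disease"],
}
omim_equiv = {
    "MONDO:foo": {"OMIM:100000"},   # ambiguous "prototype" entry
    "MONDO:foo1": {"OMIM:100001"},
    "MONDO:foo2": {"OMIM:100002"},
}

for v in omim_subclass_violations(parents, omim_equiv):
    print(v)
```

Both "Foo 1" and "Foo 2" are reported against the prototype "Foo" entry, which is the situation the thread says is likely a mistake. Grouping classes that are intentionally OMIM-equivalent would also be reported, so the output is a review list rather than an automatic fix.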

maglott commented 4 years ago

@cmungall, is this going to be implemented globally? I just encountered https://monarchinitiative.org/disease/MONDO:0008947, where bilateral striopallidodentate calcinosis has the synonym basal ganglia calcification, idiopathic, type 1, which is gene-specific. The class basal ganglia calcification, idiopathic, 1 (MONDO:0024538) differs from that synonym only by the term 'type'.

maglott commented 4 years ago

And Trichohepatoenteric syndrome type 1 is a synonym of tricho-hepato-enteric syndrome (MONDO:0009105), while trichohepatoenteric syndrome 1 (MONDO:0024541) also exists (that pesky 'type').
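Both of these examples could be caught by a companion check that compares each synonym against the labels of *other* classes after normalization. This is a minimal Python sketch, assuming labels and synonyms are available as plain strings; the function names and the normalization (lowercase, strip punctuation, drop the standalone word "type") are hypothetical heuristics, not an existing Mondo QC rule. The toy data uses only the labels and synonym quoted above.

```python
import re


def normalize_label(text):
    """Lowercase, drop punctuation, and remove the standalone word 'type',
    so 'Foo syndrome type 1' and 'foo syndrome 1' compare equal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t != "type")


def synonym_label_clashes(classes):
    """classes maps a MONDO ID to {"label": str, "synonyms": [str]}.
    Return (class_id, synonym, other_id) wherever a synonym of one class
    normalizes to the label of a different class."""
    by_label = {}
    for cid, info in classes.items():
        by_label.setdefault(normalize_label(info["label"]), []).append(cid)

    clashes = []
    for cid, info in sorted(classes.items()):
        for syn in info.get("synonyms", []):
            for other in by_label.get(normalize_label(syn), []):
                if other != cid:
                    clashes.append((cid, syn, other))
    return clashes


classes = {
    "MONDO:0009105": {
        "label": "tricho-hepato-enteric syndrome",
        "synonyms": ["Trichohepatoenteric syndrome type 1"],
    },
    "MONDO:0024541": {"label": "trichohepatoenteric syndrome 1", "synonyms": []},
}

print(synonym_label_clashes(classes))
```

Dropping "type" everywhere is deliberately blunt, so any hits would still need manual review before a synonym is removed or moved.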

matentzn commented 3 years ago

@nicolevasilevsky Here is an up-to-date list: https://docs.google.com/spreadsheets/d/1CJPJ8VasB-kGaUqY0jBBvFf2IwW5xbd3ZiZazx67eHo/edit?usp=sharing

Let me know how you want to go about this.

matentzn commented 3 years ago

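FYI, this is the check:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?term ?term_label ?p ?pn ?xref ?xref_p {
   # ?term is a direct subclass of ?p in the queried file
   ?term rdfs:subClassOf ?p .
   # ?term has a database xref annotated with a source
   ?exp owl:annotatedSource ?term ;
        owl:annotatedProperty oboInOwl:hasDbXref ;
        owl:annotatedTarget ?xref ;
        oboInOwl:source ?source .
   # ...and so does its parent ?p
   ?exp_p owl:annotatedSource ?p ;
          owl:annotatedProperty oboInOwl:hasDbXref ;
          owl:annotatedTarget ?xref_p ;
          oboInOwl:source ?source_p .
   ?term rdfs:label ?term_label .
   ?p rdfs:label ?pn .
   # restrict both ends to MONDO classes
   FILTER (isIRI(?term) && regex(str(?term), "^http://purl.obolibrary.org/obo/MONDO_"))
   FILTER (isIRI(?p) && regex(str(?p), "^http://purl.obolibrary.org/obo/MONDO_"))
   # both xrefs are OMIM IDs marked as equivalences
   FILTER(regex(str(?xref), "^OMIM:"))
   FILTER(regex(str(?xref_p), "^OMIM:"))
   FILTER(str(?source)="MONDO:equivalentTo")
   FILTER(str(?source_p)="MONDO:equivalentTo")
}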

I eyeballed some, and some of these equivalences come from Orphanet.

matentzn commented 3 years ago

(The important thing here is to identify a systematic mistake we can break in one go; otherwise we just keep going through the table until it's done.)

nicolevasilevsky commented 3 years ago

I'll take a look at this and think about it.

Some of these are being addressed in this ticket: https://github.com/monarch-initiative/mondo/issues/962

nicolevasilevsky commented 3 years ago

There are some cases where terms like this one (https://omim.org/entry/109100) are used as grouping classes. I don't think that is necessarily wrong, but these cases will be flagged as errors by this check.
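One way to keep such intentional grouping classes from drowning out real errors is to post-filter the check's results against a curated exemption list. This is a minimal Python sketch under that assumption: the `GROUPING_OMIM_IDS` set, the function name, and the row keys (`term`, `term_xref`, `parent`, `parent_xref`) are hypothetical, and every ID other than OMIM:109100 is a placeholder. Exempted rows are returned separately rather than discarded, so they can still be reviewed.

```python
# Hypothetical exemption list: OMIM entries deliberately used as grouping
# classes (OMIM:109100 is the example raised in this thread).
GROUPING_OMIM_IDS = {"OMIM:109100"}


def filter_grouping_exemptions(rows, exempt=GROUPING_OMIM_IDS):
    """Split check results into (reported, exempted).
    A row is exempted when its parent's OMIM xref is a known grouping entry."""
    reported, exempted = [], []
    for row in rows:
        if row["parent_xref"] in exempt:
            exempted.append(row)
        else:
            reported.append(row)
    return reported, exempted


# Placeholder rows shaped like the check's output (term, its OMIM xref,
# parent, the parent's OMIM xref).
rows = [
    {"term": "MONDO:child_a", "term_xref": "OMIM:000001",
     "parent": "MONDO:group", "parent_xref": "OMIM:109100"},
    {"term": "MONDO:child_b", "term_xref": "OMIM:000002",
     "parent": "MONDO:foo", "parent_xref": "OMIM:000003"},
]

reported, exempted = filter_grouping_exemptions(rows)
print([r["term"] for r in reported], [r["term"] for r in exempted])
```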

nicolevasilevsky commented 3 years ago

And Trichohepatoenteric syndrome type 1 a synonym of tricho-hepato-enteric syndrome MONDO:0009105 when trichohepatoenteric syndrome 1 MONDO:0024541 also exists (that pesky 'type')

The other example above from @maglott has been resolved. (Thanks for your comments @maglott!)

nicolevasilevsky commented 2 years ago

I think this is done. Please reopen if further action is needed.