monarch-initiative / monarch-disease-ontology-RETIRED

THIS IS THE OLD REPO: Use this one instead: https://github.com/monarch-initiative/mondo-build

Resolve linkages for Ehlers-Danlos type 1 #74

Closed cmungall closed 8 years ago

cmungall commented 8 years ago

Note: this ticket is intended primarily to document the curation procedure for the new kboom-based mondo. The curation methodology is to look at all sub-module solutions in which the graph with the highest posterior probability involves selecting edges with a very low prior probability.

This example involves ED-type 1 (this turns up at the top of the 'suspect' list). With the current priors, the most likely graph is:

img-mesh_c536194

Note that this graph was chosen as the most likely out of all possible configurations (where we treat each mapping as sub/super/equiv/wrong). However, the most likely configuration involves an edge with a low prior:

This has a low prior (Pr=0.028) because lexically we really expect these to be equivalent. Why was this configuration chosen, then? In part because other priors are derived from rules, such as the fact that we don't usually classify things under OMIMs, and possibly because of some misleading lexical evidence too. Overall the graph has a low probability and a low confidence, which in the current report looks like this:

If the confidence is high, or if the most likely sub-ontology did not involve selecting an edge with a low prior probability, there is no urgency for the curator to vet it. However, in this case the prior is exceedingly low, so we can look at this and decide what to do next.
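To make the selection and vetting rule concrete, here is a minimal Python sketch. It is not kboom's implementation: the classes X, Y and Z, all prior values other than 0.028, the `is_consistent` stand-in for a reasoner, and the 0.05 threshold are assumptions chosen for illustration. It enumerates every sub/super/equiv/wrong assignment, scores each consistent one by the product of its edge priors, and flags the winner if it relies on a low-prior edge.

```python
from itertools import product

RELATIONS = ("subClassOf", "superClassOf", "equivalentTo", "wrong")

# Hypothetical priors for two candidate mappings between placeholder classes
# X, Y and Z; each row sums to 1. Only Pr=0.028 echoes a number from the text.
PRIORS = {
    ("X", "Y"): {"subClassOf": 0.02, "superClassOf": 0.02,
                 "equivalentTo": 0.94, "wrong": 0.02},
    ("Y", "Z"): {"subClassOf": 0.028, "superClassOf": 0.02,
                 "equivalentTo": 0.944, "wrong": 0.008},
}


def is_consistent(config):
    # Stand-in for a reasoner: assume X and Z come from the same source and
    # must stay distinct, so both mappings cannot be equivalences at once.
    return not all(rel == "equivalentTo" for rel in config.values())


def rank_configurations(priors, consistent=is_consistent):
    """Return (posterior, config) pairs, most probable first."""
    edges = list(priors)
    scored = []
    for choice in product(RELATIONS, repeat=len(edges)):
        config = dict(zip(edges, choice))
        if not consistent(config):
            continue
        joint = 1.0
        for edge, rel in config.items():
            joint *= priors[edge][rel]
        scored.append((joint, config))
    # Normalise over the consistent configurations only.
    total = sum(joint for joint, _ in scored)
    ranked = [(joint / total, config) for joint, config in scored]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


def low_prior_edges(config, priors, threshold=0.05):
    """List the chosen edges whose prior falls below the threshold."""
    return [(edge, rel, priors[edge][rel])
            for edge, rel in config.items()
            if priors[edge][rel] < threshold]


ranked = rank_configurations(PRIORS)
best_posterior, best_config = ranked[0]
suspects = low_prior_edges(best_config, PRIORS)
```

With these made-up numbers the winning configuration has a posterior of roughly 0.23 and depends on the Y subClassOf Z edge with prior 0.028, so it would land on the 'suspect' list: the consistency constraint rules out the lexically preferred double equivalence, and the remaining options are all individually unlikely.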

There are a variety of small nudges that will push the most likely configuration into being the correct one:

I will illustrate this in the next commit.

cmungall commented 8 years ago

Another global fix would be to add an extra level of distrust for RELATED synonyms from MESH and OMIM. We have a rule that a lexical substring match (weakly) suggests subclassOf. If we look at some of the synonyms, we can see this is ripe for false positives:

[Term]
id: OMIM:130000
name: Ehlers-Danlos Syndrome, Classic Type
synonym: "EHLERS-DANLOS SYNDROME, CLASSIC TYPE" RELATED []
synonym: "Eds I, Formerly" RELATED []
synonym: "Eds Ii, Formerly" RELATED []
synonym: "Ehlers Danlos Syndrome, Mild Classic Type, Formerly" RELATED []
synonym: "Ehlers Danlos Syndrome, Mitis Type, Formerly" RELATED []
synonym: "Ehlers-Danlos Syndrome, Gravis Type, Formerly" RELATED []
synonym: "Ehlers-Danlos Syndrome, Severe Classic Type, Formerly" RELATED []
synonym: "Ehlers-Danlos Syndrome, Type I, Formerly" RELATED []
synonym: "Ehlers-Danlos Syndrome, Type Ii, Formerly" RELATED []
xref: Orphanet:287
xref: Orphanet:90309
xref: UMLS:C0268335

[Term]
id: MESH:C536194
name: Ehlers-Danlos syndrome type 1
namespace: CTD_disease_ontology
alt_id: OMIM:130000
synonym: "EDS I, FORMERLY" RELATED []
synonym: "EDS II" RELATED []
synonym: "EDS1, FORMERLY" RELATED []
synonym: "EDS2, FORMERLY" RELATED []
synonym: "EHLERS DANLOS SYNDROME, MILD CLASSIC TYPE" RELATED []
synonym: "EHLERS DANLOS SYNDROME, MITIS TYPE, FORMERLY" RELATED []
synonym: "EHLERS-DANLOS SYNDROME, CLASSIC TYPE" RELATED []
synonym: "EHLERS-DANLOS SYNDROME, GRAVIS TYPE, FORMERLY" RELATED []
synonym: "EHLERS-DANLOS SYNDROME, SEVERE CLASSIC TYPE, FORMERLY" RELATED []
synonym: "EHLERS-DANLOS SYNDROME, TYPE I, FORMERLY" RELATED []
synonym: "EHLERS-DANLOS SYNDROME, TYPE II, FORMERLY" RELATED []
synonym: "Ehlers-Danlos Syndrome, Severe Classic Type" RELATED []
synonym: "Ehlers-Danlos Syndrome, Type I" RELATED []
synonym: "Ehlers-Danlos syndrome, Gravis type" RELATED []
synonym: "Ehlers-Danlos syndrome, classic severe form" RELATED []
synonym: "FORMERLY" RELATED []
is_a: MESH:D004535  ! Ehlers-Danlos Syndrome

Of course, lexical methods will always be flawed; the point of the Bayesian method here is to weight them appropriately. In this particular case it seems we have a 'perfect storm' of misleading prior probabilities. In theory this should be fixed with one small tweak of one of the priors...
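A rough Python sketch of why the substring rule misfires on these stanzas, and of what extra distrust for RELATED synonyms could look like. The `SCOPE_TRUST` weights, the normalisation, and the function names are assumptions made for illustration, not the rule as implemented in the pipeline; the labels are copied from the two stanzas above.

```python
import re

# Hypothetical trust weights per label scope; the values are illustrative only.
SCOPE_TRUST = {"NAME": 1.0, "EXACT": 1.0, "RELATED": 0.25}


def normalize(label):
    # Lowercase and collapse punctuation so that "Ehlers-Danlos" and
    # "EHLERS DANLOS" compare equal.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def substring_hits(child_labels, parent_labels):
    """Yield (child_label, parent_label, trust) whenever a parent label is a
    proper, word-aligned substring of a child label. Such a hit weakly
    suggests child subClassOf parent."""
    for c_text, c_scope in child_labels:
        c_norm = f" {normalize(c_text)} "
        for p_text, p_scope in parent_labels:
            p_norm = f" {normalize(p_text)} "
            if p_norm.strip() and p_norm != c_norm and p_norm in c_norm:
                # A hit is only as trustworthy as its weakest label.
                yield c_text, p_text, min(SCOPE_TRUST[c_scope],
                                          SCOPE_TRUST[p_scope])


omim_130000 = [
    ("Ehlers-Danlos Syndrome, Classic Type", "NAME"),
    ("Ehlers-Danlos Syndrome, Type I, Formerly", "RELATED"),
    ("Eds I, Formerly", "RELATED"),
]
mesh_c536194 = [
    ("Ehlers-Danlos syndrome type 1", "NAME"),
    ("FORMERLY", "RELATED"),
    ("Ehlers-Danlos Syndrome, Type I", "RELATED"),
    ("EDS I, FORMERLY", "RELATED"),
]

hits = list(substring_hits(omim_130000, mesh_c536194))
```

Every hit here involves a RELATED synonym, and two of the three come from the bare "FORMERLY" synonym matching any OMIM label that ends in "Formerly". Down-weighting RELATED matches leaves the evidence in place but makes it count for much less when the priors are derived.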

cmungall commented 8 years ago

The (deliberately) very minor tweak is overriding a single prior, in https://github.com/monarch-initiative/monarch-disease-ontology/commit/1b24d76776783744a9d7bb142314a82b8de7edd9
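For readers unfamiliar with what overriding a single prior amounts to, a small Python sketch follows. The function name, the proportional rescaling of the remaining relations, and every number except 0.028 are assumptions for illustration; the linked commit may encode the override quite differently.

```python
def override_prior(row, relation, value):
    """Return a copy of `row` with `relation` set to `value` and the other
    relations rescaled proportionally so the row still sums to 1."""
    if relation not in row:
        raise KeyError(relation)
    if not 0.0 <= value <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    rest_total = sum(p for r, p in row.items() if r != relation)
    if rest_total == 0.0:
        raise ValueError("cannot rescale: other relations have zero mass")
    scale = (1.0 - value) / rest_total
    return {r: (value if r == relation else p * scale)
            for r, p in row.items()}


# Hypothetical prior row for one mapping (sums to 1).
row = {"subClassOf": 0.028, "superClassOf": 0.60,
       "equivalentTo": 0.35, "wrong": 0.022}

# A curator asserts that equivalence is far more likely than the rules guessed.
tweaked = override_prior(row, "equivalentTo", 0.9)
```

Only the one row changes; all other mappings keep their rule-derived priors, which is what makes the tweak minor while still allowing the most likely configuration to shift.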

I will check the results in the a.m.

cmungall commented 8 years ago

This resolves the issue:

img-mesh_c536194

Note how it still really doesn't like solutions that place diseases underneath an OMIM. We can look at the priors here, but this is perhaps a wider discussion.

cmungall commented 8 years ago

And for completeness, here is the full classic-type subhierarchy:

edct