monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

Consider annotation prop to distinguish disease grouping class vs disease entity #685

Open kshefchek opened 5 years ago

kshefchek commented 5 years ago

It would be useful if MONDO included a way to distinguish between a disease entity and a disease grouping. Currently the only way to do this is to check for subclasses/types of a disease; however, this results in missing valid diseases (see Huntington disease and Cystic Fibrosis). We regularly add a check that all diseases with an OMIM equivalent are disease entities, but a more formal way to determine this would be useful for analyses and application views.

cmungall commented 5 years ago

Not sure there is an agreed upon 'level'. E.g. ORDO calls juvenile Huntington a disease entity

But we are doing something like this anyway for the mondo-analysis

I think the CF subclass is an error. Can you make an EFO ticket, @nicolevasilevsky?

monicacecilia commented 5 years ago

Dear @nicolevasilevsky - did you hear back from EFO about this?

nicolevasilevsky commented 5 years ago

@monicacecilia I don't think I ever did this! Thanks for the nudge

nicolevasilevsky commented 4 years ago

per @paolaroncaglia's comments on https://github.com/EBISPOT/efo/issues/553#issuecomment-534927291, I revised the subclass for MONDO_0005413 cystic fibrosis associated meconium ileum

paolaroncaglia commented 4 years ago

@nicolevasilevsky MONDO:0009061 'cystic fibrosis' needs fixing too please, see https://github.com/EBISPOT/efo/issues/553#issuecomment-539991798 Thanks!

nicolevasilevsky commented 4 years ago

Got it - thanks @paolaroncaglia. I did one PR, #882. I will work on the rest of this once @cmungall approves the PR.

Note to self, these are the action items from https://github.com/EBISPOT/efo/issues/553

- pancreas disease
- Rare genetic respiratory disease
- rare male fertility disorder with obstructive azoospermia
- Rare genetic disorder with obstructive azoospermia
- rare pulmonary disease
- Genetic biliary tract disease
- Genetic pancreatic disease

kshefchek commented 4 years ago

Is the original request in scope or feasible to be included in mondo, or should we look for a workaround?

paolaroncaglia commented 4 years ago

Going back to MONDO:0005413 'cystic fibrosis associated meconium ileum', I'm afraid there's a typo in the label that you inherited from EFO, it should be "ileus" not "ileum". (See e.g. https://www.cysticfibrosisjournal.com/article/S1569-1993(17)30809-3/fulltext https://www.chop.edu/conditions-diseases/meconium-ileus https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085752/)

And based on the above, I'd also suggest making 'cystic fibrosis associated meconium ileus' a subclass of MONDO:0054868 'meconium ileus'.

Thanks, Paola

paolaroncaglia commented 4 years ago

Hi @nicolevasilevsky , I'm afraid that part of this ticket got a bit side-tracked due to technical issues ;-) It would be great if you could please, before your next release, complete the edits agreed upon for 'cystic fibrosis', so EFO can resume mapping to Mondo for this term. Summing up:

cmungall commented 4 years ago

Would use of github milestones help with prioritizing tickets for releases?

Kent, can you say more about the negative consequence of classifying CF as non-leaf? I thought the distinction purely drove the kind of table displayed, to minimize repetition with leaf terms.

On Wed, Nov 13, 2019, 05:28 paolaroncaglia notifications@github.com wrote:

Hi @nicolevasilevsky, I'm afraid that part of this ticket got a bit side-tracked due to technical issues ;-) It would be great if you could please, before your next release, complete the edits agreed upon for 'cystic fibrosis', so EFO can resume mapping to Mondo. Summing up:

nicolevasilevsky commented 4 years ago

@kshefchek - see comment above

@cmungall yes- let's use GitHub milestones, great idea

nicolevasilevsky commented 4 years ago

@paolaroncaglia, I assigned this ticket to the December release milestone (which I just created).

Are you able to assign milestones too? Feel free to assign and help prioritize. (If not, I think I can adjust your permissions so you can do so)

nicolevasilevsky commented 4 years ago

@paolaroncaglia Is this an action item for Mondo, or just EFO? Remove the mapping between EFO:0004608 'cystic fibrosis' and its counterpart MONDO:0009061, and remove MONDO:0009061 from the Mondo list of terms, until Mondo fixes the same issue (see #685 (comment)), or we'll get the wrong parents back.

kshefchek commented 4 years ago

I think this would be useful for analysis and application views. Previously I have had to add extra checks (e.g. treating all OMIM identifiers as entities; otherwise Angelman, for example, would be filtered out). This works well for rare disease, but I assume there are cases in common disease as well. It's not complicated to get around this from my side of things, so if it's complicated to encode in the ontology, it's not a big deal.
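
A rough sketch of the kind of extra check described here, in Python. The is-a edges, the xref table, and the `OMIM:000000` identifier are placeholders made up for illustration, not real Mondo content; the function name `disease_entities` is also hypothetical.

```python
# Toy data only: labels stand in for Mondo IDs, and the OMIM xref is a placeholder.
IS_A_EDGES = [
    ("hypothetical Angelman subtype", "Angelman syndrome"),
    ("Angelman syndrome", "hypothetical grouping class"),
]
XREFS = {
    "Angelman syndrome": ["OMIM:000000"],  # placeholder identifier, not a real xref
}


def disease_entities(edges, xrefs):
    """Keep leaf terms, plus any term carrying an OMIM xref.

    A term with subclasses would normally be treated as a grouping;
    the OMIM check rescues terms such as Angelman from being filtered out.
    """
    parents = {parent for _, parent in edges}
    terms = {term for edge in edges for term in edge} | set(xrefs)
    return {
        term
        for term in terms
        if term not in parents
        or any(x.startswith("OMIM:") for x in xrefs.get(term, ()))
    }


print(sorted(disease_entities(IS_A_EDGES, XREFS)))
```

Note that the `OMIM:` prefix check does not match `OMIMPS:` identifiers, so phenotypic series would not be rescued by this rule.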

paolaroncaglia commented 4 years ago

@nicolevasilevsky In reply to your comment "I assigned this ticket to the December release milestone (which I just created). Are you able to assign milestones too? Feel free to assign and help prioritize. (If not, I think I can adjust your permissions so you can do so)" I just tested unassigning the December milestone and assigning it again, so yes I can do that and I will try to remember to do it in the future :-) Thanks! Pinging @zoependlington so she's aware of this option for Mondo tickets she and I create.

paolaroncaglia commented 4 years ago

@nicolevasilevsky

Is this an action item for Mondo, or just EFO? Remove the mapping between EFO:0004608 'cystic fibrosis' and its counterpart MONDO:0009061, and remove MONDO:0009061 from the Mondo list of terms, until Mondo fixes the same issue (see #685 (comment)), or we'll get the wrong parents back.

It's just for EFO, thanks.

nicolevasilevsky commented 4 years ago

It's just for EFO, thanks.

got it, thanks!

Have I addressed all the issues on this ticket?

paolaroncaglia commented 4 years ago

@nicolevasilevsky

Have I addressed all the issues on this ticket?

I'm not sure if these ones are done, please: https://github.com/monarch-initiative/mondo/issues/685#issuecomment-552448298 Thanks and have a great weekend!

nicolevasilevsky commented 4 years ago

The label for MONDO_0005413 was fixed.

I think that is everything. Please reopen if there are outstanding action items.

kshefchek commented 4 years ago

what is the final word on distinguishing disease entity vs grouping class?

nicolevasilevsky commented 4 years ago

@kshefchek not sure, I reopened the ticket. @cmungall can you comment?

cmungall commented 4 years ago

I need a definition of disease entity, or at least a full description of how the distinction would manifest computationally on the monarch site/apis

kshefchek commented 4 years ago

For example, say I want to get all rare diseases in Mondo. I run a query that filters out any disease with subclasses; this gives me Juvenile Huntington disease but filters out Huntington disease. I assume that this is not expected, but I also don't really know. Perhaps it would be better if someone clinical weighed in, @pnrobinson?
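
To make the problem concrete, here is a minimal Python sketch of the "no subclasses" filter. The edge list is a toy stand-in (labels instead of Mondo IDs, and the placement under 'neurodegenerative disease' is assumed for illustration), and `leaf_terms` is a hypothetical helper, not an existing Monarch API.

```python
# Toy is-a edges as (child, parent); not taken from an actual Mondo release.
IS_A = [
    ("juvenile Huntington disease", "Huntington disease"),
    ("Huntington disease", "neurodegenerative disease"),
]


def leaf_terms(edges):
    """Return terms that never appear as a parent, i.e. have no subclasses."""
    parents = {parent for _, parent in edges}
    terms = {term for edge in edges for term in edge}
    return terms - parents


leaves = leaf_terms(IS_A)
print(leaves)
# Huntington disease has a subclass, so the leaf-only filter drops it.
print("Huntington disease" in leaves)
```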

cmungall commented 4 years ago

But why do you need to filter at all? What are the consequences of just saying: "this is an ontology, here are all subclasses"?

I can see some users may find it unsatisfying to have "neurodegenerative disease" in the list, as this is not what would be considered a distinct disease entity.

But what is a distinct disease entity? I am not sure you will get consistent answers. @pnrobinson suggests distinct treatments and we may be able to give answers that stratify this way as maxo matures, but we are not there yet.

Perhaps for now a strategy where we annotate what are unambiguously disease groupings, and this can be a negative filter. You will still have classes that subsume one another. Is that a problem?

If there is a high priority use case for having a single layer with no subsumption then we can explore that, using the ordo groupings as a basis, but this will take some engineering and curation for it to be complete. And it's not clear to me how things like cancers should be treated here.

For individual analyses there are other things you can do such as filter out by subset, e.g. etiological_subtype, and disease_grouping
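
As a sketch of what such a negative filter could look like in Python: the term-to-subset table below is toy data (the subset assignments are assumptions for illustration, and extracting them from the ontology file is not shown), and `exclude_subsets` is a hypothetical helper.

```python
# Toy mapping of term label -> subsets it is tagged with; assignments are assumed.
TERM_SUBSETS = {
    "neurodegenerative disease": {"disease_grouping"},
    "Huntington disease": set(),
    "juvenile Huntington disease": set(),
}


def exclude_subsets(term_subsets, excluded=("disease_grouping",)):
    """Drop every term tagged with at least one of the excluded subsets."""
    excluded = set(excluded)
    return sorted(
        term
        for term, subsets in term_subsets.items()
        if excluded.isdisjoint(subsets)
    )


print(exclude_subsets(TERM_SUBSETS))
```

In this toy case both Huntington disease and juvenile Huntington disease survive the filter, so the result still contains classes that subsume one another, as noted above. Passing `excluded=("disease_grouping", "etiological_subtype")` would filter on both subsets.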

cmungall commented 4 years ago

btw I want to caution that HD is not the best example to illustrate your general point as it's a bit odd; we have a single-isa, which is a bad smell

kshefchek commented 4 years ago

Is disease_grouping already a subset? This is essentially what I am asking for

cmungall commented 4 years ago

Yes, but it will not be complete.

On Fri, Nov 15, 2019 at 11:18 AM Kent Shefchek notifications@github.com wrote:

Is disease_grouping already a subset? This is essentially what I am asking for

kshefchek commented 4 years ago

Perhaps for now a strategy where we annotate what are unambiguously disease groupings, and this can be a negative filter. You will still have classes that subsume one another. Is that a problem?

I think this would be perfect

cmungall commented 4 years ago

OK, plan

On Fri, Nov 15, 2019 at 11:29 AM Kent Shefchek notifications@github.com wrote:

Perhaps for now a strategy where we annotate what are unambiguously disease groupings, and this can be a negative filter. You will still have classes that subsume one another. Is that a problem?

I think this would be perfect

maglott commented 4 years ago

I don't think all OMIMPS qualify as disease_grouping if disease_grouping is restricted to terms that are not truly diseases. In fact, most OMIMPS correspond to non-gene-specific terms for a disease.

kshefchek commented 4 years ago

Yes, I was thinking anything with a direct {sequence_feature,phenotypic_feature} annotation is likely not a high-level grouping, which would rule out phenotypic series classes.
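
Roughly, that heuristic could be sketched as below in Python. The annotation table is toy data, the category names are taken from the comment above, and `likely_not_grouping` is a hypothetical helper; how direct annotations would actually be fetched is not shown.

```python
# Categories whose direct annotations suggest a term is not a high-level grouping.
ENTITY_EVIDENCE = {"sequence_feature", "phenotypic_feature"}

# Toy table: term label -> categories of annotations attached directly to it.
DIRECT_ANNOTATIONS = {
    "Huntington disease": ["sequence_feature", "phenotypic_feature"],
    "neurodegenerative disease": [],
}


def likely_not_grouping(direct_annotations):
    """Return terms with at least one direct gene or phenotype annotation."""
    return {
        term
        for term, categories in direct_annotations.items()
        if ENTITY_EVIDENCE & set(categories)
    }


print(likely_not_grouping(DIRECT_ANNOTATIONS))
```

Only direct annotations are considered; annotations inherited from subclasses would have to be excluded upstream for the rule to mean anything.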

pnrobinson commented 4 years ago

There is a spectrum of severity and age of onset in HD, and the category "juvenile" is an arbitrary distinction, but one that is used. There is literature with adult-onset and late-onset HD. Probably it has some utility to subdivide in this way, as it allows genotype-phenotype analysis, but in reality it is a spectrum. We should probably purchase recent editions of some of the standard textbooks in the field, and try to consult them on questions like this -- Wikipedia and Google are not reliable. Assuming we get the Phenomics grant, and if the funding is not massively reduced, I could order a selection of books for such occasions.

pnrobinson commented 4 years ago

In answer to @kshefchek's question, juvenile HD is a subset of HD, and so it is rarer, but even generic HD is a rare disease.

nicolevasilevsky commented 2 years ago

@kshefchek is this still needed?

kshefchek commented 2 years ago

We do a lot of analyses that require us to select diseases vs disease groups (I'm working on one today), so I think this is still a nice-to-have. Is the issue with something like this accuracy? I wonder if we could distribute a separate file with the disclaimer that it may contain false positives/negatives?

nicolevasilevsky commented 2 years ago

I added it to our Monarch tech call agenda on Friday. I'll see what I can find out from Nico and Chris.

sabrinatoro commented 2 years ago

On the Mondo-QC call: To find the grouping terms, we could

This is not a priority for 2021.