monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

exclude non-diseases from OMIM slurp #110

Closed: nicolevasilevsky closed this issue 1 year ago

nicolevasilevsky commented 1 year ago

About

https://github.com/monarch-initiative/mondo-ingest/pull/108

⬆️ The OMIM slurp now includes a lot of non-diseases, which it didn't use to. @sabrinatoro mentioned that certain types of terms (like those in brackets) are not diseases. The non-diseases need to be either manually reviewed and excluded, or bulk excluded if there is a way to do that.

We discussed this on the tech call on 11/18/22

Details

This can be done simply by adding terms to the exclusion tables.

Is there any way to tag them or group them in OMIM? Or should I just add them extensionally to the exclusions?

How do we tell which terms are non-diseases? Should we look at this article, https://www.omim.org/help/faq#1_3, and keep only the terms whose marking indicates 'disease' or 'phenotype'?

I think I should make a PR that adds these terms to the exclusion table and have someone else review it. Or is this more appropriate for Sabrina, or has she already done it?
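A minimal sketch of how that FAQ-based triage could look, assuming each OMIM entry is available as a prefix symbol plus a label. The prefix meanings follow the OMIM FAQ (`*` gene, `+` gene and phenotype, `#` phenotype with known molecular basis, `%` phenotype with unknown molecular basis, `^` entry removed or moved, no symbol for other entries), and brackets mark "nondiseases". The function name, the `keep`/`exclude`/`review` decisions, and the pairing of decisions with the exclusion codes mentioned in this thread are assumptions for illustration, not the mondo-ingest implementation.

```python
# Hypothetical triage of OMIM entries by prefix symbol and label.
# Prefix meanings follow the OMIM FAQ; decisions and names are assumptions.

KEEP_PREFIXES = {"#", "%"}   # phenotype entries (molecular basis known / unknown)
GENE_PREFIX = "*"            # gene entries


def classify_omim_entry(prefix, label):
    """Return (decision, exclusion_code) for one OMIM entry.

    decision is "keep", "exclude", or "review"; exclusion_code is only
    set when the decision is "exclude".
    """
    prefix = prefix.strip()
    label = label.strip()

    if prefix == GENE_PREFIX:
        return ("exclude", "MONDO:excludedgene")

    # OMIM uses brackets for "nondiseases" (e.g. lab-value variants, traits).
    if label.startswith("[") and label.endswith("]"):
        return ("exclude", "MONDO:excludeNonDisease")

    if prefix in KEEP_PREFIXES:
        return ("keep", None)

    # "+" (gene and phenotype), "^" (removed or moved), no symbol, or anything
    # unexpected: leave for a curator instead of guessing.
    return ("review", None)


if __name__ == "__main__":
    # Made-up labels, only to show the call shape.
    for prefix, label in [("*", "EXAMPLE GENE"), ("#", "[Example trait]"), ("%", "Example syndrome"), ("", "Example entry")]:
        print(prefix or "(none)", label, classify_omim_entry(prefix, label))
```

Anything that lands in `review` would still need manual curation, so this only narrows the list; it does not replace the exclusion table.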

matentzn commented 1 year ago

@sabrinatoro can you roughly group the existing non-disease terms into categories (like anatomy, etc.) and figure out whether there is a way to distinguish them from phenotypes in OMIM? The goal is for @joeflack4 to use this information to add superclass axioms like OMIM:blood subClassOf UBERON:AnatomicalEntity. Then we can exclude them in the exclusion system.
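To make the superclass idea concrete, here is a rough sketch that assumes the manual grouping yields (OMIM CURIE, category) pairs. It emits one OWL functional-syntax `SubClassOf` line per term whose category has a known parent and collects the rest separately. The category names, the `CATEGORY_PARENTS` table, and the example IDs are hypothetical; `UBERON:0001062` is UBERON's "anatomical entity" class.

```python
# Hypothetical mapping from a curator-assigned category to a parent class.
CATEGORY_PARENTS = {
    "anatomy": "UBERON:0001062",  # UBERON 'anatomical entity'
}


def subclass_axioms(reviewed_terms):
    """Turn (omim_curie, category) pairs into SubClassOf axiom strings.

    Returns (axioms, unmapped): axioms for terms whose category has a parent
    class, and the CURIEs of terms that have none and would need to be
    listed in the exclusion table directly.
    """
    axioms = []
    unmapped = []
    for curie, category in reviewed_terms:
        parent = CATEGORY_PARENTS.get(category)
        if parent is None:
            unmapped.append(curie)
        else:
            axioms.append(f"SubClassOf({curie} {parent})")
    return axioms, unmapped


if __name__ == "__main__":
    # Placeholder IDs, not real OMIM entries.
    reviewed = [("OMIM:000001", "anatomy"), ("OMIM:000002", "trait")]
    axioms, unmapped = subclass_axioms(reviewed)
    print(axioms)
    print(unmapped)
```

Terms in a category with no obvious parent class end up in `unmapped`, which is where extensional exclusion would still be needed.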

sabrinatoro commented 1 year ago

I have reviewed the OMIM list and will update the exclusion list; there are a few terms for which I would like Nicole's input. A few things to keep in mind:

1) We do have the code MONDO:excludeNonDisease, which we can use as a blanket exclusion code if we don't have the time to review.

2) @matentzn The terms are grouped for non-disease (based on exclusion code). However, I don't think it is that simple, and I don't know that this will help us much. For example, the blood types are excluded because they are traits. I don't think you will find them in UBERON, and I also don't know how you would tell which OMIM terms are anatomy (they don't carry any information for this). Also, MONDO:excludedgene includes more than "just" genes: it includes pseudogenes, proteins, and loci. I don't know that all of these can be caught based on what is in the OMIM files (which I am reviewing right now, so we can make more sense of them).

matentzn commented 1 year ago

This is now the highest priority for you, @joeflack4, because we changed the curation strategy.