monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

OMIM source of truth #1901

Closed: matentzn closed this issue 2 years ago

matentzn commented 4 years ago

We now have various pipelines that produce an omim.owl, and I was wondering how to consolidate them.

  1. dipper, which pulls from https://data.omim.org/downloads/
  2. disease2owl, which pulls from the alpha release (according to @TomConlin, the unvetted bleeding-edge stage of the dipper data pipeline): https://archive.monarchinitiative.org/alpha/rdf/omim.ttl
  3. mondo, which pulls from http://data.monarchinitiative.org/ttl/omim.ttl, which I think is the latest official data release location

Now all of this is a bit unsatisfactory. I believe the best way to do this is:

  1. Move everything to the Mondo repo (all 3 steps). That means mondo is responsible for generating the omim ontology, and it is removed from dipper permanently.
  2. The exact same omim.owl used in the slurp->merge workflow that adds new terms to Mondo is also used to generate the OMIM import in Mondo.
  3. Monarch gets OMIM only through the Mondo import (this is only safe if we can assume that there are no OMIM terms in the Monarch KG that are not in Mondo, give or take some lag in refreshing Mondo).

@cmungall @kshefchek @TomConlin please let me know if this makes sense.

TomConlin commented 4 years ago

OMIM IDs and their classification are relevant to a handful of dipper ingests other than the OMIM ingest itself.

For MONDO to cover these cases we will need to replicate the function of:

https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/OMIMSource.py#L10

which was my attempt to consolidate even more of that unsatisfying smear you are noting.
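Part of what has to be replicated is classifying each OMIM ID by entry type. As a minimal sketch, assuming a tab-separated input shaped like OMIM's mimTitles.txt (a word-valued "Prefix" column such as "Asterisk" or "Caret", then the MIM number), the classification could look like the following. `OmimType`, `PREFIX_TO_TYPE`, `DISEASE_TYPES` and `classify_titles` are hypothetical names, not dipper's actual API, and which types count as disease-like is a judgment call.

```python
from enum import Enum
from typing import Dict, Iterable


class OmimType(Enum):
    GENE = "gene"
    GENE_AND_PHENOTYPE = "gene and phenotype, combined"
    PHENOTYPE_KNOWN = "phenotype, molecular basis known"
    PHENOTYPE_UNKNOWN = "phenotype or locus, molecular basis unknown"
    OTHER = "other, mainly suspected Mendelian phenotypes"
    REMOVED_OR_MOVED = "removed or moved"


# Assumed mapping from the word-valued "Prefix" column of a
# mimTitles.txt-style file to an entry type (hypothetical helper).
PREFIX_TO_TYPE = {
    "Asterisk": OmimType.GENE,
    "Plus": OmimType.GENE_AND_PHENOTYPE,
    "Number Sign": OmimType.PHENOTYPE_KNOWN,
    "Percent": OmimType.PHENOTYPE_UNKNOWN,
    "NULL": OmimType.OTHER,
    "Caret": OmimType.REMOVED_OR_MOVED,
}

# Hypothetical choice of which types a disease ontology would import.
DISEASE_TYPES = frozenset({
    OmimType.PHENOTYPE_KNOWN,
    OmimType.PHENOTYPE_UNKNOWN,
    OmimType.OTHER,
})


def classify_titles(lines: Iterable[str]) -> Dict[str, OmimType]:
    """Map each MIM number to its OmimType from tab-separated title lines."""
    types: Dict[str, OmimType] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#'-prefixed comment/header lines
        prefix, mim_number = line.rstrip("\n").split("\t")[:2]
        if prefix not in PREFIX_TO_TYPE:
            # Fail loudly so a change in the upstream file format is noticed.
            raise ValueError(f"unexpected prefix {prefix!r} for MIM {mim_number}")
        types[mim_number] = PREFIX_TO_TYPE[prefix]
    return types
```

Raising on an unknown prefix, rather than silently defaulting, is deliberate: a shared classifier used by several ingests is exactly where a silent upstream format change would spread furthest.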

kshefchek commented 4 years ago

I think we could have a saner separation: Mondo handles subClassOf, equivalence, synonyms and labels for diseases; dipper handles gene-to-disease and any other relevant ABox content (OMIM variant to ClinVar variant, publications).
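The proposed split is essentially a partition by predicate: class-level (TBox) axioms go to Mondo, and everything else stays with dipper. A minimal sketch, assuming triples are plain `(subject, predicate, object)` CURIE tuples; `MONDO_PREDICATES`, `partition_triples` and the `example:geneToDisease` predicate are hypothetical illustrations, not part of either codebase.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical set of class-level (TBox) predicates that Mondo would own.
MONDO_PREDICATES = frozenset({
    "rdfs:subClassOf",
    "owl:equivalentClass",
    "rdfs:label",
    "oboInOwl:hasExactSynonym",
    "oboInOwl:hasRelatedSynonym",
})


def partition_triples(triples: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (mondo, dipper) lists by predicate."""
    mondo: List[Triple] = []
    dipper: List[Triple] = []
    for triple in triples:
        _, predicate, _ = triple
        # Anything that is not a disease-class axiom (e.g. gene-to-disease
        # or variant-level associations) stays with dipper.
        (mondo if predicate in MONDO_PREDICATES else dipper).append(triple)
    return mondo, dipper
```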

matentzn commented 4 years ago

Great, I agree with you both. To move this forward (also as a blueprint for monochrom, for which we should do the same): is the correct approach to write a stand-alone Python ingest script, based on the dipper script, that I move to the mondo/monochrom repo and that simply builds an OWL ontology from source? Should I continue to use dipper, or would it make sense to build something entirely stand-alone based on the dipper scripts?

matentzn commented 3 years ago

https://ci.monarchinitiative.org/job/build-omim/ <- old dipper build

Running those:
https://dipper.readthedocs.io/en/latest/dipper.sources.OMIM.html
https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/OMIM.py
https://github.com/monarch-initiative/dipper/blob/master/dipper/sources/OMIMSource.py

matentzn commented 3 years ago
mkdir -p mirror tmp
sh run.sh make build-omim

The Mondo process pulls the TTL from ci.monarchinit.org.

matentzn commented 3 years ago

https://data.monarchinitiative.org/experimental/omim.ttl

jiaola commented 3 years ago

Based on this: https://docs.google.com/document/d/1pKyAZsT1ZlZxxgkNBRucvQPwsxzDF0nhGC2VaiPLj_U/edit#

OMIM Import:
  1. OMIM class IDs
  2. OMIM Phenotypic Series
  3. SubclassOf where it exists
  4. SubclassOf Mendelian disease when another subclassOf does not exist
  5. Synonyms, labels + tidying
  6. Xrefs
  7. Obsoletion: we need rules for this. Where we see the string MOVED TO, the class should be imported as "obsolete [term name]" with a replaced by annotation.
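The parent-selection and obsoletion rules above can be sketched as one small function. This assumes that "SubclassOf where it exists" means the entry's phenotypic series, that series IDs are passed as their numeric part and written as `OMIMPS:` CURIEs, and that the MOVED TO text is available separately from the label. `ImportedTerm`, `build_term` and `MENDELIAN_DISEASE` are hypothetical, and the placeholder stands in for the real Mondo class.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder parent used when no other parent exists; substitute the
# real Mondo "Mendelian disease" class here.
MENDELIAN_DISEASE = "MONDO:MENDELIAN_DISEASE_PLACEHOLDER"


@dataclass
class ImportedTerm:
    curie: str
    label: str
    parent: Optional[str] = None
    obsolete: bool = False
    replaced_by: Optional[str] = None


def build_term(mim_number: str, label: str, status_text: str = "",
               phenotypic_series: Optional[str] = None) -> ImportedTerm:
    """Apply the parent-selection and obsoletion rules to one OMIM entry."""
    curie = f"OMIM:{mim_number}"
    if "MOVED TO" in status_text:
        # Obsolete: prefix the label, drop the parent, point to the new entry.
        targets = re.findall(r"\d+", status_text.split("MOVED TO", 1)[1])
        # Only a single target gets a replaced-by; splits need manual review.
        replaced_by = f"OMIM:{targets[0]}" if len(targets) == 1 else None
        return ImportedTerm(curie, f"obsolete {label}", obsolete=True,
                            replaced_by=replaced_by)
    # Prefer the phenotypic series as parent, else fall back to Mendelian disease.
    parent = f"OMIMPS:{phenotypic_series}" if phenotypic_series else MENDELIAN_DISEASE
    return ImportedTerm(curie, label, parent=parent)
```

Leaving `replaced_by` empty when an entry was moved to several targets is a conservative choice: a single replacement would be wrong for a split entry, so those cases are left for a curator.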

joeflack4 commented 2 years ago

@matentzn Not sure if/where to start here. Do you think this task is already largely completed? I know there are further issues related to the OMIM ingest, but I'm wondering if there's anything left in this issue that still needs to be completed. If there's anything left that isn't covered in another issue, could you enumerate what's left to be done here? Otherwise, maybe this issue is ready to be closed in favor of other OMIM related issues.

matentzn commented 2 years ago

This is completed thanks to you :)