monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

UMLS ingest #114

Open matentzn opened 1 year ago

matentzn commented 1 year ago

Noting this: https://documentation.uts.nlm.nih.gov/automating-downloads.html

It describes an automated way to obtain UMLS mappings through the API.

matentzn commented 1 year ago

Another alternative: https://github.com/pyobo/pyobo/blob/main/src/pyobo/sources/umls/umls.py

@hrshdhgd can you try this between breaths at some point? This is not super urgent, but I keep putting it off and we will need it quite suddenly in late September, when I would like to avoid big surprises.

For now, we need one thing more urgently than the other:

  1. The UMLS mappings in SSSOM format. You can follow @joeflack4's approach and create a new repo like https://github.com/monarch-initiative/gard to hold the required code for this.

This should not be a major endeavour, but it would be great if we could make slow, steady progress towards it.
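A minimal sketch of what "UMLS mappings in SSSOM format" could look like in practice, assuming the input is the pipe-delimited `MRCONSO.RRF` file from a UMLS release. The function names, the `UMLS:` subject prefix, the choice of `skos:exactMatch` as predicate, and `semapv:UnspecifiedMatching` as justification are all assumptions made for illustration, not decisions taken in this thread. A real SSSOM file would normally also carry a metadata block (for example a `curie_map`), which is omitted here.

```python
import csv

# Column order of MRCONSO.RRF (pipe-delimited; each row ends with a trailing "|").
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

SSSOM_COLUMNS = [
    "subject_id", "predicate_id", "object_id", "object_label",
    "mapping_justification",
]


def mrconso_to_sssom_rows(lines, source_prefixes):
    """Yield one SSSOM-style row per distinct (CUI, source code) pair.

    `source_prefixes` maps a UMLS source abbreviation (SAB) to the CURIE
    prefix to use for its codes, e.g. {"HPO": "HP"} (hypothetical mapping).
    Rows from sources not listed in the mapping are skipped.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < len(MRCONSO_COLUMNS):
            continue  # skip malformed or truncated rows
        row = dict(zip(MRCONSO_COLUMNS, fields))
        prefix = source_prefixes.get(row["SAB"])
        if prefix is None:
            continue
        code = row["CODE"]
        # Assumption: codes that already contain ":" are treated as CURIEs.
        object_id = code if ":" in code else f"{prefix}:{code}"
        key = (row["CUI"], object_id)
        if key in seen:
            continue  # several atoms (synonyms) can share one CUI/code pair
        seen.add(key)
        yield {
            "subject_id": f"UMLS:{row['CUI']}",
            "predicate_id": "skos:exactMatch",
            "object_id": object_id,
            "object_label": row["STR"],
            "mapping_justification": "semapv:UnspecifiedMatching",
        }


def write_sssom_tsv(rows, handle):
    """Write rows as a tab-separated table with an SSSOM column header."""
    writer = csv.DictWriter(
        handle, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

Calling `mrconso_to_sssom_rows(open_file, {"HPO": "HP"})` on an open `MRCONSO.RRF` handle and passing the result to `write_sssom_tsv` would give a first-pass mapping table for HPO; other sources could be added to the prefix mapping as needed.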

cmungall commented 1 year ago

I have Perl to turn the TSVs from the MedGen/NCBI disease subset into serviceable OBO format. It's not pretty, but AFAICR it works. It's in the old ingest repo:

https://github.com/cmungall/diseases2owl/tree/master/sources/medgen

This is essentially the disease subset of UMLS plus medgen pseudoCUIs

joeflack4 commented 1 year ago

Yes, I agree with @cmungall. I've been using his work here: https://github.com/monarch-initiative/medgen

Some of it has been tweaked but largely it works great as is.

That ingest provides UMLS terms in addition to MedGen terms. I'm not 100% certain, but I think it includes all of them.

@matentzn If you like, we can rename it the umls-medgen ingest.

matentzn commented 1 year ago

I did not realise this at all. In any case, we need a separate UMLS ingest, because I also need to extract mappings to and from HPO and other ontologies that may not fall under "disease". However, I can reduce the scope to extracting only the mappings if, @cmungall, you agree that we should use the medgen ingest to align with UMLS rather than construct a separate UMLS ingest.

@hrshdhgd this is only partially related to your efforts, because what we need right now is some of the UMLS mappings in SSSOM format for our mapping efforts.

hrshdhgd commented 1 year ago

https://github.com/monarch-initiative/umls-ingest ... work in progress.

joeflack4 commented 1 year ago

@hrshdhgd Since there's some overlap in what we're doing, I'm guessing we should browse through each other's code; though for the medgen ingest, most of it is Chris's.