obophenotype / human-phenotype-ontology

Ontology for the description of human clinical features
http://obophenotype.github.io/human-phenotype-ontology/

Obsolete MeSH supplementary concepts as xrefs #9750

Closed bgyori closed 4 months ago

bgyori commented 1 year ago

I found that a number of HP terms refer to obsolete MeSH supplementary concepts as xrefs. Here is the complete list:

| HPO ID | HPO name | MeSH ID |
| --- | --- | --- |
| HP:0000835 | Adrenal hypoplasia | C538429 |
| HP:0001647 | Bicuspid aortic valve | C562388 |
| HP:0001838 | Rocker bottom foot | C536345 |
| HP:0002895 | Papillary thyroid carcinoma | C536915 |
| HP:0004813 | Post-transfusion thrombocytopenia | C562868 |
| HP:0006897 | Abducens palsy | C564661 |
| HP:0010445 | Primum atrial septal defect | C548006 |
| HP:0011540 | Congenitally corrected transposition of the great arteries | C535426 |
| HP:0011675 | Arrhythmia | C562490 |
| HP:0011743 | Adrenal gland agenesis | C538429 |
| HP:0012108 | Open angle glaucoma | C562750 |
| HP:0030078 | Lung adenocarcinoma | C538231 |
| HP:0040198 | Non-medullary thyroid carcinoma | C536915 |
| HP:0100001 | Malignant mesothelioma | C562839 |
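
For anyone who wants to reproduce a list like this, here is a minimal sketch of one way to do it. It assumes the MeSH xrefs appear in OBO-format text as `xref: MSH:<id>` lines and that a set of obsolete MeSH IDs has already been obtained separately. The function name `mesh_xrefs`, the sample stanzas, and the third (made-up) term are purely illustrative.

```python
import re

# Tiny OBO-style sample. The first two stanzas reuse rows from the list above;
# the third term and its MeSH ID are made up for illustration.
SAMPLE_OBO = """\
[Term]
id: HP:0000835
name: Adrenal hypoplasia
xref: MSH:C538429

[Term]
id: HP:0001647
name: Bicuspid aortic valve
xref: MSH:C562388

[Term]
id: HP:9999999
name: Hypothetical phenotype
xref: MSH:D999999
"""

# Assumption: this set comes from elsewhere (e.g. a MeSH release); here it is hard-coded.
OBSOLETE_MESH = {"C538429", "C562388"}


def mesh_xrefs(obo_text):
    """Yield (hpo_id, hpo_name, mesh_id) for every MSH xref in OBO-format text."""
    for stanza in obo_text.split("[Term]")[1:]:
        id_match = re.search(r"^id: (\S+)", stanza, flags=re.M)
        if id_match is None:
            continue
        name_match = re.search(r"^name: (.+)$", stanza, flags=re.M)
        name = name_match.group(1) if name_match else ""
        for mesh_id in re.findall(r"^xref: MSH:(\S+)", stanza, flags=re.M):
            yield id_match.group(1), name, mesh_id


# Keep only the xrefs that point at an obsolete MeSH ID.
flagged = [row for row in mesh_xrefs(SAMPLE_OBO) if row[2] in OBSOLETE_MESH]
for hpo_id, name, mesh_id in flagged:
    print(hpo_id, name, mesh_id, sep="\t")
```

A regex-based split is enough for a quick check like this; a proper OBO parser would be a safer choice for anything beyond that.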

Is there a preferred way to deal with these?

When searching for these terms in MeSH, there are usually close matches but it's not always trivially an exact match. As an example for "Papillary thyroid carcinoma", there is a MeSH term "Thyroid Cancer, Papillary" (https://meshb.nlm.nih.gov/record/ui?ui=D000077273), which sounds like a broader term but the MeSH definition "An ADENOCARCINOMA that originates from follicular cells of the THYROID GLAND..." would suggest the two are actually equivalent.

pnrobinson commented 1 year ago

We recommend using the official UMLS mapping (https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/HPO/index.html). @drseb @mellybelly @matentzn We should probably remove the xrefs to UMLS terminologies from the HPO file entirely and refer to the UMLS instead. First, it would be good to write a detailed tutorial on how to get the UMLS files, which is fairly complicated and will scare off many users.

bgyori commented 1 year ago

I can see how referring to UMLS reduces redundancy and possible inconsistencies, but I think that from a user perspective it creates complications, since UMLS is significantly more difficult to work with. So for what it's worth, I would much prefer having some xrefs (I am particularly interested in MeSH) directly available in HPO.

cthoyt commented 1 year ago

I agree that working with UMLS is not so convenient, and from a user's perspective it's nice to maintain the mappings inside HPO.

But it also seems like this issue is about MeSH, not UMLS.

cthoyt commented 1 year ago

Update: we did a bit of an analysis on UMLS, MeSH, and HPO to see what value each adds. It turns out that both HPO and UMLS contribute valuable non-redundant mappings, so it would be problematic to remove all of them wholesale.

https://github.com/biopragmatics/semra/blob/main/notebooks/umls-inference-analysis.ipynb
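
To make "non-redundant" concrete, here is a minimal sketch of the kind of set comparison involved. It is not the analysis from the notebook; the function name `compare_mappings` and all IDs below are made up for illustration.

```python
def compare_mappings(hpo_xrefs, umls_mappings):
    """Split two collections of (hpo_id, mesh_id) pairs into shared and source-specific sets."""
    hpo_xrefs, umls_mappings = set(hpo_xrefs), set(umls_mappings)
    return {
        "shared": hpo_xrefs & umls_mappings,
        "hpo_only": hpo_xrefs - umls_mappings,
        "umls_only": umls_mappings - hpo_xrefs,
    }


# Made-up pairs: one mapping known to both sources, one unique to each.
hpo_pairs = [("HP:9999901", "D999001"), ("HP:9999902", "D999002")]
umls_pairs = [("HP:9999901", "D999001"), ("HP:9999903", "D999003")]

result = compare_mappings(hpo_pairs, umls_pairs)
for label, pairs in result.items():
    print(label, sorted(pairs))
```

Anything that lands in `hpo_only` would be lost if the xrefs were dropped from HPO without being carried over somewhere else.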

pnrobinson commented 1 year ago

I think the problem is that we do not have the resources to support these mappings, so the xrefs in the hp.owl file are all about ten years old and are by no means comprehensive. The UMLS team does this regularly, so they have by far the highest-quality mappings. We are also working on a new SNOMED mapping that will live outside the hp file. We should delete the UMLS and SNOMED refs so that there is one source of truth. The difficult part is actually extracting the UMLS data, and it would be great for us to write a tutorial (I do not know how to do it myself haha).

matentzn commented 1 year ago

I know what to do here, but I will wait for the SNOMED mappings to trickle in, and then I will deal with everything at once.

I guess this is the key thing with regard to the UMLS overlap:

image

There seem to be very few UMLS mappings. I am surprised there is so much difference between UMLS and HPO wrt MeSH.

pnrobinson commented 1 year ago

@matentzn whatever the overlap, we are simply not supporting this anymore, so we should remove the xrefs, refer people to UMLS, and make a tutorial.

I think it is better to put xrefs into separate files with SSSOM; the OWL edit file is not a good place to keep this information.
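
As a rough idea of what such a separate file could look like, here is a minimal sketch that writes mappings as a tab-separated table using SSSOM (Simple Standard for Sharing Ontological Mappings) column names. The example row pairs HP:0002895 with the MeSH descriptor suggested earlier in this thread and is illustrative only, not a curated mapping. The `MESH:` prefix, the chosen predicate and justification values, and the helper name `to_sssom_tsv` are assumptions. A complete SSSOM file would normally also carry mapping-set metadata such as a CURIE map, which this sketch omits.

```python
import csv
import io

# Core SSSOM mapping columns used in this sketch.
SSSOM_COLUMNS = [
    "subject_id",
    "subject_label",
    "predicate_id",
    "object_id",
    "mapping_justification",
]


def to_sssom_tsv(rows):
    """Serialize mapping dicts as tab-separated text with SSSOM column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row only; predicate and justification are assumed values.
rows = [
    {
        "subject_id": "HP:0002895",
        "subject_label": "Papillary thyroid carcinoma",
        "predicate_id": "skos:exactMatch",
        "object_id": "MESH:D000077273",
        "mapping_justification": "semapv:UnspecifiedMatching",
    }
]

tsv_text = to_sssom_tsv(rows)
print(tsv_text)
```

Keeping the mappings in a flat table like this means they can be refreshed or diffed without touching the ontology file itself.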

matentzn commented 1 year ago

Yes, I agree @pnrobinson - I will make a coherent proposal that makes everyone happy when it comes!

pnrobinson commented 9 months ago

@matentzn Would it be possible to remove all of the HPO MeSH IDs, move them to an external file, and then compare them to UMLS and coordinate with @kanems? @kanems, would that be useful from a UMLS perspective? A long time ago we did a manual mapping to MeSH. It is really hard for most of our users to use the UMLS mapping tools, so maybe we can find a way of regularly extracting mappings, or is that a licence issue?

kanems commented 9 months ago

If users of HPO want to know what MeSH IDs are equivalent to an HPO phenotype, that is within the scope of MedGen's subset of UMLS processing. Both are at the 'level 0' set of licensing rules, so I don't see any reason why HPO couldn't use our processed UMLS subset to refresh the HPO-MeSH mappings.

The MedGen data processing pulls in the data from UMLS following their 2x/year releases and uses that as our truth table for HPO-CUI relationships. We then generate reports in our FTP space that map CUIs in UMLS to other vocabulary IDs; this includes MeSH, HPO, OMIM, Mondo, OrphaNet and GARD IDs. This is the MedGenIDMappings.txt.gz file on FTP.

Not all HPO IDs will get a MeSH mapping, though.

1- If the HPO term is created new in between UMLS releases, we assign a temporary MedGen CUI (CN########) and then look for the CUI replacement in the next release.

2- Sometimes >1 HPO ID is mapped to a CUI (hence my persistence about those potentially redundant HPO records, I wanted to only push for UMLS to review their mappings for pairs where HPO was certain the terms were unique concepts), but MedGen will respect HPO's structure and keep those on different records in MedGen (so one gets a CUI and possibly MeSH equivalent, the other gets a CN CUI until UMLS changes something). This would mean that while UMLS may say 1 MeSH ID is equivalent to 2 HPO IDs, we would only report 1 as being equivalent.

3- Not all CUIs get a MeSH ID, so some HPO IDs are not going to match a MeSH concept.
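
The HPO-CUI-MeSH lookup described here can be sketched roughly as follows. This is only a sketch under stated assumptions: it treats the report as pipe-delimited with the columns CUI, preferred name, source ID and source, and it assumes the source labels are `HPO` and `MeSH`. The column layout, the labels, the function name `hpo_to_mesh`, and every ID in the sample are assumptions or made up, so check them against the actual file before relying on this.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the ASSUMED layout: CUI|pref_name|source_id|source|
SAMPLE = """\
#CUI|pref_name|source_id|source|
C9999991|Example phenotype A|HP:9999901|HPO|
C9999991|Example phenotype A|D999001|MeSH|
C9999992|Example phenotype B|HP:9999902|HPO|
CN999999|Example phenotype C|HP:9999903|HPO|
"""


def hpo_to_mesh(lines):
    """Map each HPO ID to the MeSH IDs that share its CUI (HPO IDs without one are omitted)."""
    by_cui = defaultdict(lambda: {"HPO": set(), "MeSH": set()})
    for row in csv.reader(lines, delimiter="|"):
        # Skip blank lines, the header comment, and malformed rows.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        cui, _pref_name, source_id, source = row[:4]
        if source in ("HPO", "MeSH"):
            by_cui[cui][source].add(source_id)
    return {
        hpo_id: sorted(ids["MeSH"])
        for ids in by_cui.values()
        for hpo_id in ids["HPO"]
        if ids["MeSH"]
    }


mapping = hpo_to_mesh(io.StringIO(SAMPLE))
print(mapping)
```

In the sample, the second HPO ID shares its CUI with no MeSH ID (case 3) and the third sits on a temporary CN record (case 1), so neither appears in the result. For the real gzipped report, a file handle from `gzip.open(path, "rt")` could be passed in place of the `StringIO` object.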

The Mondo team is already in the process of using the IDMappings file to update Mondo-CUI mappings based on MedGen's processing and curation; perhaps that could be reconfigured/reworked to pull the HPO-CUI-MeSH mappings?

There is a whole different level of mapping, though, if the concern is what HPO terms describe disease entities in MeSH. We bring in and report the HPO-OMIM disease mappings, and thus if a MeSH ID is equivalent to a MIM number, that could be extracted from comparing a couple of MedGen reports but... unless that's the specific request here, I don't want to get into that much more complicated approach.

matentzn commented 9 months ago

@kanems That is fantastic - I will just extend our pipeline then to support this! Thank you so much!

> There is a whole different level of mapping, though, if the concern is what HPO terms describe disease entities in MeSH.

Yes, for sure. This is not what we are discussing here, and for this we would be using our HPOA files HPO->OMIM/ORDO->MONDO etc. This is not what this issue is about.

Alright I will deal with this then! Thanks so much!

matentzn commented 8 months ago

To track progress:

Dependency:

pnrobinson commented 6 months ago

@matentzn @bgyori -- It looks as if the above issue has been closed. Can we also close this issue, and if not, what is still left to do?

matentzn commented 6 months ago

How do you want those mappings to be redistributed? Shall we just add a file to the HPO repo? Update the MeSH xrefs in HPO? Both?

pnrobinson commented 6 months ago

Ideally we would figure out how to create mappings from the UMLS resource and publish instructions on the website. I do not think there is a need for us to create an extra downloadable file.

matentzn commented 6 months ago

That was the point of this ticket - we already have that file now! We get it through @kanems! The only question now is how we inform people about it.

There is no easy way to "tell people to get the information from UMLS" - it is always a bit painful.

bgyori commented 6 months ago

@matentzn where is that file?

matentzn commented 6 months ago

Here: https://github.com/monarch-initiative/medgen/releases/tag/2024-05-05

pnrobinson commented 4 months ago

@matentzn the mapping is great. Can we add documentation to the HPO website about it? Then I guess we should also remove the legacy mappings from the hp-edit.owl file?

matentzn commented 4 months ago

Documentation: https://obophenotype.github.io/human-phenotype-ontology/developers/mappings/

Removed MSH xrefs: https://github.com/obophenotype/human-phenotype-ontology/pull/10605

@pnrobinson feel free to merge the above and close.

pnrobinson commented 4 months ago

Merged, thanks!