monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

DO mappings: unable to import (`OMIM`(`PS`) --> `MIM`) #560

Closed twhetzel closed 2 weeks ago

twhetzel commented 3 weeks ago

Overview

We now have fewer DO mappings (+2,177 -13,705) as of the last build (#556).

Probable cause of issue

Following Nico's suspicion and Trish's initial investigation, I looked at components/doid.owl for this build and compared it to an older one. Indeed, DO made these changes:

<!-- Previously: OMIM -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0050331">
    <oboInOwl:hasDbXref>MIM:149730</oboInOwl:hasDbXref>

<!-- Previously: OMIMPS -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0050328">
    <oboInOwl:hasDbXref>MIM:PS275200</oboInOwl:hasDbXref>

They also previously had some references directly to OMIM URIs, e.g. https://omim.org/entry/..., but those have been removed. I only see CURIE references now.
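One way to confirm the prefix switch across a whole build is to tally the prefixes of all `oboInOwl:hasDbXref` values and compare the counts between builds. The following is a minimal sketch using only the Python standard library; the function name `count_xref_prefixes` and the inline sample (the two classes shown above, with closing tags added so it parses) are illustrative assumptions, not part of mondo-ingest.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the oboInOwl annotation properties used in OBO ontologies.
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"

# Illustrative sample: the two DO classes from this issue, made well-formed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0050331">
    <oboInOwl:hasDbXref>MIM:149730</oboInOwl:hasDbXref>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0050328">
    <oboInOwl:hasDbXref>MIM:PS275200</oboInOwl:hasDbXref>
  </owl:Class>
</rdf:RDF>"""


def count_xref_prefixes(rdf_xml: str) -> Counter:
    """Count hasDbXref values by CURIE prefix (hypothetical helper)."""
    root = ET.fromstring(rdf_xml)
    counts = Counter()
    for xref in root.iter(f"{{{OBOINOWL}}}hasDbXref"):
        text = (xref.text or "").strip()
        # Values without a colon (e.g. bare strings) are grouped separately.
        prefix = text.split(":", 1)[0] if ":" in text else "(no prefix)"
        counts[prefix] += 1
    return counts


prefix_counts = count_xref_prefixes(SAMPLE)
print(prefix_counts)
```

Running the same tally on an older and a newer components/doid.owl would show whether `OMIM` and `OMIMPS` counts dropped while `MIM` counts rose.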

Possible solutions

  1. Replace our prefix_maps to account for MIM (https://github.com/monarch-initiative/omim/issues/111)
  2. Any other solution?
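As a rough illustration of option 1, xrefs could be normalized so that both the old and the new DO prefixes resolve to the same CURIE. This is a sketch under assumptions: the function name `normalize_omim_curie` is hypothetical, and the target form `OMIMPS:<number>` for phenotypic series is assumed from the "Previously: OMIMPS" note above rather than taken from the actual prefix maps.

```python
def normalize_omim_curie(curie: str) -> str:
    """Map DO's newer MIM-prefixed xrefs back to OMIM / OMIMPS CURIEs.

    Hypothetical helper; CURIEs with any other prefix are returned unchanged.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        # Not a CURIE at all; leave it alone.
        return curie
    if prefix == "MIM":
        if local_id.startswith("PS"):
            # Assumed mapping: MIM:PS275200 -> OMIMPS:275200
            return f"OMIMPS:{local_id[2:]}"
        return f"OMIM:{local_id}"
    return curie


for xref in ["MIM:149730", "MIM:PS275200", "OMIM:149730"]:
    print(xref, "->", normalize_omim_curie(xref))
```

Handling the old prefixes as a pass-through means sources that have not switched to `MIM` would still be accepted.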

Additional info

Context and more information:

before and after - doid.sssom.tsv.zip

DO had a new release on 5/29/2024. That release seems to be when they switched prefix maps (e.g. OMIM --> MIM).

joeflack4 commented 3 weeks ago

@matentzn @twhetzel Any other ideas for how to fix this? What is the priority for this issue?

twhetzel commented 3 weeks ago

Priority - I would think this is extremely high. The 2024-06-05 build works for Nico and for what is needed for subsets, but I don't think this build will work for Sabrina. As I understand it, she'll be looking for those in Week 3 of the development cycle, i.e. the week of June 17.

twhetzel commented 3 weeks ago

As for a solution, I don't see why lexical mappings between Mondo and DO would be dropped just because the prefix of the DO term's xrefs changed from OMIM(PS) to MIM. However, I'm not sure that all of the external content that mondo-ingest uses has made that change, so I'm not sure whether both the old (OMIM(PS)) and new (MIM) prefixes need to be accounted for.

twhetzel commented 3 weeks ago

Adding in a comment from Nico, although I don't see the connection to how this would cause the issue here: xrefs are used as candidate evidence for lexmatch. Weak evidence, but evidence nonetheless: if two classes share an xref they are considered candidate matches.
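To make Nico's point concrete, here is a toy sketch of xref-based candidate matching. It is not the actual lexmatch implementation; the function `candidate_matches` and the identifier `MONDO:EXAMPLE` are hypothetical. It only shows that when one side's xref prefix changes, the shared-xref evidence disappears.

```python
from collections import defaultdict


def candidate_matches(left: dict, right: dict) -> set:
    """Return (left_id, right_id) pairs of classes that share an xref.

    Both arguments map a class ID to a set of xref CURIEs.
    Toy illustration only, not how lexmatch is implemented.
    """
    # Index the right-hand classes by xref for direct lookup.
    by_xref = defaultdict(set)
    for cls, xrefs in right.items():
        for xref in xrefs:
            by_xref[xref].add(cls)

    pairs = set()
    for cls, xrefs in left.items():
        for xref in xrefs:
            for other in by_xref.get(xref, ()):
                pairs.add((cls, other))
    return pairs


# MONDO:EXAMPLE is a placeholder ID, not a real Mondo term.
mondo = {"MONDO:EXAMPLE": {"OMIM:149730"}}
do_before = {"DOID:0050331": {"OMIM:149730"}}  # older DO release
do_after = {"DOID:0050331": {"MIM:149730"}}    # release with the MIM prefix

matches_before = candidate_matches(mondo, do_before)
matches_after = candidate_matches(mondo, do_after)
print(len(matches_before), len(matches_after))
```

Under this simplified model, the string comparison `OMIM:149730` vs. `MIM:149730` fails, so the pair is no longer proposed as a candidate.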

joeflack4 commented 3 weeks ago

RE: @matentzn's comment above

although I don't see the connection to how this would cause the issue here

@matentzn Let me know if you suspect any other reason, or if I should do some sort of analysis / investigation into other causes.

Even though we don't see how it is connected, it seems a likely cause, given Trish's and my preliminary investigation (thread). Also, there are a net ~11,500 fewer rows in the file, and when I search for "OMIM" in the file, it appears 11,358 times.

FYI I also just attached copies of the file before and after the last build (before/after the big change) to the OP.

twhetzel commented 3 weeks ago

FYI - I received another DM from Nico over the weekend suggesting that the issue is with the doid preprocessing SPARQL queries. How are the Mondo->DO mappings with OMIM as the source created?

--> @twhetzel to review the steps in https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile#L104

joeflack4 commented 3 weeks ago

Upon further investigation of the components/doid.owl goal, we think that fix_omimps.ru and/or fix_make_omim_exact.ru may be the cause, or part of the cause, of this issue. Namely, the prefix maps for these files are now out of sync with DO's as of their latest release.

joeflack4 commented 2 weeks ago

@twhetzel IDK why this didn't automatically close even though it was linked. This has been happening sometimes.

This can be closed now, right?