monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/
6 stars 3 forks source link

Bug fix: `components/ordo.owl` missing mapping annotations #485

Open joeflack4 opened 7 months ago

joeflack4 commented 7 months ago

Overview

While working on https://github.com/monarch-initiative/icd11/pull/12, Nico suggested to write a SPARQL query that reads ICD11 xrefs and asserts SKOS exact matches. There actually already was such a file that is supposed to do this, ordo-replace-annotation-based-mappings.ru. However, there are no skos matches annotations in components/ordo.owl, so this is broken.

If when I start work on this again, I can continue from this PR:

Possible causes

a. obo:ECO_0000218 removed (probably not the cause)

The SPARQL file depends on this. Example: <obo:ECO_0000218 xml:lang="en">E (Exact mapping: the two concepts are equivalent)</obo:ECO_0000218>. It's not in config/properties.txt, so it gets removed. However, this happens after the SPARQL update happens, so I don't think this is the issue.

b. Mapping string has changed

Looks like this is the cause.

What ordo-replace-annotation-based-mappings.ru expects:

  • E (Exact mapping: the two concepts are equivalent).\n- Specific code (The ORPHA code has its own code in the ICD10).

What the Orphanet OWL actually has as of 2024/03:

  • E (Exact mapping: the two concepts are equivalent).\nSpecific code (ICD-10/ICD-11: ORPHAcode has its own code in the targeted terminology).

Possible solutions: I suggest that we change this:

FILTER(STRSTARTS(STR(?mapping_precision_string), "- E (Exact mapping: the two concepts are equivalent).\n- Specific code (The ORPHA code has its own code in the ICD10)."))

to this:

FILTER(STRSTARTS(STR(?mapping_precision_string), "- E (

And we should do the sam ething with the other codes. I think they are: - BTNT (, - NTBT (, - ND (

joeflack4 commented 7 months ago

@matentzn Mind if I have @twhetzel review my proposed solution and make the call on what the priority should be to fix this?

@twhetzel Can you review the problem and my proposed solution and let me update the urgency/priority on the boards if necessary? I've set the priority to 'low', because I don't think we use / intend to use these mapping annotations, though I'm not sure. The TLDR on this is that we have a SPARQL query that takes the source Orphanet OWL and adds some skos match annotations to components/ordo.owl, but the query is currently broken. I've proposed a fix, but I also don't know if we are currently using / intend to use these annotations. This is probably a <30min fix for me but I'm not sure; could take longer if my solution results in false positives (probably not).

joeflack4 commented 6 months ago

@twhetzel FYI, I looked at this today and here are my thoughts:

I started up a PR #499 to address this because I thought it might not be quick, but I no longer think so.

The crux of this fix I think is that the SPARQL can be vastly simplified. It had a lot of duplicate matching strings. I think that for each type of token, e.g. BTNT, each matching string will either start with a hyphen or without, e.g. BTNT ( and - BTNT (. The solution to the refactor is: a. Continue using STRSTARTS(), and have a clause for each of these 2 variations. b. Use CONTAINS() instead

I wonder if this can also be done easier with OAK than a sparql update, but we already have a mostly working SPARQL file. Just needs minor tweaking.

The part of this which takes the most time is that a lot of obsolete code can now get deleted. This SPARQL query ordo-replace-annotation-based-mappings.ru, along with another violation QC check SPARQL file, were dynamically created via several Python files and make goals. I think all of that is obsolete. I think this should just be refactored to 2 static SPARQL files.

joeflack4 commented 5 months ago

If when I start work on this again, I can continue from this PR: