Open joeflack4 opened 7 months ago
@matentzn Mind if I have @twhetzel review my proposed solution and make the call on what the priority should be to fix this?
@twhetzel Can you review the problem and my proposed solution and let me update the urgency/priority on the boards if necessary? I've set the priority to 'low', because I don't think we use / intend to use these mapping annotations, though I'm not sure.
The TLDR on this is that we have a SPARQL query that takes the source Orphanet OWL and adds some skos
match annotations to components/ordo.owl
, but the query is currently broken. I've proposed a fix, but I also don't know if we are currently using / intend to use these annotations. This is probably a <30min fix for me but I'm not sure; could take longer if my solution results in false positives (probably not).
@twhetzel FYI, I looked at this today and here are my thoughts:
I started up a PR #499 to address this because I thought it might not be quick, but I no longer think so.
The crux of this fix I think is that the SPARQL can be vastly simplified. It had a lot of duplicate matching strings. I think that for each type of token, e.g. BTNT
, each matching string will either start with a hyphen or without, e.g. BTNT (
and - BTNT (
. The solution to the refactor is:
a. Continue using STRSTARTS()
, and have a clause for each of these 2 variations.
b. Use CONTAINS()
instead
I wonder if this can also be done easier with OAK than a sparql update, but we already have a mostly working SPARQL file. Just needs minor tweaking.
The part of this which takes the most time is that a lot of obsolete code can now get deleted. This SPARQL query ordo-replace-annotation-based-mappings.ru
, along with another violation QC check SPARQL file, were dynamically created via several Python files and make goals. I think all of that is obsolete. I think this should just be refactored to 2 static SPARQL files.
If when I start work on this again, I can continue from this PR:
Overview
While working on https://github.com/monarch-initiative/icd11/pull/12, Nico suggested to write a SPARQL query that reads ICD11 xrefs and asserts SKOS exact matches. There actually already was such a file that is supposed to do this,
ordo-replace-annotation-based-mappings.ru
. However, there are noskos
matches annotations incomponents/ordo.owl
, so this is broken.If when I start work on this again, I can continue from this PR:
499
Possible causes
a.
obo:ECO_0000218
removed (probably not the cause)The SPARQL file depends on this. Example:
<obo:ECO_0000218 xml:lang="en">E (Exact mapping: the two concepts are equivalent)</obo:ECO_0000218>
. It's not inconfig/properties.txt
, so it gets removed. However, this happens after the SPARQL update happens, so I don't think this is the issue.b. Mapping string has changed
Looks like this is the cause.
What
ordo-replace-annotation-based-mappings.ru
expects:What the Orphanet OWL actually has as of 2024/03:
Possible solutions: I suggest that we change this:
to this:
And we should do the sam ething with the other codes. I think they are:
- BTNT (
,- NTBT (
,- ND (