monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Update ICD11 relevant SPARQL query to exclude classes in the Extension Code branch #547

Closed: twhetzel closed this issue 6 months ago.

twhetzel commented 6 months ago

Update src/sparql/icd11foundation-relevant-signature.sparql to exclude any class that is a child of the Extension Code branch.

twhetzel commented 6 months ago

Joe, here is the SPARQL query. I'll leave the code update, branch management, etc. to you.

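PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Select every IRI under the root class (entity 455013390),
# excluding the Extension Codes branch (entity 979408586).
SELECT DISTINCT ?term
WHERE {
  {
    # ?term occurs as the object of some triple and is the root or a subclass of it
    {
      ?s1 ?p1 ?term .
      ?term rdfs:subClassOf* <http://id.who.int/icd/entity/455013390> .
    }
    UNION
    # ...or ?term occurs as the subject of some triple, with the same constraint
    {
      ?term ?p2 ?o2 .
      ?term rdfs:subClassOf* <http://id.who.int/icd/entity/455013390> .
    }
  }
  # Keep IRIs only (drops blank nodes and literals)
  FILTER(isIRI(?term))
  # The * path also matches zero steps, so Extension Codes itself is excluded
  # along with all of its descendants
  FILTER NOT EXISTS {
    ?term rdfs:subClassOf* <http://id.who.int/icd/entity/979408586> .
  }
}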
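
To make the query's logic concrete, here is a minimal Python sketch of the same set computation. It is only an illustration of the semantics, not a replacement for running the query against the ontology. It assumes the rdfs:subClassOf edges have already been loaded into a plain dict mapping each class to its direct parents, and that the input term set already contains only IRIs that occur in the graph (so the UNION and isIRI checks are skipped). The function names and all `ex:` IRIs are hypothetical; only the two WHO entity IRIs come from the query.

```python
from collections import deque

# The two WHO entity IRIs used in the SPARQL query; all "ex:" IRIs are hypothetical.
ROOT = "http://id.who.int/icd/entity/455013390"
EXTENSION_CODES = "http://id.who.int/icd/entity/979408586"


def ancestors_or_self(term, parents):
    """Term plus all transitive superclasses: the set rdfs:subClassOf* reaches."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def relevant_signature(terms, parents, root=ROOT, excluded=EXTENSION_CODES):
    """Terms whose closure reaches the root but not the excluded branch."""
    result = set()
    for term in terms:
        closure = ancestors_or_self(term, parents)
        if root in closure and excluded not in closure:
            result.add(term)
    return result


# Hypothetical toy hierarchy: child -> set of direct parents.
parents = {
    EXTENSION_CODES: {ROOT},  # Extension Codes sits under the root as well
    "ex:disease": {ROOT},
    "ex:disease-subtype": {"ex:disease"},
    "ex:severity-code": {EXTENSION_CODES},
}
terms = {ROOT, EXTENSION_CODES, "ex:disease", "ex:disease-subtype", "ex:severity-code"}

print(sorted(relevant_signature(terms, parents)))
```

Because the `*` path includes the zero-length case, the Extension Codes class itself is dropped, not just its descendants. Without the exclusion, `ex:severity-code` would have been kept, since it also reaches the root through Extension Codes.
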

twhetzel commented 6 months ago

Here is the hierarchy view of the ICD11 Foundation from the WHO site, which shows why subclasses of Extension Codes are being excluded. The branch contains terms that are not relevant, but they would otherwise be included because Extension Codes also has ICD Category as a parent. In addition, some terms in the "Extension Codes" branch have exactly the same label as terms in the "ICD Category" branch, but a different IRI. Those "Extension Codes" terms would therefore show up as exact lexical matches, and the exact lexical matches file receives less curator review.

[Screenshot: icd11foundation - extension codes]
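
The duplicate-label problem can be illustrated with a hypothetical exact-label matcher. The function name, the toy IRIs, and the case-folding normalization are all assumptions for illustration; this is not the matching code mondo-ingest actually uses. It shows a Mondo label matching two ICD11 IRIs while the Extension Codes term is kept, and only one once that branch is excluded.

```python
from collections import defaultdict


def exact_label_matches(source_labels, target_labels):
    """Pair source and target IRIs whose labels are equal after case-folding.

    Both arguments map IRI -> label. Returns a sorted list of (source, target) pairs.
    """
    targets_by_label = defaultdict(set)
    for iri, label in target_labels.items():
        targets_by_label[label.casefold()].add(iri)
    pairs = []
    for source, label in source_labels.items():
        # .get() does not insert into the defaultdict; missing labels yield no pairs
        for target in targets_by_label.get(label.casefold(), ()):
            pairs.append((source, target))
    return sorted(pairs)


# Hypothetical toy data: two ICD11 classes share one label but have different IRIs.
icd11_labels = {
    "icd:category-term": "example condition",   # under ICD Category
    "icd:extension-term": "example condition",  # under Extension Codes
}
extension_branch = {"icd:extension-term"}
mondo_labels = {"MONDO:hypothetical": "Example condition"}

unfiltered = exact_label_matches(mondo_labels, icd11_labels)
filtered = exact_label_matches(
    mondo_labels,
    {iri: label for iri, label in icd11_labels.items() if iri not in extension_branch},
)
print(unfiltered)  # two exact matches, one of them from the Extension Codes branch
print(filtered)    # only the ICD Category term remains
```
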

joeflack4 commented 6 months ago

Thanks for creating the query for me! The screenshot is also helpful. I'll add this and start a new build on main.