Open sabrinatoro opened 1 year ago
another example:
mondo_id | mondo_label | xref | xref_source | original_label |
---|---|---|---|---|
MONDO:0850061 | nipah virus encephalitis | DOID:0050192 | MONDO:equivalentTo | Nipah virus encephalitis |
This term is probably the same as 'Nipah virus disease' (MONDO:0020499)
For this one, the exact synonym is 'Nipah encephalitis'. We decided to match only if all the words were in the label/synonym. However, a simple report of these "not exactly exact" matches for the potential new terms in the slurp file, which we could then review, would help.
intracranial berry aneurysm 12 aneurysm, intracranial berry, type 12
This should be doable.
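A report like this could be sketched as a word-set comparison. This is only an illustration, not what the pipeline does today: the function names `tokens` and `classify` are made up here, and "near" simply means that one label's words are a subset of the other's.

```python
import re


def tokens(text: str) -> frozenset:
    """Lowercase a label and split it into a set of alphanumeric words."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def classify(candidate_label: str, mondo_label: str) -> str:
    """Return 'exact', 'near', or 'none' for a pair of labels/synonyms."""
    a, b = tokens(candidate_label), tokens(mondo_label)
    if a == b:
        return "exact"  # same words, regardless of order or punctuation
    if a <= b or b <= a:
        return "near"  # one side only adds words, e.g. "type" or "virus"
    return "none"


pairs = [
    ("intracranial berry aneurysm 12", "aneurysm, intracranial berry, type 12"),
    ("Nipah virus encephalitis", "Nipah encephalitis"),
]
report = [(a, b, classify(a, b)) for a, b in pairs]
for row in report:
    print(row)
```

Both pairs come out as "near", so they would land in the review report rather than being matched automatically.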
@hrshdhgd what do you think? Any other way we can go about this?
If there is any flaw in my thinking above, there is an easy way to solve this ticket:
During preprocessing, declare OMIM xrefs in DOID as "exactMatch". That way the current matching infrastructure will work without any changes, as we already "match on skos:exactMatch".
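As a rough sketch of that preprocessing step, assuming terms are available as simple dictionaries (the `xrefs` and `exact_matches` keys and the function name are hypothetical, and the real step would operate on the OWL file):

```python
# Assumption: OMIM CURIEs expand to https://omim.org/entry/<id>, as in the
# skos:exactMatch IRIs quoted in this thread.
OMIM_BASE = "https://omim.org/entry/"


def promote_omim_xrefs(term: dict) -> dict:
    """Return a copy of `term` with OMIM xrefs also listed as exact matches."""
    exact = list(term.get("exact_matches", []))
    for xref in term.get("xrefs", []):
        prefix, _, local_id = xref.partition(":")
        iri = OMIM_BASE + local_id
        # Only OMIM xrefs are promoted; other prefixes are left untouched.
        if prefix == "OMIM" and iri not in exact:
            exact.append(iri)
    return {**term, "exact_matches": exact}


doid_term = {"id": "DOID:0080964", "xrefs": ["OMIM:105800"]}
print(promote_omim_xrefs(doid_term)["exact_matches"])
```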
@hrshdhgd Example to start with:
http://purl.obolibrary.org/obo/MONDO_0007111 <skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
http://purl.obolibrary.org/obo/DOID_0080964 <skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
Why is the lexmatch pipeline not finding that MONDO_0007111 -exactMatch-> DOID_0080964?
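The inference being asked for here amounts to a join on the shared `skos:exactMatch` object. A minimal sketch of that join, independent of how lexmatch actually works (the function name is made up for illustration):

```python
from collections import defaultdict
from itertools import combinations

# The two skos:exactMatch assertions quoted above, as (subject, object) pairs.
exact_matches = [
    ("MONDO:0007111", "https://omim.org/entry/105800"),
    ("DOID:0080964", "https://omim.org/entry/105800"),
]


def shared_exact_matches(pairs):
    """Pair up subjects from different ontologies that share an exactMatch object."""
    by_object = defaultdict(set)
    for subject, obj in pairs:
        by_object[obj].add(subject)
    results = []
    for obj, subjects in by_object.items():
        for a, b in combinations(sorted(subjects), 2):
            # Skip pairs from the same ontology (same CURIE prefix).
            if a.split(":")[0] != b.split(":")[0]:
                results.append((a, b, obj))
    return results


print(shared_exact_matches(exact_matches))
```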
DOID:0080964 intracranial berry aneurysm 1 skos:closeMatch MONDO:0007111 aneurysm, intracranial berry type 1 semapv:LexicalMatching oaklib 0.5 oio:hasDbXref oio:hasDbXref omim:105800
This is because `lexmatch` tags it as a `skos:closeMatch` in the `mondo-sources-all-lexical.sssom.tsv` file (not version controlled since it is too large).
Full mention
subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string | subject_preprocessing | object_preprocessing |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DOID:0080964 | intracranial berry aneurysm 1 | skos:closeMatch | MONDO:0007111 | aneurysm, intracranial berry type 1 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:105800 | | |
DOID:0080964 | intracranial berry aneurysm 1 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:105800 | | |
DOID:0080965 | intracranial berry aneurysm 2 | skos:closeMatch | MONDO:0012053 | aneurysm, intracranial berry, 2 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:608542 | | |
DOID:0080965 | intracranial berry aneurysm 2 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:608542 | | |
DOID:0080966 | intracranial berry aneurysm 3 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:609122 | | |
DOID:0080966 | intracranial berry aneurysm 3 | skos:closeMatch | MONDO:0012194 | aneurysm, intracranial berry, 3 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:609122 | | |
DOID:0080967 | intracranial berry aneurysm 4 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:610213 | | |
DOID:0080967 | intracranial berry aneurysm 4 | skos:closeMatch | MONDO:0012443 | aneurysm, intracranial berry, 4 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:610213 | | |
DOID:0080968 | intracranial berry aneurysm 5 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:300870 | | |
DOID:0080968 | intracranial berry aneurysm 5 | skos:closeMatch | MONDO:0010468 | aneurysm, intracranial berry, 5 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:300870 | | |
DOID:0080969 | intracranial berry aneurysm 6 | skos:closeMatch | MONDO:0012752 | aneurysm, intracranial berry, 6 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:611892 | | |
DOID:0080969 | intracranial berry aneurysm 6 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:611892 | | |
DOID:0080970 | intracranial berry aneurysm 7 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612161 | | |
DOID:0080970 | intracranial berry aneurysm 7 | skos:closeMatch | MONDO:0012810 | aneurysm, intracranial berry, 7 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612161 | | |
DOID:0080971 | intracranial berry aneurysm 8 | skos:closeMatch | MONDO:0012811 | aneurysm, intracranial berry, 8 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612162 | | |
DOID:0080971 | intracranial berry aneurysm 8 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612162 | | |
DOID:0080972 | intracranial berry aneurysm 9 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612586 | | |
DOID:0080972 | intracranial berry aneurysm 9 | skos:closeMatch | MONDO:0012949 | aneurysm, intracranial berry, 9 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612586 | | |
DOID:0080973 | intracranial berry aneurysm 10 | skos:closeMatch | MONDO:0012950 | aneurysm, intracranial berry, 10 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612587 | | |
DOID:0080973 | intracranial berry aneurysm 10 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:612587 | | |
DOID:0080974 | intracranial berry aneurysm 11 | skos:closeMatch | MONDO:0013654 | aneurysm, intracranial berry, 11 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:614252 | | |
DOID:0080974 | intracranial berry aneurysm 11 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:614252 | | |
DOID:0080975 | intracranial berry aneurysm 12 | skos:closeMatch | Orphanet:231160 | Familial cerebral saccular aneurysm | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:618734 | | |
DOID:0080975 | intracranial berry aneurysm 12 | skos:closeMatch | MONDO:0032891 | aneurysm, intracranial berry, 12 | semapv:LexicalMatching | oaklib | 0.5 | oio:hasDbXref | oio:hasDbXref | omim:618734 | | |
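Rows like the ones above can be pulled out of a large SSSOM TSV with a small filter. The sketch below uses only the standard library and an in-memory sample with a subset of the columns; the function name and the `object_prefix` parameter are illustrative, not part of any existing tool.

```python
import csv
import io

# Two of the rows listed above, reduced to the columns the filter needs.
sample = (
    "subject_id\tpredicate_id\tobject_id\tsubject_match_field\tobject_match_field\tmatch_string\n"
    "DOID:0080964\tskos:closeMatch\tMONDO:0007111\toio:hasDbXref\toio:hasDbXref\tomim:105800\n"
    "DOID:0080964\tskos:closeMatch\tOrphanet:231160\toio:hasDbXref\toio:hasDbXref\tomim:105800\n"
)


def xref_close_matches(handle, object_prefix="MONDO:"):
    """Yield rows that were matched on hasDbXref and ended up as closeMatch."""
    # SSSOM files may start with '#'-prefixed metadata lines; skip them.
    lines = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(lines, delimiter="\t"):
        if (
            row["predicate_id"] == "skos:closeMatch"
            and row["subject_match_field"] == "oio:hasDbXref"
            and row["object_match_field"] == "oio:hasDbXref"
            and row["object_id"].startswith(object_prefix)
        ):
            yield row


rows = list(xref_close_matches(io.StringIO(sample)))
print([(r["subject_id"], r["object_id"]) for r in rows])
```

To run it against the real file, pass an open file handle instead of the `io.StringIO` sample.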
But the config says "exact": https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/config/mondo-match-rules.yaml#L42
Can you change this so that match on exact match is "exact"?
In `tmp/mondo.sssom.tsv`:

subject_id | subject_label | predicate_id | object_id | object_label | mapping_justification |
---|---|---|---|---|---|
MONDO:0016483 | intracranial berry aneurysm | skos:exactMatch | OMIMPS:105800 | | semapv:UnspecifiedMatching |
MONDO:0007111 | aneurysm, intracranial berry type 1 | skos:exactMatch | OMIM:105800 | | semapv:UnspecifiedMatching |
The prefixes are different. Is that expected?
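For context, the two CURIEs do not point at the same thing. A small sketch of how they expand, with the expansions inferred from the `skos:exactMatch` IRIs in the `merged.owl` excerpt quoted elsewhere in this thread (treat the prefix map as an assumption, not the project's official one):

```python
# Prefix expansions inferred from IRIs quoted in this thread; assumptions only.
PREFIX_MAP = {
    "OMIM": "https://omim.org/entry/",
    "OMIMPS": "https://omim.org/phenotypicSeries/PS",
}


def expand(curie: str) -> str:
    """Expand a CURIE such as 'OMIM:105800' into a full IRI."""
    prefix, local_id = curie.split(":", 1)
    return PREFIX_MAP[prefix] + local_id


print(expand("OMIM:105800"))
print(expand("OMIMPS:105800"))
```

`OMIM:105800` is the single entry and `OMIMPS:105800` is the phenotypic series, which fits `MONDO:0016483` being the parent of `MONDO:0007111`.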
> But the config says "exact": https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/config/mondo-match-rules.yaml#L42

That's because `- oio:hasDbXref` was commented out based on some discussion in the past. I'll uncomment it and run again to see what happens.
No, they are both skos exact matches. Don't use hasDbXref! Something is going wrong if both sources do not have the skos exact match!
Digging deeper...
These are parts of `merged.db.lexical.yaml`, which is the `lexical_index` of `merged.db`. I did a search for `105800` and this is what I found.
omimps:105800:
term: omimps:105800
relationships:
- predicate: oio:hasDbXref
element: DOID:0060228
element_term: OMIMPS:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: MONDO:0016483
element_term: OMIMPS:105800
pipeline:
- default
synonymized: false
omim:105800:
term: omim:105800
relationships:
- predicate: oio:hasDbXref
element: DOID:0080964
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: MONDO:0007111
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: Orphanet:231160
element_term: OMIM:105800
pipeline:
- default
synonymized: false
aneurysm, intracranial berry, 1:
term: aneurysm, intracranial berry, 1
relationships:
- predicate: oio:hasRelatedSynonym
element: MONDO:0007111
element_term: aneurysm, intracranial berry, 1
pipeline:
- default
synonymized: false
- predicate: rdfs:label
element: OMIM:105800
element_term: aneurysm, intracranial berry, 1
pipeline:
- default
synonymized: false
- predicate: oio:hasExactSynonym
element: OMIM:105800
element_term: aneurysm, intracranial berry, 1
pipeline:
- default
synonymized: false
aneurysmal subarachnoid hemorrhage, familial:
term: aneurysmal subarachnoid hemorrhage, familial
relationships:
- predicate: oio:hasRelatedSynonym
element: MONDO:0007111
element_term: aneurysmal subarachnoid hemorrhage, familial
pipeline:
- default
synonymized: false
- predicate: oio:hasExactSynonym
element: OMIM:105800
element_term: aneurysmal subarachnoid hemorrhage, familial
pipeline:
- default
synonymized: false
anib1:
term: anib1
relationships:
- predicate: oio:hasExactSynonym
element: OMIM:105800
element_term: ANIB1
pipeline:
- default
synonymized: false
And a quick search for `DOID:0080964` revealed:
intracranial berry aneurysm 1:
term: intracranial berry aneurysm 1
relationships:
- predicate: rdfs:label
element: DOID:0080964
element_term: intracranial berry aneurysm 1
pipeline:
- default
synonymized: false
omim:105800:
term: omim:105800
relationships:
- predicate: oio:hasDbXref
element: DOID:0080964
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: MONDO:0007111
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: Orphanet:231160
element_term: OMIM:105800
pipeline:
- default
synonymized: false
Am I missing something? I do not see an `exactMatch` predicate for the rules to kick in. I only see `oio:hasDbXref`, which is commented out in the rules file.
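The same check can be scripted. The sketch below hand-copies a reduced version of the index entries above into a dictionary (the real file is YAML and would need a YAML parser) and collects the predicates under which an element appears; `predicates_for` is a made-up helper name.

```python
# Reduced, hand-copied version of the lexical index entries shown above.
lexical_index = {
    "intracranial berry aneurysm 1": [
        {"predicate": "rdfs:label", "element": "DOID:0080964"},
    ],
    "omim:105800": [
        {"predicate": "oio:hasDbXref", "element": "DOID:0080964"},
        {"predicate": "oio:hasDbXref", "element": "MONDO:0007111"},
        {"predicate": "oio:hasDbXref", "element": "Orphanet:231160"},
    ],
}


def predicates_for(index: dict, element: str) -> set:
    """Collect every predicate under which `element` appears in the index."""
    return {
        rel["predicate"]
        for relationships in index.values()
        for rel in relationships
        if rel["element"] == element
    }


found = predicates_for(lexical_index, "DOID:0080964")
print(sorted(found))
print("skos:exactMatch" in found)
```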
Well, they should be there! The exact matches, that is. Check the raw OWL files going in, before the db state, and see if they are in there.
merged.owl
<!-- http://purl.obolibrary.org/obo/MONDO_0007111 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0007111">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0016483"/>
<oboInOwl:hasDbXref>MESH:C566284</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>OMIM:105800</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>UMLS:C1862932</oboInOwl:hasDbXref>
<oboInOwl:hasRelatedSynonym>aneurysm, intracranial berry, 1</oboInOwl:hasRelatedSynonym>
<oboInOwl:hasRelatedSynonym>aneurysmal subarachnoid hemorrhage, familial</oboInOwl:hasRelatedSynonym>
<oboInOwl:id>MONDO:0007111</oboInOwl:id>
<rdfs:label>aneurysm, intracranial berry type 1</rdfs:label>
<skos:exactMatch rdf:resource="http://identifiers.org/mesh/C566284"/>
<skos:exactMatch rdf:resource="http://linkedlifedata.com/resource/umls/id/C1862932"/>
<skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
</owl:Class>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:resource="http://purl.obolibrary.org/obo/MONDO_0016483"/>
<oboInOwl:source>DC-OMIM:105800</oboInOwl:source>
<oboInOwl:source>OMIM:105800</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>MESH:C566284</owl:annotatedTarget>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>OMIM:105800</owl:annotatedTarget>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>UMLS:C1862932</owl:annotatedTarget>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
<oboInOwl:source>MONDO:ncbi_mim2gene_medline</oboInOwl:source>
<oboInOwl:source>OMIM:105800</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
<owl:annotatedTarget>aneurysm, intracranial berry, 1</owl:annotatedTarget>
<oboInOwl:hasDbXref>MONDO:Lexical</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>OMIM:105800</oboInOwl:hasDbXref>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"/>
<owl:annotatedTarget>aneurysmal subarachnoid hemorrhage, familial</owl:annotatedTarget>
<oboInOwl:hasDbXref>OMIM:105800</oboInOwl:hasDbXref>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2004/02/skos/core#exactMatch"/>
<owl:annotatedTarget rdf:resource="http://identifiers.org/mesh/C566284"/>
<sssom:mapping_justification rdf:resource="https://w3id.org/semapv/UnspecifiedMatching"/>
<sssom:subject_label>aneurysm, intracranial berry type 1</sssom:subject_label>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2004/02/skos/core#exactMatch"/>
<owl:annotatedTarget rdf:resource="http://linkedlifedata.com/resource/umls/id/C1862932"/>
<sssom:mapping_justification rdf:resource="https://w3id.org/semapv/UnspecifiedMatching"/>
<sssom:subject_label>aneurysm, intracranial berry type 1</sssom:subject_label>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0007111"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2004/02/skos/core#exactMatch"/>
<owl:annotatedTarget rdf:resource="https://omim.org/entry/105800"/>
<sssom:mapping_justification rdf:resource="https://w3id.org/semapv/UnspecifiedMatching"/>
<sssom:object_label>aneurysm, intracranial berry, 1</sssom:object_label>
<sssom:subject_label>aneurysm, intracranial berry type 1</sssom:subject_label>
</owl:Axiom>
<!-- http://purl.obolibrary.org/obo/MONDO_0016483 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0016483">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0003847"/>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0005291"/>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0015145"/>
<obo:IAO_0000115>An intracranial aneurysm with a characteristic rounded shape; the most common form of cerebral aneurysm.</obo:IAO_0000115>
<mondo:excluded_from_qc_check rdf:resource="http://purl.obolibrary.org/obo/mondo/sparql/qc/general/qc-single-child.sparql"/>
<mondo:should_conform_to rdf:resource="http://purl.obolibrary.org/obo/mondo/patterns/OMIM_phenotypic_series.yaml"/>
<oboInOwl:hasDbXref>DOID:0060228</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>HP:0007029</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>OMIMPS:105800</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>Orphanet:231160</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>SCTID:703226008</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref>UMLS:CN230268</oboInOwl:hasDbXref>
<oboInOwl:hasExactSynonym>aneurysm, intracranial berry</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym>familial aneurysmal subarachnoid hemorrhage</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym>familial berry aneurysm</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym>familial intracranial saccular aneurysm</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym>saccular cerebral aneurysm</oboInOwl:hasExactSynonym>
<oboInOwl:hasRelatedSynonym>familial cerebral saccular aneurysm</oboInOwl:hasRelatedSynonym>
<oboInOwl:id>MONDO:0016483</oboInOwl:id>
<oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/mondo#ordo_disease"/>
<rdfs:label>intracranial berry aneurysm</rdfs:label>
<skos:exactMatch rdf:resource="http://identifiers.org/snomedct/703226008"/>
<skos:exactMatch rdf:resource="http://linkedlifedata.com/resource/umls/id/CN230268"/>
<skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_0060228"/>
<skos:exactMatch rdf:resource="http://www.orpha.net/ORDO/Orphanet_231160"/>
<skos:exactMatch rdf:resource="https://omim.org/phenotypicSeries/PS105800"/>
</owl:Class>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0016483"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget>OMIMPS:105800</owl:annotatedTarget>
<oboInOwl:source>MONDO:cjm</oboInOwl:source>
<oboInOwl:source>MONDO:equivalentTo</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0016483"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2004/02/skos/core#exactMatch"/>
<owl:annotatedTarget rdf:resource="https://omim.org/phenotypicSeries/PS105800"/>
<sssom:mapping_justification rdf:resource="https://w3id.org/semapv/UnspecifiedMatching"/>
<sssom:subject_label>intracranial berry aneurysm</sssom:subject_label>
</owl:Axiom>
DOID_0080964
<!-- http://purl.obolibrary.org/obo/DOID_0080964 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0080964">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/DOID_0050736"/>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/DOID_0060228"/>
<obo:IAO_0000115>An intracranial berry aneurysm that is characterized by rupture of an intracranial aneurysm, an outpouching or sac-like widening of a cerebral artery, leads to a subarachnoid hemorrhage, a sudden-onset disease that can lead to severe disability and death and has been mapped to chromosome 7q11.2.</obo:IAO_0000115>
<oboInOwl:hasDbXref>OMIM:105800</oboInOwl:hasDbXref>
<rdfs:label>intracranial berry aneurysm 1</rdfs:label>
<skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
</owl:Class>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/DOID_0080964"/>
<owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
<owl:annotatedTarget>An intracranial berry aneurysm that is characterized by rupture of an intracranial aneurysm, an outpouching or sac-like widening of a cerebral artery, leads to a subarachnoid hemorrhage, a sudden-onset disease that can lead to severe disability and death and has been mapped to chromosome 7q11.2.</owl:annotatedTarget>
<oboInOwl:hasDbXref>url:https://pubmed.ncbi.nlm.nih.gov/16736093/</oboInOwl:hasDbXref>
</owl:Axiom>
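The check on the raw OWL above can also be scripted with the standard library. This is a sketch over a trimmed-down RDF/XML sample that keeps only the class IRIs and `skos:exactMatch` lines from the excerpts; the function name `exact_matches` is illustrative, and a full ontology would normally be read with a proper RDF library.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Trimmed sample: only the skos:exactMatch lines from the excerpts above.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_0080964">
    <skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0007111">
    <skos:exactMatch rdf:resource="http://identifiers.org/mesh/C566284"/>
    <skos:exactMatch rdf:resource="https://omim.org/entry/105800"/>
  </owl:Class>
</rdf:RDF>"""


def exact_matches(root, class_iri):
    """Return the skos:exactMatch targets asserted directly on an owl:Class."""
    rdf = "{%s}" % NS["rdf"]
    for cls in root.findall("owl:Class", NS):
        if cls.get(rdf + "about") == class_iri:
            return [m.get(rdf + "resource") for m in cls.findall("skos:exactMatch", NS)]
    return []


root = ET.fromstring(SAMPLE)
print(exact_matches(root, "http://purl.obolibrary.org/obo/DOID_0080964"))
print(exact_matches(root, "http://purl.obolibrary.org/obo/MONDO_0007111"))
```

Both classes carry `https://omim.org/entry/105800` as an exact match in the OWL, so the information is present before the index is built.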
Here's what the lexical index file shows:
omim:105800:
term: omim:105800
relationships:
- predicate: oio:hasDbXref
element: DOID:0080964
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: MONDO:0007111
element_term: OMIM:105800
pipeline:
- default
synonymized: false
- predicate: oio:hasDbXref
element: Orphanet:231160
element_term: OMIM:105800
pipeline:
- default
synonymized: false
So if `hasDbXref` is uncommented in `match_rules.yaml`, the `skos:exactMatch` should automatically appear (theoretically speaking).
This still seems strange: we do not want to match on hasDbXref, just on skos exactMatch. To me it seems that OAK lexmatch does not know how to take skos exact match into account when building the lexical index.
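To make the expected behaviour concrete, here is a toy index and rule. It is not OAK's implementation: the triples, the `build_index` and `match` helpers, and the assumption that `skos:exactMatch` IRIs have already been contracted to CURIEs are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: (element, predicate, value) triples from the merged
# ontology, with skos:exactMatch IRIs already contracted to CURIEs.
annotations = [
    ("DOID:0080964", "oio:hasDbXref", "OMIM:105800"),
    ("DOID:0080964", "skos:exactMatch", "OMIM:105800"),
    ("MONDO:0007111", "oio:hasDbXref", "OMIM:105800"),
    ("MONDO:0007111", "skos:exactMatch", "OMIM:105800"),
]


def build_index(triples):
    """Map each lowercased term to the (element, predicate) pairs that carry it."""
    index = defaultdict(set)
    for element, predicate, value in triples:
        index[value.lower()].add((element, predicate))
    return index


def match(index, subject, obj):
    """Return skos:exactMatch if both sides share a term via skos:exactMatch."""
    for entries in index.values():
        if (subject, "skos:exactMatch") in entries and (obj, "skos:exactMatch") in entries:
            return "skos:exactMatch"
    for entries in index.values():
        elements = {element for element, _ in entries}
        if subject in elements and obj in elements:
            return "skos:closeMatch"  # shared term, but only via a weaker field
    return None


index = build_index(annotations)
print(match(index, "DOID:0080964", "MONDO:0007111"))
```

If the `skos:exactMatch` triples never make it into the index, the same rule falls back to `skos:closeMatch`, which is the behaviour reported in this thread.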
I am looking at the slurp file: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/slurp/doid.tsv
Many of the terms in this list (suggested to be new terms in Mondo) already exist in Mondo. For example:
These are probably the same as:
Could we set up additional mapping rules or reports to support curation? That way I would not have to manually check each of the 12 types of intracranial berry aneurysm to ensure they are the same in DO and Mondo, and likewise for other terms.
Some ideas would be:
Note: if this work would take longer than me doing it manually, I can do it manually. But it might help us in the long term.