monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Incorrect match in `unmapped_icd_lex_exact` #206

Open sabrinatoro opened 1 year ago

sabrinatoro commented 1 year ago

https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/lexmatch/unmapped_icd_lex_exact.tsv

| subject_id | object_id | predicate_id | object_label | subject_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string | comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ID | A oboInOwl:hasDbXref | >A oboInOwl:source | >A sssom:object_label | | | | | | | | |
| MONDO:0004648 | ICD10CM:F01.5 | MONDO:equivalentTo | Vascular dementia | vascular dementia | semapv:LexicalMatching | oaklib | 0.8497788952 | rdfs:label | rdfs:label | vascular dementia | LEXMATCH |
| MONDO:0004648 | ICD10CM:F01 | MONDO:equivalentTo | Vascular dementia | vascular dementia | semapv:LexicalMatching | oaklib | 0.8497788952 | rdfs:label | rdfs:label | vascular dementia | LEXMATCH |

Two different ICD codes are mapped to the same Mondo ID. The label of ICD10CM:F01.5 is actually "Vascular dementia, unspecified severity", so ICD10CM:F01.5 should probably not be mapped as `MONDO:equivalentTo`. We should probably discuss how these "unspecified" terms should be treated.
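A minimal sketch of how such cases could be flagged automatically, assuming the mapping file is tab-separated with the `subject_id`, `object_id`, and `predicate_id` columns shown above. The function name `find_multi_equivalents` and the inline sample (reduced to three columns) are illustrative only and are not part of the mondo-ingest pipeline.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample: the two rows from this issue, reduced to three columns.
SAMPLE_TSV = (
    "subject_id\tobject_id\tpredicate_id\n"
    "MONDO:0004648\tICD10CM:F01.5\tMONDO:equivalentTo\n"
    "MONDO:0004648\tICD10CM:F01\tMONDO:equivalentTo\n"
)


def find_multi_equivalents(tsv_text, predicate="MONDO:equivalentTo"):
    """Return {subject_id: sorted object_ids} for subjects with more than one
    object mapped via the given predicate. (Hypothetical helper.)"""
    objects_by_subject = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Rows with any other predicate are ignored.
        if row.get("predicate_id") != predicate:
            continue
        objects_by_subject[row["subject_id"]].add(row["object_id"])
    return {
        subject: sorted(objects)
        for subject, objects in objects_by_subject.items()
        if len(objects) > 1
    }


conflicts = find_multi_equivalents(SAMPLE_TSV)
print(conflicts)
```

To run this against the real file, the text of `unmapped_icd_lex_exact.tsv` could be passed in place of `SAMPLE_TSV`. Each flagged subject would still need manual review, since the check only shows that several codes share one Mondo ID, not which of them is wrong.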

sabrinatoro commented 1 year ago

Note that I didn't find other examples, though I only had a quick look. I will, however, wait for this to be reviewed and for a new exact mapping file to be created before merging, just in case.

matentzn commented 1 year ago

@hrshdhgd Very important: we should have the actual labels in the subject and object fields, not the trimmed ones.

hrshdhgd commented 1 year ago

I don't do anything to alter the labels. I use them as is. I'll double-check this.

hrshdhgd commented 1 year ago

I'm not sure where this is coming from, but if you look at line 43087 of this file in MONDO, it already shows this mapping. I think the mistake comes from the source.

<http://purl.obolibrary.org/obo/MONDO_0004648>  "A degenerative vascular disorder affecting the brain. It is caused by the blockage of the blood supply to the brain. It is manifested with decline of memory and cognitive functions." "NCIT:C34525 MESH:D015140 DOID:8725 ICD9:290.4 UMLS:C0011269 ICD10:F01.5 EFO:0004718 SCTID:429998004 ICD10:F01"

Also, in `mondo-edit.obo`, line 75855 is `xref: ICD10CM:F01.5 {source="DOID:8725"}`.

The label of DOID:8725 is "vascular dementia".
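To trace where an xref like this came from, the `source` qualifier can be pulled out of the OBO line. The following is a rough sketch that assumes the `xref: ID {source="..."}` shape quoted above. The helper `parse_xref_line` is hypothetical, and the regular expressions only cover this simple shape, not the full OBO format.

```python
import re

# Matches "xref: <ID>" optionally followed by a "{...}" qualifier block.
XREF_RE = re.compile(r'^xref:\s+(?P<xref>\S+)(?:\s+\{(?P<quals>[^}]*)\})?')
# Matches each source="..." qualifier inside the block.
SOURCE_RE = re.compile(r'source="([^"]*)"')


def parse_xref_line(line):
    """Return (xref_id, [sources]) for an OBO xref line, or None otherwise.
    (Hypothetical helper, not part of any OBO library.)"""
    match = XREF_RE.match(line.strip())
    if match is None:
        return None
    sources = SOURCE_RE.findall(match.group("quals") or "")
    return match.group("xref"), sources


parsed = parse_xref_line('xref: ICD10CM:F01.5 {source="DOID:8725"}')
print(parsed)
```

Applied to every `xref:` line of a term, this would show which upstream ontology each ICD10CM cross-reference was inherited from, which helps decide whether a questionable mapping originates in Mondo or in a source such as DOID.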

matentzn commented 1 year ago

https://www.icd10data.com/ICD10CM/Codes/F01-F99/F01-F09/F01-/F01.5

https://bioportal.bioontology.org/ontologies/ICD10CM/?p=classes&conceptid=http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FICD10CM%2FF01.5

This is a problem with the ICD10CM OWL file we download. It does not seem to be an issue on their site, though, so it needs looking into.