mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License

ICD10WHO does not work in `sssom parse` #260

Closed matentzn closed 2 years ago

matentzn commented 2 years ago

Reproducible example:

wget "https://www.dropbox.com/s/8sx79nki2h4pmuk/mirror-mondo.json?dl=0" -O mirror-mondo.json
wget "https://raw.githubusercontent.com/monarch-initiative/mondo/master/src/ontology/metadata/mondo.sssom.config.yml" -O mondo.sssom.config.yml
sssom parse mirror-mondo.json -I obographs-json -m mondo.sssom.config.yml -o test.sssom.tsv
grep -o 'ICD10WHO' mirror-mondo.json | wc -l
grep -o 'ICD10CM' mirror-mondo.json | wc -l
grep -o 'ICD10WHO' test.sssom.tsv | wc -l
grep -o 'ICD10CM' test.sssom.tsv | wc -l

Output:

grep -o 'ICD10WHO' mirror-mondo.json | wc -l
      18
grep -o 'ICD10CM' mirror-mondo.json | wc -l
    8351
grep -o 'ICD10WHO' test.sssom.tsv | wc -l
    7412
grep -o 'ICD10CM' test.sssom.tsv | wc -l
    8346

Clearly sssom-py made up about 7,400 mappings between Mondo and ICD10WHO.

matentzn commented 2 years ago

Ping @hrshdhgd, this is getting very urgent! :) Thanks!

hrshdhgd commented 2 years ago

I am looking into this now. For `xref_id = 'http://apps.who.int/classifications/icd10/browse/2010/en#/Q04.3'`, `curie_from_uri(xref_id, prefix_map)` returns `ICD10WHO:Q04.3`. Is this correct? If so, this is one of many cases where a WHO URI in the source is contracted to an `ICD10WHO:XXXX` CURIE, which would explain why the grep counts on the JSON and the TSV don't match.
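For illustration, the contraction step described above can be sketched as a longest-prefix match against a prefix map. This is a minimal sketch, not the sssom-py implementation: `contract_uri` is a hypothetical helper, and the `ICD10WHO` entry in the prefix map is an assumption inferred from the result quoted above (the real entries come from `mondo.sssom.config.yml`).

```python
from typing import Dict, Optional

# Assumed prefix map for illustration only; the real one is read from
# mondo.sssom.config.yml.
PREFIX_MAP: Dict[str, str] = {
    "ICD10WHO": "http://apps.who.int/classifications/icd10/browse/2010/en#/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def contract_uri(uri: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return a CURIE for `uri`, or None if no prefix matches.

    When several URI prefixes match, the longest one wins.
    """
    best = None  # (prefix, base) of the longest match seen so far
    for prefix, base in prefix_map.items():
        if uri.startswith(base) and (best is None or len(base) > len(best[1])):
            best = (prefix, base)
    if best is None:
        return None
    prefix, base = best
    return f"{prefix}:{uri[len(base):]}"


xref_id = "http://apps.who.int/classifications/icd10/browse/2010/en#/Q04.3"
curie = contract_uri(xref_id, PREFIX_MAP)
print(curie)  # ICD10WHO:Q04.3
```

Under this assumption, the string `ICD10WHO` never has to appear in the source JSON for it to appear in the output TSV.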

matentzn commented 2 years ago

Wait, does http://apps.who.int/classifications/icd10/browse/2010/en#/Q04.3 exist verbatim in the source mondo.json?

hrshdhgd commented 2 years ago

Yes

"basicPropertyValues" : [ {
          "pred" : "http://www.w3.org/2004/02/skos/core#exactMatch",
          "val" : "http://www.orpha.net/ORDO/Orphanet_137831"
        }, {
          "pred" : "http://www.w3.org/2000/01/rdf-schema#seeAlso",
          "val" : "https://rarediseases.info.nih.gov/diseases/9947/mental-retardation-x-linked-with-cerebellar-hypoplasia-and-distinctive-facial-appearance"
        }, {
          "pred" : "http://www.w3.org/2004/02/skos/core#exactMatch",
          "val" : "http://purl.obolibrary.org/obo/DOID_0080311"
        }, {
          "pred" : "http://www.w3.org/2004/02/skos/core#exactMatch",
          "val" : "http://identifiers.org/mesh/C537456"
        }, {
          "pred" : "http://www.w3.org/2004/02/skos/core#exactMatch",
          "val" : "http://identifiers.org/snomedct/719136005"
        }, {
          "pred" : "http://www.w3.org/2000/01/rdf-schema#seeAlso",
          "val" : "https://github.com/monarch-initiative/mondo/issues/4521"
        }, {
          "pred" : "http://www.w3.org/2004/02/skos/core#exactMatch",
          "val" : "https://omim.org/entry/300486"
        }, {
          "pred" : "http://www.w3.org/2004/02/skos/core#narrowMatch",
          "val" : "http://apps.who.int/classifications/icd10/browse/2010/en#/Q04.3"
        } ]
      },

This is on line 218581 of the JSON file; see the last `val` in the snippet above.
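A literal `grep 'ICD10WHO'` on the JSON cannot see values like this one, because the source stores them as full URIs. A minimal sketch for counting them directly is below. It assumes the usual obographs layout (`graphs` → `nodes` → `meta` → `basicPropertyValues`) and the WHO URI base seen in the snippet; `count_values_with_base` is a hypothetical helper, and the inline sample (including its node id) is made up to keep the example self-contained.

```python
import json

# URI base taken from the snippet above; treating it as the ICD10WHO
# prefix is an assumption.
WHO_BASE = "http://apps.who.int/classifications/icd10/browse/2010/en#/"


def count_values_with_base(doc: dict, base: str) -> int:
    """Hypothetical helper: count basicPropertyValues whose `val` starts with `base`."""
    count = 0
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            meta = node.get("meta") or {}
            for bpv in meta.get("basicPropertyValues", []):
                if bpv.get("val", "").startswith(base):
                    count += 1
    return count


# Tiny stand-in for mirror-mondo.json; the node id is a placeholder.
sample = json.loads("""
{
  "graphs": [{
    "nodes": [{
      "id": "http://purl.obolibrary.org/obo/MONDO_0000000",
      "meta": {
        "basicPropertyValues": [
          {"pred": "http://www.w3.org/2004/02/skos/core#exactMatch",
           "val": "http://www.orpha.net/ORDO/Orphanet_137831"},
          {"pred": "http://www.w3.org/2004/02/skos/core#narrowMatch",
           "val": "http://apps.who.int/classifications/icd10/browse/2010/en#/Q04.3"}
        ]
      }
    }]
  }]
}
""")

print(count_values_with_base(sample, WHO_BASE))  # 1
```

To run it against the real file, load it with `json.load(open("mirror-mondo.json"))` and pass the result in place of `sample`.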

matentzn commented 2 years ago

Not an sssom-py issue.