monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Mapping: Mappings are suggested to Mondo terms that already have an exact mapping to another term of the same source #260

Open sabrinatoro opened 1 year ago

sabrinatoro commented 1 year ago

From unmapped_doid_lex.tsv (230930)

Some DO terms are mapped to a Mondo term even though the Mondo term already has an exact match to another DO term. This is probably due to a difference in lumping/splitting between DO and Mondo. I don't know whether we can do anything about this, but the 2 options would be:
1) do not map to Mondo terms that already have a DO equivalent
2) report the Mondo terms that are in this new mapping file but already have a DO equivalent (that will help curators with the review)

At the end of the day, I think this is an indication that a curator should review the synonyms and the existing mapping. This is not a problem with the mapping itself, just an issue that is data related and therefore requires curator attention (and if we could make it easier to bring curator attention to this, that would be helpful).
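
A minimal sketch of how the two options above could work, assuming the suggestions are read as rows with `subject_id` and `object_id` columns and the existing exact mappings are available as (Mondo ID, source ID) pairs. The function name `split_suggestions`, the `existing_exact` output column, and the `PLACEHOLDER` identifiers are hypothetical and only there for illustration; this is not how mondo-ingest currently does it.

```python
import csv
import io


def prefix(curie):
    """Return the prefix of a CURIE, e.g. 'DOID' for 'DOID:0111223'."""
    return curie.split(":", 1)[0]


def split_suggestions(suggestions, existing_exact):
    """Split suggested mapping rows into (clean, flagged).

    A row is flagged when its Mondo subject already has an exact mapping
    to a different term from the same source (same CURIE prefix).
    """
    # Index existing exact mappings: (mondo_id, source_prefix) -> object ids
    index = {}
    for mondo_id, object_id in existing_exact:
        index.setdefault((mondo_id, prefix(object_id)), set()).add(object_id)

    clean, flagged = [], []
    for row in suggestions:
        key = (row["subject_id"], prefix(row["object_id"]))
        conflicts = index.get(key, set()) - {row["object_id"]}
        if conflicts:
            # Option 2: keep the row, but tell the curator what already exists
            flagged.append({**row, "existing_exact": "|".join(sorted(conflicts))})
        else:
            clean.append(row)
    return clean, flagged


# Tiny in-memory stand-in for the suggestions file.
# Only the first pair of IDs comes from the examples in this issue;
# the PLACEHOLDER identifiers are made up.
tsv = (
    "subject_id\tobject_id\tpredicate_id\n"
    "MONDO:0008048\tDOID:0111223\tMONDO:equivalentTo\n"
    "MONDO:PLACEHOLDER\tDOID:PLACEHOLDER_B\tMONDO:equivalentTo\n"
)
suggestions = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Hypothetical existing exact mapping for the first Mondo term
existing = [("MONDO:0008048", "DOID:PLACEHOLDER_A")]

clean, flagged = split_suggestions(suggestions, existing)
```

Writing out only `clean` would correspond to option 1 (do not suggest these mappings at all), while writing `flagged` to a separate report would correspond to option 2.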

Examples:

| subject_id | subject_label | object_id | predicate_id | object_label | mapping_justification | mapping_tool | confidence | subject_match_field | object_match_field | match_string |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MONDO:0008048 | autosomal dominant centronuclear myopathy | DOID:0111223 | MONDO:equivalentTo | centronuclear myopathy 1 | semapv:LexicalMatching | oaklib | 0.8 | oio:hasExactSynonym | rdfs:label | centronuclear myopathy type 1 |
| MONDO:0007648 | hereditary diffuse gastric adenocarcinoma | DOID:0080764 | MONDO:equivalentTo | hereditary diffuse gastric cancer | semapv:LexicalMatching | oaklib | 0.8 | oio:hasExactSynonym | rdfs:label | hereditary diffuse gastric cancer |
| MONDO:0009959 | peroxisome biogenesis disorder type 3B | DOID:0081241 | MONDO:equivalentTo | peroxisome biogenesis disorder 3B | semapv:LexicalMatching | oaklib | 0.8497788952 | rdfs:label | rdfs:label | peroxisome biogenesis disorder type 3b |

matentzn commented 1 year ago

Excellent analysis. Let's discuss on the QC call!