monarch-initiative / mondo

Mondo Disease Ontology
Creative Commons Attribution 4.0 International

SOP: what to do with x-ref that are not equivalent? #7815

Open sabrinatoro opened 1 week ago

sabrinatoro commented 1 week ago

When reviewing x-refs, we find that some are not EquivalentTo. What should we do with these? Should we keep these x-refs and change the source to "NOT equivalent to", or should we delete them?

sabrinatoro commented 1 week ago

@matentzn a penny for your thoughts!

matentzn commented 1 week ago
  1. We absolutely should document false positive mappings. These are very valuable resources for lexical matching efforts, and they help us weed out false automated matches.
  2. I feel very strongly that these false positive mappings should not be shipped in Mondo. People tend not to look at axiom annotations. I think we should keep them alongside our "exclusion reasons" in a table, because mondo-ingest, where the exclusions are curated, is also where we need the exclusions for processing.

My suggestion for now would be to create a Google Sheet, add excluded xrefs / mappings to it, and pull this spreadsheet into mondo-ingest during a data release.
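To make the idea concrete, here is a minimal sketch of how an exported exclusion table could be used to drop known false positives from candidate mappings. Everything in it is an assumption for illustration: the column names (`mondo_id`, `xref_id`, `reason`), the placeholder identifiers, and the helper functions are hypothetical and not part of mondo-ingest.

```python
import csv
import io

# Hypothetical CSV export of the exclusion spreadsheet.
# Column names and identifiers are placeholders, not a real Mondo schema.
EXCLUSIONS_CSV = """mondo_id,xref_id,reason
MONDO:0000001,EXAMPLE:123,not equivalent (xref is broader)
MONDO:0000002,EXAMPLE:456,lexical false positive
"""


def load_exclusions(csv_text):
    """Return the set of (mondo_id, xref_id) pairs recorded as known false positives."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["mondo_id"].strip(), row["xref_id"].strip()) for row in reader}


def filter_candidates(candidates, exclusions):
    """Split candidate mappings into those to keep and those already excluded."""
    kept = [pair for pair in candidates if pair not in exclusions]
    rejected = [pair for pair in candidates if pair in exclusions]
    return kept, rejected


# Candidate mappings, e.g. produced by an automated lexical matcher (made-up data).
candidates = [
    ("MONDO:0000001", "EXAMPLE:123"),
    ("MONDO:0000001", "EXAMPLE:789"),
    ("MONDO:0000002", "EXAMPLE:456"),
]

exclusions = load_exclusions(EXCLUSIONS_CSV)
kept, rejected = filter_candidates(candidates, exclusions)
```

Keeping the reason column next to each pair means the same table can document why a mapping was rejected and also serve as the filter during processing.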

matentzn commented 1 week ago

I made a start:

sabrinatoro commented 1 week ago

I agree completely with you, @matentzn

We should create an SOP on how to create and update that spreadsheet. Brain dump below (sorry about the dumping :-) I just want to add these points before I forget):

matentzn commented 1 week ago

> The current version of Mondo has a few x-refs without a source --> should these all be automatically added to the spreadsheet?

I thought we only wanted to add wrong xrefs, not xrefs where we don't know if they are wrong. For that, we need a different table. I read the issue a bit differently: I read it as "what to do with known non-exact xrefs". I think what you are asking for needs a deeper discussion, mainly:

  1. Are there circumstances under which a term is (a) out of scope in Mondo because it is too fine-grained (namely heavily pre-coordinated classes in ICD-10) and (b) we still want to make sure it is mapped into Mondo using a "MONDO-[narrow]->ICD" match?
  2. Are there circumstances under which we prefer to keep xrefs with unknown precision, e.g. MeSH, because "a rough match is better than no match"?

> Which x-refs should be added to the spreadsheet? All the x-refs that, at one point, were x-refs on the term? (e.g. an OMIM x-ref is changed to OMIM-PS; Sabrina can give more details when we discuss)

This is another question we need to answer. I personally do not think we need to keep these, but there is a good chance I am wrong.

> How do we separate the manually curated x-refs from the automatically maintained ones?

Slightly different issue, but I am thinking about this furiously! I am adding a discussion point to the Tech call.