Migration: clearly distinguish which disease ID a specific Mondo ID _was sourced_ from

monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources

https://monarch-initiative.github.io/mondo-ingest/


Migration: clearly distinguish which disease ID a specific Mondo ID _was sourced_ from #627

Open matentzn opened 3 months ago

matentzn commented 3 months ago

We should take note when a disease is sourced from, say, an OMIM ID. Right now, if we have two "equivalentTo" xrefs, we don't know which one is the "original one"; they look the same. I think it's good to know the "original one".

Represent as: metadata tag on xref, e.g. MONDO:originalSource.

joeflack4 commented 3 months ago

Hmm, interesting.

I don't know the details of this, but my immediate thought is: Is it really the case that 100% of the time, the first source that we add to a mondo disease is really that special / guaranteed to be a "central authority"? What if we later find that a disease we get from a source was not created by that source, and that the primary origin or authority of that disease actually came from another source?

Also these alternative solutions popped into my head: (i) MONDO:propertyAdded DATETIME (this would introduce a lot of axioms / text), (ii) MONDO:primarySource.

matentzn commented 3 months ago

Is it really the case that 100% of the time, the first source that we add to a mondo disease is really that special / guaranteed to be a "central authority"?

It is a good question, not easily answered, but I think regardless of the answer it is good to know the first source, and give it some precedence over subsequently mapped terms when it comes to determining the identity of a term!

twhetzel commented 3 months ago

I had the same question, since I believe the original source will be influenced by the order in which the lex files are reviewed, as things currently stand.

As far as the representation in Mondo, @matentzn did you have more thoughts on this? It's already confusing to know which ID equivalentTo refers to in an xref when more than one CURIE is present. For example, take MONDO:0000179 and xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", source="OMIM:256520"}.

matentzn commented 3 months ago

xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", source="OMIM:256520"}

This means that Orphanet:2671 is equivalent, and GARD:0000102 and OMIM:256520 are giving evidence to that equivalence. If you add:

xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", source="OMIM:256520", source="MONDO:originalSource"}

It will state, in addition to the above, that Orphanet:2671 was the original term that gave rise to the existence of the Mondo term. So I would suggest we go in this direction?
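To make the proposed reading concrete, here is a minimal Python sketch of how a consumer might interpret such an xref line. The function name `parse_xref`, the returned dictionary keys, and the rule "a `MONDO:`-prefixed source is a tag about the mapping, anything else is supporting evidence" are assumptions drawn from this thread, not existing mondo-ingest code.

```python
import re

# OBO-style xref line with an optional trailing qualifier block, e.g.
# xref: Orphanet:2671 {source="OMIM:256520", source="MONDO:equivalentTo"}
XREF_RE = re.compile(r'^xref:\s*(?P<curie>\S+)\s*(?:\{(?P<quals>.*)\})?\s*$')
SOURCE_RE = re.compile(r'source="([^"]*)"')


def parse_xref(line):
    """Split one xref line into the mapped CURIE, its MONDO tags, and its evidence."""
    match = XREF_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an xref line: {line!r}")
    sources = SOURCE_RE.findall(match.group("quals") or "")
    # Assumption from the thread: MONDO:-prefixed values describe the mapping itself,
    # all other values are terms that give evidence for the mapping.
    tags = [s for s in sources if s.startswith("MONDO:")]
    evidence = [s for s in sources if not s.startswith("MONDO:")]
    return {
        "xref": match.group("curie"),
        "equivalent": "MONDO:equivalentTo" in tags,
        "original_source": "MONDO:originalSource" in tags,  # proposed tag
        "evidence": evidence,
    }


line = ('xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", '
        'source="OMIM:256520", source="MONDO:originalSource"}')
parsed = parse_xref(line)
```

Under this reading, `parsed["original_source"]` is a statement about Orphanet:2671 only, while GARD:0000102 and OMIM:256520 end up in `parsed["evidence"]`.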

joeflack4 commented 3 months ago

Hmm... but if you say that:

(1) xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", source="OMIM:256520"} means that these 2 terms are what "give evidence for equivalence",

then it would seem to follow that: (2) xref: Orphanet:2671 {source="GARD:0000102", source="MONDO:equivalentTo", source="OMIM:256520", source="MONDO:originalSource"} means that these 2 terms both "give evidence for equivalence" and "give evidence for this being the MONDO:originalSource"...

matentzn commented 3 months ago

For me, original source has nothing to do with the other provenance. It means that Orphanet:2671 was the original source for the Mondo term. The other provenance tags just mean "these are also mapped to Orphanet:2671 and also happened to be equivalent to Mondo". But yeah, when you have "original source", the other evidence really does not matter that much. But it does not hurt either!
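As a rough sketch of that interpretation, the lookup below finds the original source of a term by checking only for the tag and ignoring every other provenance value. The helper name `original_source`, the dictionary layout, the second xref entry, and the rule that a term carries at most one original source are all assumptions made for illustration.

```python
ORIGINAL_SOURCE_TAG = "MONDO:originalSource"  # tag name as proposed in this issue


def original_source(xrefs):
    """Return the xref CURIE tagged as the original source, or None if no xref is tagged.

    `xrefs` maps each xref CURIE of one Mondo term to its list of source values.
    """
    tagged = [curie for curie, sources in xrefs.items()
              if ORIGINAL_SOURCE_TAG in sources]
    # Assumption: a term has at most one original source.
    if len(tagged) > 1:
        raise ValueError(f"more than one original source: {tagged}")
    return tagged[0] if tagged else None


xrefs = {
    "Orphanet:2671": ["GARD:0000102", "MONDO:equivalentTo", "OMIM:256520",
                      "MONDO:originalSource"],
    # Hypothetical second xref on the same term, added only for illustration.
    "OMIM:256520": ["MONDO:equivalentTo"],
}
source = original_source(xrefs)
```

Here `source` is `"Orphanet:2671"`; the GARD and OMIM values on that xref play no part in the decision.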