Open joeflack4 opened 2 years ago
What if a linked term in an upstream source doesn't add a new synonym, but instead (i) an existing synonym's label changes (I guess we'd consider this an addition of a new synonym and a deletion of an old one), (ii) a synonym's type changes (e.g. exact -> broad), or (iii) a synonym is removed?
For now, we don't remove content at all, but the correct answer is this:
I think if we do everything correctly in Mondo, all axioms and annotations should have some support from some external ontology. So we can have a separate process targeting all axioms without support at a later stage.
From meeting w/ Nico: We could ultimately output two files: (i) one for all the axioms that need to be removed, and (ii) one for all that need to be added. "Support" means that there is an xref; no support means the synonym is literal text. The robot template for synonym slurp has a column with the ID of the source concept, e.g. [OMIM:123].
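A minimal sketch of the two-file idea, assuming each synonym is reduced to a (Mondo ID, text, type, source ID) tuple. The `Synonym` type and `diff_synonyms` function are hypothetical names for illustration, not part of any existing pipeline. Mondo synonyms with an empty `source_id` are treated as unsupported and are never proposed for removal here.

```python
from typing import NamedTuple


class Synonym(NamedTuple):
    mondo_id: str
    text: str
    scope: str      # "exact", "broad", "related", ...
    source_id: str  # xref providing support, e.g. "OMIM:123"; "" if unsupported


def diff_synonyms(source, mondo):
    """Return (to_add, to_remove) for synonyms attributed to a source."""
    source_set, mondo_set = set(source), set(mondo)
    # In the source but not yet in Mondo: candidates for addition.
    to_add = sorted(source_set - mondo_set)
    # In Mondo with an xref, but no longer present in the source:
    # candidates for removal. Unsupported synonyms are left untouched.
    to_remove = sorted(s for s in mondo_set - source_set if s.source_id)
    return to_add, to_remove
```

Under this scheme a changed synonym label shows up as one addition plus one removal, and a changed type (exact -> broad) does as well, since the tuple includes the type.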
When checking to see if a label has changed, it may not be the case that the label is listed as a synonym in Mondo (IDK if there's even a way to tell what the previous label was without checking whether it is listed as a synonym for a specific source by querying Mondo). So if it is missing, I want to consider adding 2 synonyms: (i) the new label, (ii) the original label.
The goal, I think, is:
For every source:
  For every term in source:
    For every synonym or label on term:
      Make sure it exists as a synonym for the Mondo term that is mapped to that source term.
Also, do we delete synonyms during the sync process under either of these conditions?
The specification for our migration script should always be the same:
For every source:
  For every term in source:
    For every synonym or label on term:
      Add row to robot template with correct evidence (with Mondo ID of course)
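The loop above could be sketched roughly as follows. Everything here is an assumption for illustration: the input shapes, the column names, and the choice to emit a source label as an exact synonym. The real template's columns and its ROBOT template-string row are not given in this thread, so that row is omitted.

```python
import csv
import io

# Hypothetical column layout; the real template may differ.
COLUMNS = ["mondo_id", "synonym_scope", "synonym", "source_id"]


def template_rows(sources, mappings):
    """Build one row per source label/synonym for every mapped term.

    sources:  {source_name: {term_id: {"label": str,
                                       "synonyms": [(text, scope), ...]}}}
    mappings: {term_id: mondo_id}
    """
    rows = []
    for terms in sources.values():
        for term_id, term in terms.items():
            mondo_id = mappings.get(term_id)
            if mondo_id is None:
                continue  # source term is not mapped to any Mondo term
            # Assumption: the source label is proposed as an exact synonym.
            candidates = [(term["label"], "exact")] + list(term["synonyms"])
            for text, scope in candidates:
                rows.append({
                    "mondo_id": mondo_id,
                    "synonym_scope": scope,
                    "synonym": text,
                    "source_id": term_id,  # the evidence column
                })
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Note that this emits every label and synonym unconditionally; deciding which rows are actually new is left to the Mondo side.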
On the Mondo side, we then, in a separate pipeline:
Here is the very special problem for synonym sync, which we need to discuss internally between all of us.
If there is a synonym "X" of type "T" (exact, broad, related) for an ID "MONDO:Y", and Mondo already contains a synonym "X" on "MONDO:Y" with a type "T2", where T != T2, then we should exclude the synonym.
I don't know exactly whether this should be done at the Mondo Ingest level or at load time into Mondo. I have a mild preference for doing this on the Mondo side rather than the Mondo Ingest side, to avoid syncing issues where a synonym type in Mondo was changed between the last Mondo Ingest run and the date the new synonyms are added.
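As a rough illustration of the exclusion rule, assuming candidates are (Mondo ID, text, type) tuples and that synonym text is compared by exact string match (case or whitespace normalization might be wanted in practice). The function name `exclude_type_conflicts` is hypothetical.

```python
def exclude_type_conflicts(candidates, mondo_synonyms):
    """Split candidates into those kept and those excluded for a type conflict.

    candidates:     iterable of (mondo_id, text, scope)
    mondo_synonyms: {(mondo_id, text): scope} for synonyms already in Mondo
    """
    kept, conflicts = [], []
    for mondo_id, text, scope in candidates:
        existing = mondo_synonyms.get((mondo_id, text))
        if existing is not None and existing != scope:
            # Same text on the same Mondo term with a different type (T != T2).
            conflicts.append((mondo_id, text, scope, existing))
        else:
            kept.append((mondo_id, text, scope))
    return kept, conflicts
```

Returning the conflicts rather than silently dropping them would let them be reviewed by curation, wherever this step ends up running.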
Very helpful.
I also prefer doing that mapping predicate conflict resolution in mondo rather than mondo-ingest. It seems like we've established a precedent of doing such conflict resolutions in mondo, usually by curation, though in this case I suppose it might be automated.
Config file? It might help to also have a config file listing all of the confirmed cases: cases where we've looked at the conflict and confirmed that Mondo's type is correct and overrides what the source is showing. I asked for something very similar in this PR:
probably need a file which lists all of Mondo's proxy merges so that I can have the pipeline read it and filter such suggestions out.
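One way such a config file could be consumed, purely as a sketch: a TSV with hypothetical `mondo_id` and `synonym` columns listing the confirmed cases, used to filter conflict tuples whose first two fields are the Mondo ID and the synonym text. Neither the file format nor the function names come from the existing pipeline.

```python
import csv
import io


def load_confirmed(tsv_text):
    """Parse confirmed cases from TSV text into a set of (mondo_id, synonym)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["mondo_id"], row["synonym"]) for row in reader}


def unreviewed(conflicts, confirmed):
    """Keep only conflicts that have not already been confirmed by curators."""
    return [c for c in conflicts if (c[0], c[1]) not in confirmed]
```

With this in place, only conflicts nobody has looked at yet would be surfaced on each run.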
Discussion and design moving here: https://docs.google.com/document/d/1xdSROYkk16qoQY0jGQap6ihNouG04OLH_jE3uAMpU20/edit
Overview
Whenever a new synonym or label appears in an upstream source, we want to add it to Mondo. If the label is not the Mondo label or an existing synonym, it will become a synonym.
Sub-tasks
How
If a synonym is not already in Mondo, add it to a sync/synonyms.robot.template.tsv.
Related
87
27