monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Synchronization: Synonyms #88

Open joeflack4 opened 1 year ago

joeflack4 commented 1 year ago

Overview

Whenever a new synonym or label appears in an upstream source, we want to add it to Mondo. If the label is neither the Mondo label nor an existing synonym, it will become a synonym.

How

If a synonym is not already in Mondo, add it to a sync/synonyms.robot.template.tsv file.

joeflack4 commented 1 year ago

What if a linked term in an upstream source doesn't add a new synonym, but (i) an existing synonym's label changes (I guess we'd treat this as the addition of a new synonym plus the deletion of an old one), (ii) a synonym's type changes (e.g. exact -> broad), or (iii) a synonym is removed?

matentzn commented 1 year ago

For now, we don't remove content at all, but the correct answer is this:

I think if we do everything correctly in Mondo, all axioms and annotations should have some support from some external ontology. So we can have a separate process targeting all axioms without support at a later stage.

joeflack4 commented 1 year ago

From meeting w/ Nico: We could ultimately output two files: (i) one for all the axioms that need to be removed, and (ii) one for all that need to be added. "Support" means that there is an xref; no support means the synonym is just literal text. The robot template for synonym slurp has a column with the ID of the source concept, e.g. [OMIM:123].
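A minimal sketch of the two-file idea, assuming synonym evidence can be represented as (Mondo ID, synonym, scope, source ID) tuples. The `SynonymEvidence` type and `diff_evidence` function are hypothetical names for illustration, not part of mondo-ingest:

```python
from typing import Iterable, List, NamedTuple, Tuple


class SynonymEvidence(NamedTuple):
    """One synonym on a Mondo term, supported by one source xref (hypothetical shape)."""
    mondo_id: str
    synonym: str
    scope: str      # e.g. "exact", "broad", "narrow", "related"
    source_id: str  # e.g. "OMIM:123"


def diff_evidence(
    in_mondo: Iterable[SynonymEvidence],
    in_sources: Iterable[SynonymEvidence],
) -> Tuple[List[SynonymEvidence], List[SynonymEvidence]]:
    """Return (additions, removals) as sorted lists.

    additions: supported by a source but not yet recorded in Mondo.
    removals:  recorded in Mondo but no longer supported by the source.
    """
    mondo_set, source_set = set(in_mondo), set(in_sources)
    additions = sorted(source_set - mondo_set)
    removals = sorted(mondo_set - source_set)
    return additions, removals
```

Each returned list could then be written to its own file. Whether the removals are ever applied is a separate decision.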

joeflack4 commented 2 months ago

When checking whether a label has changed, the previous label may not be listed as a synonym in Mondo (I don't know if there's even a way to tell what the previous label was other than querying Mondo to see whether it is listed as a synonym for a specific source). So if it is missing, I want to consider adding 2 synonyms: (i) the new label, (ii) the original label.

joeflack4 commented 2 months ago

The goal I think is:

For every source:
    For every term in source:
        For every synonym or label on term:
            Make sure it exists as a synonym for the Mondo term that is mapped to that source term.

joeflack4 commented 2 months ago

Also, do we delete synonyms during the sync process under either of these conditions?

matentzn commented 2 months ago

The specification for our migration script should always be the same:

For every source:
    For every term in source:
        For every synonym or label on term:
            Add row to robot template with correct evidence (with Mondo ID of course)
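The loop above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input shapes (`sources`, `mappings`), the column names, and the choice to record a source label with "exact" scope are all hypothetical, and a real ROBOT template would also need its second header row of template strings, which is omitted here.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names; the real template's columns may differ.
COLUMNS = ["mondo_id", "synonym_scope", "synonym", "source_id"]


def build_rows(sources: Dict[str, dict], mappings: Dict[str, str]) -> List[dict]:
    """Emit one row per label or synonym on every mapped source term.

    sources:  {source_name: {source_term_id: {"label": str,
                                              "synonyms": [(text, scope), ...]}}}
    mappings: {source_term_id: mondo_id}
    """
    rows = []
    for _source_name, terms in sources.items():
        for term_id, term in terms.items():
            mondo_id = mappings.get(term_id)
            if mondo_id is None:
                continue  # no Mondo term mapped to this source term
            # Assumption: the source label is treated as an exact synonym.
            names = [(term["label"], "exact")] + list(term["synonyms"])
            for text, scope in names:
                rows.append({
                    "mondo_id": mondo_id,
                    "synonym_scope": scope,
                    "synonym": text,
                    "source_id": term_id,  # the evidence for this row
                })
    return rows


def to_tsv(rows: List[dict]) -> str:
    """Serialize rows as tab-separated text with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Note that the script does not look at what Mondo already contains; it emits every row each time, which matches the idea that the specification stays the same from run to run.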

On the Mondo side, we then, in a separate pipeline:

  1. delete all synonym evidence (let's say, if we sync up DOID exact synonyms, we first drop the fact that synonym X is supported by DOID)
  2. merge in all new synonym evidence (including new synonyms, so if DOID in the above example still supports the synonym, the previously dropped evidence will be added back)
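The two steps can be sketched as a drop-then-merge over an in-memory evidence table. This assumes evidence is keyed by (Mondo ID, synonym, scope) and stored as a set of source CURIEs; `resync_source` is a hypothetical helper, not an existing pipeline function.

```python
from typing import Dict, Iterable, Set, Tuple

SynonymKey = Tuple[str, str, str]  # (mondo_id, synonym, scope)


def resync_source(
    evidence: Dict[SynonymKey, Set[str]],
    source_prefix: str,
    new_rows: Iterable[Tuple[str, str, str, str]],
) -> Dict[SynonymKey, Set[str]]:
    """Return a new evidence table with one source's evidence replaced.

    new_rows: (mondo_id, synonym, scope, source_id) tuples from the template.
    """
    prefix = source_prefix + ":"
    # Step 1: drop all evidence contributed by this source.
    updated = {
        key: {xref for xref in xrefs if not xref.startswith(prefix)}
        for key, xrefs in evidence.items()
    }
    # Step 2: merge the fresh evidence back in, creating synonyms as needed.
    for mondo_id, synonym, scope, source_id in new_rows:
        updated.setdefault((mondo_id, synonym, scope), set()).add(source_id)
    return updated
```

In this sketch, a synonym whose evidence set ends up empty is kept rather than deleted. Such entries are the "axioms without support" that a later, separate process could target.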

Here is the very special problem for synonym sync, which we need to discuss internally between all of us.

If there is a synonym "X" of type "T" (exact, broad, related) for an ID "MONDO:Y", and Mondo already contains a synonym "X" on "MONDO:Y" with a type "T2", where T!=T2, then we should exclude the synonym.
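A rough sketch of that exclusion rule, assuming synonym text is compared by exact string match and that Mondo holds at most one type per synonym text on a term. The function name and input shapes are hypothetical:

```python
from typing import Dict, Iterable, List, Tuple

Incoming = Tuple[str, str, str]  # (mondo_id, synonym, scope)


def exclude_type_conflicts(
    incoming: Iterable[Incoming],
    mondo_synonyms: Dict[Tuple[str, str], str],
) -> Tuple[List[Incoming], List[Incoming]]:
    """Split incoming synonyms into (kept, excluded).

    mondo_synonyms maps (mondo_id, synonym) to the type Mondo already uses.
    A row is excluded when Mondo has the same synonym on the same term
    with a different type (T != T2).
    """
    kept, excluded = [], []
    for mondo_id, synonym, scope in incoming:
        existing_scope = mondo_synonyms.get((mondo_id, synonym))
        if existing_scope is not None and existing_scope != scope:
            excluded.append((mondo_id, synonym, scope))
        else:
            kept.append((mondo_id, synonym, scope))
    return kept, excluded
```

Returning the excluded rows separately, rather than silently dropping them, would leave a list of conflicts that curators could review.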

I don't know exactly whether this should be done at the mondo-ingest level, or at load time into Mondo. I have a mild preference to do this on the Mondo side of things, not the mondo-ingest side, to avoid syncing issues where a synonym type in Mondo was changed between the last mondo-ingest run and the date of adding the new synonyms.

joeflack4 commented 2 months ago

Very helpful.

I also prefer doing that mapping predicate conflict resolution in mondo rather than mondo-ingest. It seems like we've established a precedent of doing such conflict resolution in mondo, usually by curation, though in this case I suppose it might be automated.

Config file? It might also help to have a config file listing all of the confirmed cases: cases where we've looked at the conflict and confirmed that Mondo's type is correct and overrides what the source is showing. I asked for something very similar in this PR:

joeflack4 commented 1 month ago

Discussion and design moving here: https://docs.google.com/document/d/1xdSROYkk16qoQY0jGQap6ihNouG04OLH_jE3uAMpU20/edit