monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Bug: Synonym Sync: Duplicate rows (case diff) #684

Open joeflack4 opened 2 weeks ago

joeflack4 commented 2 weeks ago

Overview

While examining Nico's mondo PR for the confirmed cases template, I found duplicate rows entering the -confirmed template.

I do not know if this bug exists for the other ROBOT templates.

The bug has no negative consequence on processing; I confirmed this by checking the outputs for #8269.

The bug seems to manifest as multiple rows for the same synonym when there is a case difference between Mondo and the source. It does not happen every time there is a case difference.

Example case:

| mondo_id | mondo_label | synonym_scope | synonym | source_id | synonym_case_diff_mondo | synonym_case_diff_source |
| --- | --- | --- | --- | --- | --- | --- |
| MONDO:0000001 | disease | oio:hasExactSynonym | disease | NCIT:C2991 | | |
| MONDO:0000001 | disease | oio:hasExactSynonym | disease | NCIT:C2991 | disease | Disease |
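
To make the pattern above easier to check for, here is a minimal sketch of how such rows could be detected and collapsed. It is not the code in mondo-ingest: the function names, the choice of key columns, and the rule of keeping the row with the most case-diff columns filled in are all assumptions for illustration. Only the column names and the two example rows come from the case above.

```python
from collections import defaultdict

# Assumption: these columns identify "the same" synonym row.
KEY_COLUMNS = ("mondo_id", "synonym_scope", "synonym", "source_id")
CASE_DIFF_COLUMNS = ("synonym_case_diff_mondo", "synonym_case_diff_source")


def row_key(row):
    """Build a key from the key columns, lowercasing the synonym so case variants match."""
    return tuple(
        row[col].lower() if col == "synonym" else row[col] for col in KEY_COLUMNS
    )


def find_duplicates(rows):
    """Return only the groups of rows that share a key (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row_key(row)].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def collapse_duplicates(rows):
    """Keep one row per key, preferring the row with more case-diff columns filled in."""
    best = {}
    for row in rows:
        filled = sum(1 for col in CASE_DIFF_COLUMNS if row.get(col))
        key = row_key(row)
        if key not in best or filled > best[key][0]:
            best[key] = (filled, row)
    return [row for _, row in best.values()]


# The two rows from the example case above.
rows = [
    {"mondo_id": "MONDO:0000001", "mondo_label": "disease",
     "synonym_scope": "oio:hasExactSynonym", "synonym": "disease",
     "source_id": "NCIT:C2991",
     "synonym_case_diff_mondo": "", "synonym_case_diff_source": ""},
    {"mondo_id": "MONDO:0000001", "mondo_label": "disease",
     "synonym_scope": "oio:hasExactSynonym", "synonym": "disease",
     "source_id": "NCIT:C2991",
     "synonym_case_diff_mondo": "disease", "synonym_case_diff_source": "Disease"},
]

duplicates = find_duplicates(rows)
collapsed = collapse_duplicates(rows)
```

With the example rows, `find_duplicates` reports one group of two rows and `collapse_duplicates` keeps the row that carries the case-diff values. Whether that is the right row to keep is a design question for the actual fix.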