monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Synonym sync: acronym case exception #671

Closed joeflack4 closed 1 week ago

joeflack4 commented 2 months ago

Overview

Addresses an issue where synonym sync preferred Mondo's capitalization even when it was incorrect. This happened when the synonym was an acronym that was fully capitalized in the source but not in Mondo.

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

QC

Was the full pipeline run on this branch before submitting this PR, using sh run.sh make build-mondo-ingest (after docker pull obolibrary/odkfull:dev), and did it complete without errors?

Build:

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?

Additional information

Context

@twhetzel and I discussed this at our last 1:1. Sabrina had done some recent curation and noticed that we would sometimes prefer the source's capitalization over Mondo's. I looked at the Google Sheet, at all of the rows where Use Source Case (Curator Review) == Source, and saw that every one of them was an acronym: the source synonym was all caps and the Mondo synonym was not.
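To make the rule concrete, here is a minimal sketch of the exception as described above. It is not the code from this PR: the function name choose_synonym_case is hypothetical, the example strings are illustrative rather than taken from the Google Sheet, and treating "all caps in the source" as the signal for an acronym is an assumption based on the curation observation.

```python
def choose_synonym_case(mondo_syn: str, source_syn: str) -> str:
    """Pick the casing to keep for a synonym that Mondo and a source share.

    Hypothetical helper, for illustration only. Assumes the two strings
    are the same synonym and differ, at most, in capitalization.
    """
    if mondo_syn.casefold() != source_syn.casefold():
        raise ValueError("synonyms differ by more than case")

    # Acronym exception (assumed heuristic): the source is all caps
    # and Mondo is not, so the source's capitalization wins.
    if source_syn.isupper() and not mondo_syn.isupper():
        return source_syn

    # Default behavior: keep Mondo's capitalization.
    return mondo_syn


if __name__ == "__main__":
    # Illustrative inputs only, not real curation data.
    print(choose_synonym_case("Mody", "MODY"))
    print(choose_synonym_case("Marfan syndrome", "marfan syndrome"))
```

Note that str.isupper() ignores digits and punctuation, so under this heuristic a mixed token such as an acronym with a trailing number would still count as all caps.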

Results

I ran a before/after comparison, using DO as my test case, and looked at the differences in the outputs. Two files changed: doid.synonyms.confirmed.robot.tsv and doid.synonyms.updated.robot.tsv. I examined both, and the outputs are as I expected. Here are the diffs (FYI I added column headers at the top):