Addressed an issue where we preferred Mondo's capitalization even when it was incorrect, i.e. cases where the synonym is an acronym that is capitalized in the source but not in Mondo.
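For reviewers, the rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `is_acronym` and `choose_case` are hypothetical, the example synonyms are made up, and "acronym" is approximated as "every cased character is upper case".

```python
def is_acronym(term: str) -> bool:
    """Approximate 'acronym' as: at least one cased character, all upper case."""
    # str.isupper() ignores digits and punctuation, so "HIV-1" counts as upper case.
    return term.isupper()


def choose_case(source_syn: str, mondo_syn: str) -> str:
    """Pick the capitalization to keep for two synonyms that differ only by case."""
    if source_syn.lower() != mondo_syn.lower():
        raise ValueError("synonyms differ by more than capitalization")
    # Prefer the source only when it is an all-caps acronym and Mondo is not.
    if is_acronym(source_syn) and not is_acronym(mondo_syn):
        return source_syn
    # Otherwise keep the earlier behavior and prefer Mondo's capitalization.
    return mondo_syn
```

With this sketch, `choose_case("AIDS", "aids")` keeps the source form, while `choose_case("Breast Cancer", "breast cancer")` keeps the Mondo form. One known simplification: an all-caps multi-word label would also be treated as an acronym.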
Pre-merge checklist
Documentation
Was the documentation added/updated under docs/?
[ ] Yes
[x] No, updates to the docs were not necessary after careful consideration
QC
Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after
docker pull obolibrary/odkfull:dev), and no errors occurred?
[x] Yes
[ ] No, there are no functional (code-related) changes to the pipeline in the PR, so no re-run is necessary
Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?
[x] Yes
Additional information
Context
@twhetzel and I discussed this at our last 1:1. Sabrina had done some recent curation and noticed that sometimes we would actually prefer the source's capitalization rather than Mondo's. I looked at all of the rows in the Google Sheet where Use Source Case (Curator Review) == Source and saw that these were all acronyms. In every case, the source was all caps and Mondo was not.
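The check on the sheet could be repeated on a TSV export along these lines. This is only a sketch under assumptions: the rows below are made up, the `source_synonym` and `mondo_synonym` column names are hypothetical, and `Mondo` as the alternative value is assumed. Only the `Use Source Case (Curator Review)` column name and the `Source` value come from the text above.

```python
import csv
import io

# Made-up rows standing in for an export of the curation sheet.
SAMPLE_TSV = (
    "source_synonym\tmondo_synonym\tUse Source Case (Curator Review)\n"
    "AIDS\taids\tSource\n"
    "Breast Cancer\tbreast cancer\tMondo\n"
    "SCID\tScid\tSource\n"
)


def source_preferred_rows(tsv_text: str) -> list:
    """Return the rows where the curator chose the source's capitalization."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [r for r in reader if r["Use Source Case (Curator Review)"] == "Source"]


rows = source_preferred_rows(SAMPLE_TSV)
# Every source-preferred row should be all caps in the source...
all_source_caps = all(r["source_synonym"].isupper() for r in rows)
# ...and not all caps in Mondo.
none_mondo_caps = not any(r["mondo_synonym"].isupper() for r in rows)
print(len(rows), all_source_caps, none_mondo_caps)
```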
Results
I ran a before/after comparison, using DO as my test case, and looked at the differences in the outputs. The doid.synonyms.confirmed.robot.tsv and doid.synonyms.updated.robot.tsv files differed. I examined the changes, and the outputs are as I expected. Here are the diffs (FYI I added column headers at the top):
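For anyone who wants to repeat the before/after comparison, a minimal sketch using Python's standard `difflib` is shown here. It is not the script used for this PR: the helper name `tsv_diff`, the column layout, and the example rows are all hypothetical. Only the file name comes from the description above.

```python
import difflib


def tsv_diff(before_lines, after_lines, name):
    """Return a unified diff between two versions of a TSV, as a list of lines."""
    return list(
        difflib.unified_diff(
            before_lines,
            after_lines,
            fromfile=f"before/{name}",
            tofile=f"after/{name}",
            lineterm="",
        )
    )


# Made-up file contents; real runs would read both files with str.splitlines().
before = ["id\tsynonym", "EX:0000001\taids"]
after = ["id\tsynonym", "EX:0000001\tAIDS"]

diff = tsv_diff(before, after, "doid.synonyms.updated.robot.tsv")
# Keep only the changed rows, skipping the "---" and "+++" file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changed:
    print(line)
```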
Build: 677
New Packages
Were any new Python packages added?
Were any other non-Python packages added?