Closed by joeflack4 1 month ago
@joeflack4 the original content from OMIM should be parsed so that what is labeled as a symbol by OMIM becomes tagged as an abbreviation in the omim.owl file that is created. The capitalization here (of the OMIM content) does not matter for tagging something as an abbreviation.
I'm fairly confident that this is a bug in the code, based on what I've seen and on the logic of it. I think I know where it is. And it may only happen when there are multiple symbols.
But I think it does matter, because this will make its way into Mondo. Otherwise we will have things marked as abbreviations that do not pass our normal criteria. That is, they will be lowercase.
Unless I am misunderstanding the importance of this criterion. Is it important that we have it consistent? That we make sure that 100% of our synonyms marked as abbreviations are indeed uppercase, etc.?
If that's not important, then we can close this issue.
If we do want to reach 100%, some of this can be dealt with by making sure the pipelines are correct, but I'm sure some of this will be on the curation side as well, assuming there are currently some synonyms in Mondo already that have this problem.
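A check along these lines could be sketched in Python as below. This is only an illustration: the function name, the (term ID, synonym, is-abbreviation) tuple shape, and the sample rows other than the OMIM:126370 example are assumptions, not part of the pipeline's code.

```python
from typing import Iterable, List, Tuple


def find_non_uppercase_abbreviations(
    synonyms: Iterable[Tuple[str, str, bool]]
) -> List[Tuple[str, str]]:
    """Return (term_id, synonym) pairs tagged as abbreviations that are not uppercase.

    Hypothetical input shape: (term_id, synonym_text, is_abbreviation).
    """
    flagged = []
    for term_id, synonym, is_abbreviation in synonyms:
        # str.upper() leaves digits and punctuation unchanged, so "HS3"
        # passes the comparison while "hs3" is flagged.
        if is_abbreviation and synonym != synonym.upper():
            flagged.append((term_id, synonym))
    return flagged


# Made-up sample rows; only the first mirrors the example reported in this issue.
sample = [
    ("OMIM:126370", "hs3", True),
    ("EX:0000001", "ABC1", True),
    ("EX:0000002", "some ordinary label", False),
]
print(find_non_uppercase_abbreviations(sample))
```

Running something like this over both the generated omim.owl synonyms and the synonyms already in Mondo would separate what the pipeline can fix from what needs curation.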
In mimTitles.txt, the "symbol" value should be parsed out of the OMIM column "Preferred Title; symbol", and its capitalization should be left exactly as it is found in the file.
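A minimal sketch of that parsing, assuming the field separates the title from one or more symbols with semicolons. The function name and the sample value are hypothetical and are not taken from the repository's code.

```python
from typing import List, Tuple


def split_title_and_symbols(field: str) -> Tuple[str, List[str]]:
    """Split a 'Preferred Title; symbol' value into (title, symbols).

    Assumption: the first ';'-separated part is the title and every
    following non-empty part is a symbol. Symbols are returned exactly
    as they appear in the file, with no case changes applied.
    """
    parts = [part.strip() for part in field.split(";")]
    title = parts[0]
    symbols = [part for part in parts[1:] if part]
    return title, symbols


# Made-up field value, used only to show that symbol case is preserved.
title, symbols = split_title_and_symbols("EXAMPLE DISEASE TITLE; hs3; EXD")
print(title)
print(symbols)
```

Any case-normalizing cleanup would then be applied to the title only, never to the returned symbols.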
That sounds correct to me as well.
I'll add this bug fix then to #128 or I can make a separate PR for that first if you want.
Going with option (b): not using these case-changing functions on these symbols at all.
resolved by #130
Overview
In working on #119, I noticed that some of the mondo#abbreviation synonyms we're adding (these are actually symbols) are lowercase. This is not expected. mondo#abbreviation synonyms should all be uppercase. I would also expect symbols to always be uppercase, as I learned from Trish, and a cursory search seems to corroborate that.
The source of this bug in the code appears to be cleanup_label() and _detect_abbreviations(). I think it could be that (a) there are bugs in these functions, or (b) these functions simply should not be applied to symbols (at least not the ones in "Preferred Title; symbol") and are only supposed to be used on other labels or parts of labels.
Examples
OMIM:126370 - hs3
Sub-tasks
cleanup_label()?