Closed joeflack4 closed 5 days ago
Notes to self:
There shouldn't be `exact_labels`. I suppose `exact` is referring to `exactSynonym`? There is already `pref_label` (also renamed to `label`). Should just use that and extract all abbreviations from it at once.
Resolved by #130.
Overview
While working on #119, I noticed that the code for handling abbreviations could use a little work. There are not many cases of multiple symbols / abbreviations. In fact, I've found just 2 cases:
OMIM:126370: DNA, SATELLITE, III; HS3; D1Z1
OMIM:171820: PHOSPHATASE, SALIVARY ACID, A; SACP; ACPS
I'm not sure how many issues are being caused by improper handling of multiple abbreviations, but there appear to be at least some.
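As a rough sketch of the parsing involved, here is one way to pull every abbreviation out of a title field at once. The function name `split_title_and_symbols` is hypothetical and not taken from the repository, and the sketch assumes that the first `;`-delimited part is the title and every remaining part is a symbol.

```python
from typing import List, Tuple


def split_title_and_symbols(field: str) -> Tuple[str, List[str]]:
    """Split a title field into (title, symbols).

    Assumption: the first ';'-delimited part is the title and all
    remaining non-empty parts are symbols / abbreviations.
    """
    parts = [part.strip() for part in field.split(";")]
    title = parts[0]
    symbols = [part for part in parts[1:] if part]
    return title, symbols


title, symbols = split_title_and_symbols("DNA, SATELLITE, III; HS3; D1Z1")
print(title)    # DNA, SATELLITE, III
print(symbols)  # ['HS3', 'D1Z1']
```

Returning a list even when there is only one symbol means downstream code can always iterate instead of special-casing the single-abbreviation entries.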
Issue 1: `rdfs:label`
It adds the `rdfs:label` for only one of the abbreviations. And apparently, given the way the code works (i.e. setting `abbrev` as the primary label for genes), this only affects `OMIM:171820`.
Issue 2: Synonyms & "modified included label"
There should be an outer `for` loop here over the list of abbreviations. There are these 3 triples for synonyms, and another one for a "modified included label" (not sure why we're creating this). https://github.com/monarch-initiative/omim/blob/dc3c79a5606a495cd7a08623ed4ac17c234d0575/omim2obo/main.py#L207-L215
I don't think there should be both `abbrev` and `abbr`. There should just be `abbreviations`.
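A minimal sketch of the outer loop idea, using plain tuples rather than the project's actual graph code. The function name `synonym_triples` is hypothetical, and using `oboInOwl:hasExactSynonym` as the predicate is an assumption for illustration; the three synonym triples in the linked lines of `main.py` are not reproduced here.

```python
from typing import List, Tuple

# Simplified stand-in for an RDF triple: (subject, predicate, object).
Triple = Tuple[str, str, str]


def synonym_triples(omim_id: str, abbreviations: List[str]) -> List[Triple]:
    """Build one synonym triple per abbreviation.

    Assumption: the predicate is illustrative only; the real code
    may emit different or additional triples per abbreviation.
    """
    triples: List[Triple] = []
    for abbreviation in abbreviations:  # outer loop over all abbreviations
        triples.append((omim_id, "oboInOwl:hasExactSynonym", abbreviation))
    return triples


for triple in synonym_triples("OMIM:171820", ["SACP", "ACPS"]):
    print(triple)
```

With a single `abbreviations` list as the only input, there is no need for separate `abbrev` and `abbr` variables, and an entry with no abbreviations simply produces no triples.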
Questions
Is `;` simply allowed within a symbol? I say this because the header of `mimTitles.txt` includes fields such as `Preferred Title; symbol`, `Alternative Title(s); symbol(s)`, and `Included Title(s); symbols`. This indicates that multiple symbols can occur for alternative or included titles, but not for preferred titles. However, I think this is an oversight, and the field name should be "`Preferred Title; symbol(s)`".