monarch-initiative / omim

Data ingest pipeline for OMIM.

`, INCLUDED`: Non-exact synonyms being labeled as exact #116

Closed joeflack4 closed 1 month ago

joeflack4 commented 1 month ago

Overview

We are parsing out 100% of "other entities/titles" as oboInOwl:hasExactSynonym, but it is not always the case, or perhaps is never the case, that they are exact.

Example: USHER SYNDROME, TYPE I; USH1 (OMIM:276900)

OMIM entry pages have a section "Other entities represented in this entry". Those entries also appear in mimTitles.txt in the column Included Title(s). In this example OMIM:276900, one of those is: USH1B, INCLUDED

Clearly, "USH1B" is narrower than "USH1". So this should not be imported as an exact synonym.

Possible solutions

Temporarily:

a. For now, don't import any of the `, INCLUDED` titles / other entries, effectively dropping them from omim.owl.

b. For now, import all of the `, INCLUDED` titles / other entries as oboInOwl:hasRelatedMatch.

c. Keep them in omim.owl, but handle them separately for purposes of synonym sync.

Eventually: Through curation, term by term, synonym by synonym, override the temporary solution and import synonyms with their correct scope (e.g. I think "narrow" as in the case above).

Additional info

mondo-edit.obo uses a tag marking some "included" entries

I don't know how good the coverage of `MONDO:includedEntryInOMIM` is among these terms, but not all cases are covered (e.g. USH1).

```obo
id: MONDO:0016008
name: fetal hydantoin syndrome
synonym: "FHS" EXACT [OMIM:617955]
xref: OMIM:617955 {source="MONDO:includedEntryInOMIM"}
```

OMIM:276900 in mondo-edit.obo & omim.owl

```obo
[Term]
id: MONDO:0010168
name: Usher syndrome type 1
def: "A syndrome characterized by congenital, bilateral, severe sensorineural hearing loss, abnormalities in the vestibular system, and adolescent-onset retinitis pigmentosa." [NCIT:C126327]
subset: clingen {source="MONDO:CLINGEN"}
subset: gard_rare {source="GARD:5435", source="MONDO:GARD"}
subset: nord_rare {source="MONDO:NORD"}
subset: ordo_subtype_of_a_disorder {source="Orphanet:231169"}
subset: otar {source="MONDO:OTAR"}
subset: rare
synonym: "retinitis pigmentosa and congenital deafness" EXACT [OMIM:276900]
synonym: "US1" EXACT ABBREVIATION [DOID:0110826, OMIM:276900]
synonym: "USH1" EXACT ABBREVIATION [DOID:0110826, MONDO:Lexical, OMIM:276900, Orphanet:231169]
synonym: "USH1A" RELATED ABBREVIATION [GARD:0005435]
synonym: "Usher syndrome type 1" EXACT CLINGEN_LABEL []
synonym: "Usher syndrome, type 1" EXACT [GARD:0005435, OMIM:276900]
synonym: "Usher syndrome, type 1A" RELATED [GARD:0005435]
synonym: "Usher syndrome, type 1B" RELATED [OMIM:276900]
synonym: "USHER syndrome, type I" RELATED [MONDO:Lexical, OMIM:276900]
synonym: "Usher syndrome, type I, French variety" RELATED [OMIM:276900]
synonym: "Usher syndrome, type I, French variety, formerly" RELATED [OMIM:276900]
synonym: "Usher syndrome, type Ia" RELATED [OMIM:276900]
synonym: "Usher syndrome, type Ia, formerly" RELATED [OMIM:276900]
xref: DOID:0110826 {source="MONDO:equivalentTo"}
xref: GARD:5435 {source="MONDO:GARD"}
xref: ICD10CM:H35.5 {source="MONDO:relatedTo", source="DOID:0110826", source="Orphanet:231169", source="Orphanet:231169/attributed", source="Orphanet:231169/ntbt"}
xref: MEDGEN:292820 {source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
xref: NANDO:1200942 {source="MONDO:NANDO", source="https://orcid.org/0000-0003-0011-764X", source="https://orcid.org/0000-0002-0170-9172"}
xref: NCIT:C126327 {source="MONDO:equivalentTo"}
xref: Orphanet:231169 {source="DOID:0110826", source="MONDO:equivalentTo", source="OMIM:276900"}
xref: Orphanet:886 {source="OMIM:276900"}
xref: SCTID:232057003 {source="MONDO:equivalentTo"}
xref: UMLS:C1568247 {source="MEDGEN:292820", source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
is_a: MONDO:0019501 {source="DC-OMIM:276900", source="DOID:0110826", source="NCIT:C126327", source="OMIM:276900", source="Orphanet:231169"} ! Usher syndrome
relationship: has_characteristic HP:0000007 {source="MONDO:HPOA", source="OMIM:276900", source="Orphanet:231169"} ! Autosomal recessive inheritance

[Term]
id: MONDO:0700087
name: Usher syndrome type 1B
def: "Usher syndrome in which the cause of the disease is a mutation in the MYO7A gene" [MONDO:patterns/disease_series_by_gene]
subset: gard_rare {source="GARD:5436", source="MONDO:GARD"}
subset: nord_rare {source="MONDO:NORD"}
subset: otar {source="MONDO:OTAR"}
subset: rare
synonym: "Usher syndrome, type 1B" EXACT [OMIM:276900, OMIM:genemap2]
xref: GARD:5436 {source="MONDO:GARD"}
xref: MEDGEN:419358 {source="MONDO:equivalentTo", source="MONDO:MEDGEN"}
xref: MESH:C536485 {source="MONDO:equivalentTo"}
xref: OMIM:276900 {source="MONDO:equivalentTo"}
xref: UMLS:C2931206 {source="MONDO:equivalentTo", source="MONDO:MEDGEN", source="MEDGEN:419358"}
is_a: MONDO:0010168 {source="Orphanet:231169"} ! Usher syndrome type 1
intersection_of: MONDO:0019501 ! Usher syndrome
intersection_of: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/7606 ! MYO7A
relationship: has_material_basis_in_germline_mutation_in http://identifiers.org/hgnc/7606 {source="OMIM:276900"} ! MYO7A
property_value: IAO:0000233 "https://github.com/monarch-initiative/mondo/issues/4521" xsd:anyURI
```

```owl
USH1
retinitis pigmentosa and congenital deafness
us1
usher syndrome, type 1
usher syndrome, type 1b
usher syndrome, type i, french variety, formerly
usher syndrome, type ia, formerly
This term has one or more labels that end with ', INCLUDED'.
usher syndrome, type 1
```

joeflack4 commented 1 month ago

@sabrinatoro Now I understand what you were trying to tell me about these `, INCLUDED` entries during our meeting with Trish.

@twhetzel @matentzn Until we address this, we're going to be getting a lot of bad synonyms coming in from OMIM in the synonym sync. Right now, the omim pipeline takes a title/entry such as `USH1B, INCLUDED`, strips the `, INCLUDED` part, and declares USH1B an oboInOwl:hasExactSynonym.
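To illustrate the stripping step described above, here is a minimal sketch of a parser that removes the `, INCLUDED` suffix but keeps a record that it was there. This is not the omim pipeline's actual code: `ParsedTitle`, `parse_title`, and `synonym_predicate` are hypothetical names, and mapping INCLUDED titles to `oboInOwl:hasNarrowSynonym` is only a placeholder assumption until the scope has been curated.

```python
from typing import NamedTuple

INCLUDED_SUFFIX = ", INCLUDED"


class ParsedTitle(NamedTuple):
    text: str          # title with the marker removed
    is_included: bool  # True if the raw title ended with ", INCLUDED"


def parse_title(raw: str) -> ParsedTitle:
    """Strip the ', INCLUDED' marker but remember that it was present."""
    title = raw.strip()
    if title.upper().endswith(INCLUDED_SUFFIX):
        return ParsedTitle(title[: -len(INCLUDED_SUFFIX)].rstrip(), True)
    return ParsedTitle(title, False)


def synonym_predicate(parsed: ParsedTitle) -> str:
    """Pick a synonym predicate for a parsed title.

    Assumption: 'narrow' is only a stand-in scope for INCLUDED titles;
    the correct scope may differ term by term.
    """
    if parsed.is_included:
        return "oboInOwl:hasNarrowSynonym"
    return "oboInOwl:hasExactSynonym"


if __name__ == "__main__":
    for raw in ["USH1", "USH1B, INCLUDED"]:
        parsed = parse_title(raw)
        print(parsed.text, synonym_predicate(parsed))
```

Carrying the flag forward is what would let a downstream step such as the synonym sync tell the two kinds of title apart.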

Which temporary / eventual solution do you guys prefer? Or can you think of better ones than I proposed?

joeflack4 commented 1 month ago

I have written in my notes that Sabrina effectively chose (a). I wrote that she told me:

Exclude: OMIM 'included in' cases sometimes we have 'included in'. sometimes they have their own MONDO ID. should not be added as synonyms

This is something I can do, but it should be done in the omim repo, not the mondo-ingest repo. I don't really have a choice: the `, INCLUDED` suffix has already been stripped by that point, so in the mondo-ingest synonym sync pipeline I can't tell which synonyms are which.

matentzn commented 1 month ago

I have no opinion but I agree this is a priority

joeflack4 commented 1 month ago

Trish decided: We will query Mondo for any mapped terms that have MONDO:includedEntryInOMIM, and filter / skip synonym synchronization for those.
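A rough sketch of what that filter could look like, assuming the check is run over plain mondo-edit.obo text rather than through a SPARQL query or ROBOT (the thread does not say which). The function name `terms_with_included_xref` and the regex are illustrative only; the sample stanzas are trimmed from the examples earlier in this issue.

```python
import re
from typing import Dict, Optional, Set

# Matches lines such as: xref: OMIM:617955 {source="MONDO:includedEntryInOMIM"}
XREF_RE = re.compile(r'^xref:\s+(OMIM:\d+)\s+\{(.*)\}\s*$')
INCLUDED_SOURCE = 'source="MONDO:includedEntryInOMIM"'


def terms_with_included_xref(obo_text: str) -> Dict[str, Set[str]]:
    """Map each Mondo ID to the OMIM xrefs flagged as includedEntryInOMIM."""
    flagged: Dict[str, Set[str]] = {}
    current_id: Optional[str] = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current_id = None  # new stanza; wait for its id line
        elif line.startswith("id: "):
            current_id = line[len("id: "):]
        elif current_id:
            match = XREF_RE.match(line)
            if match and INCLUDED_SOURCE in match.group(2):
                flagged.setdefault(current_id, set()).add(match.group(1))
    return flagged


SAMPLE = """\
[Term]
id: MONDO:0016008
name: fetal hydantoin syndrome
synonym: "FHS" EXACT [OMIM:617955]
xref: OMIM:617955 {source="MONDO:includedEntryInOMIM"}

[Term]
id: MONDO:0700087
name: Usher syndrome type 1B
xref: OMIM:276900 {source="MONDO:equivalentTo"}
"""

if __name__ == "__main__":
    # Terms returned here would be skipped during synonym sync.
    print(terms_with_included_xref(SAMPLE))
```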

twhetzel commented 1 month ago

After talking about this with Joe, Nicole pointed me to this open ticket from 2022, where you all worked on making sure that Mondo correctly represents these INCLUDED entries.

For the Synonym sync pipeline, these INCLUDED entries can be identified in the omim.owl file based on the comment field that was added as part of #5507.

Overall, how to handle this in general needs some discussion among at least Nico, Joe, Trish, and maybe Kevin and/or curators, so let's discuss on a Tech call.

If needed, the Synonym Sync notes are here.

joeflack4 commented 1 month ago

TLDR: I don't like our design / current plan. It's not clear and is lossy.


I skimmed it a bit and it looks like the main idea was to create the MONDO:includedEntryInOMIM axiom on any Mondo class that is mapped to an OMIM term that has any INCLUDED entries. I did implement the rdfs:comment with the text "This term has one or more labels that end with ', INCLUDED'.", but note that it does not list any of those titles/synonyms in the comment. It just states that they exist.

I just looked at several examples in Mondo, and it looks like the coverage of these INCLUDED synonyms is inconsistent. Some of these synonyms have been added and some have not. I think that some are incorrectly marked as exact synonyms.

I don't actually like this design that we came up with. I think the appropriate solution, both for Mondo in general and for this synchronization work, is to mark specifically which synonyms are of type INCLUDED.

If we skip synonym sync for cases of MONDO:includedEntryInOMIM, we also might skip non-INCLUDED synonyms. Observe the following case:

id: MONDO:0010918
synonym: "EIG" BROAD ABBREVIATION [MONDO:Lexical, OMIM:600669]
synonym: "EIG1" EXACT [OMIM:600669]
synonym: "epilepsy, idiopathic generalized, susceptibility to, 1" EXACT [OMIM:600669]
xref: OMIM:600669 {source="MONDO:includedEntryInOMIM"}

In this case, "EIG" is not INCLUDED, and the other two are.

Note though that this case does not actually appear in mondo.sssom.tsv since this is a broadMatch and only exact matches appear in it.
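To make the lossiness concern concrete, the sketch below contrasts skipping a whole term with filtering individual synonyms, using the MONDO:0010918 example above. It assumes a per-synonym flag (`from_included_title`) exists, which is exactly the information neither Mondo nor omim.owl records today; all names here are hypothetical.

```python
from typing import List, NamedTuple


class Synonym(NamedTuple):
    text: str
    scope: str
    from_included_title: bool  # hypothetical per-synonym flag


# Synonyms of MONDO:0010918; the flags follow the comment above
# ("EIG" is not INCLUDED, the other two are).
SYNONYMS = [
    Synonym("EIG", "BROAD", False),
    Synonym("EIG1", "EXACT", True),
    Synonym("epilepsy, idiopathic generalized, susceptibility to, 1", "EXACT", True),
]


def sync_term_level(synonyms: List[Synonym], term_has_included_xref: bool) -> List[Synonym]:
    """Current plan: skip every synonym if the term's xref is flagged."""
    return [] if term_has_included_xref else list(synonyms)


def sync_synonym_level(synonyms: List[Synonym]) -> List[Synonym]:
    """Alternative: drop only the synonyms that came from INCLUDED titles."""
    return [s for s in synonyms if not s.from_included_title]


if __name__ == "__main__":
    print([s.text for s in sync_term_level(SYNONYMS, True)])
    print([s.text for s in sync_synonym_level(SYNONYMS)])
```

With the term-level skip, "EIG" is dropped along with the two INCLUDED synonyms; with per-synonym marking, only "EIG" survives the filter.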