Open nicolevasilevsky opened 2 years ago
@hrshdhgd you do not have to actually do anything specific here, but you should understand how our work with boomer and disease mapping commons will automagically solve this issue.
Action item: @joeflack4 take the Orphanet - OMIM mapping that has been created, and compare it with the Mondo - Orphanet mapping. I.e., we need to find all the mappings where there is an OMIM term mapped to an Orphanet (ORDO) term but the Orphanet term is not mapped to Mondo.
The goal is to determine where we don't have existing Orphanet mappings in Mondo and slurp into Mondo.
(We can talk about this on the QC call)
Extract MONDO:OMIM (rename OMIM to MONDO_OMIM) OUTER JOIN MONDO:ORDO (on MONDO) OUTER JOIN ORDO:OMIM (on ORDO) (rename OMIM to ORDO_OMIM)
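A minimal sketch of the join described above, in plain Python and for illustration only. The helper `full_outer_join` and the identifiers in the three small tables are made-up placeholders, not real mappings or an existing API.

```python
def full_outer_join(left, right, key):
    """Full outer join of two lists of dict rows on `key`; missing cells become None."""
    all_cols = {c for row in left for c in row} | {c for row in right for c in row}
    blank = dict.fromkeys(all_cols)
    joined, matched_right = [], set()
    for lrow in left:
        # A row whose key is None never matches anything.
        hits = [i for i, rrow in enumerate(right)
                if lrow.get(key) is not None and rrow.get(key) == lrow.get(key)]
        if hits:
            for i in hits:
                matched_right.add(i)
                joined.append({**blank, **right[i], **lrow})
        else:
            joined.append({**blank, **lrow})
    # Keep right-hand rows that found no partner.
    for i, rrow in enumerate(right):
        if i not in matched_right:
            joined.append({**blank, **rrow})
    return joined


# Placeholder data (hypothetical ids), with OMIM already renamed per source table.
mondo_omim = [{"MONDO": "MONDO:A", "MONDO_OMIM": "OMIM:1"},
              {"MONDO": "MONDO:B", "MONDO_OMIM": "OMIM:2"}]
mondo_ordo = [{"MONDO": "MONDO:A", "ORDO": "Orphanet:10"}]
ordo_omim = [{"ORDO": "Orphanet:10", "ORDO_OMIM": "OMIM:1"},
             {"ORDO": "Orphanet:20", "ORDO_OMIM": "OMIM:2"}]

# MONDO:OMIM outer join MONDO:ORDO (on MONDO), then outer join ORDO:OMIM (on ORDO).
table = full_outer_join(full_outer_join(mondo_omim, mondo_ordo, "MONDO"), ordo_omim, "ORDO")
```

Rows where `MONDO` is empty but `ORDO` and `ORDO_OMIM` are filled are the Orphanet terms that Mondo does not yet reach through an ORDO mapping.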
@joeflack4 So we have concluded that we can obtain no new MONDO-ORDO mappings from the OMIM-ORDO mappings, right? If so, we can close this.
@matentzn Oh no, that is incorrect. mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo.csv
I was able to locate such new mappings. Details are in my mondo-analysis PR: https://github.com/monarch-initiative/mondo-analysis/pull/36
How can I get new MONDO-Orphanet mappings from this table? (Mappings that were not previously there)?
I sorted it so that the new mappings are at the top of the list. For example:
OMIM_id | Mondo_id | Orphanet_id_fromOrphanet | Orphanet_id_fromMondo | ofInterest |
---|---|---|---|---|
OMIM:601410 | MONDO:0011073 | Orphanet:99886 | | TRUE |
OMIM:606176 | MONDO:0100165 | Orphanet:99885 | | TRUE |
There are ~200 instances of these. I think this is what you were asking me to do yesterday, correct? Unless my original mapping file was indeed too out of date and these Mondo::Orphanet mappings are already in Mondo, these should be new mappings.
Awesome - the last thing we need, and then this is done from your perspective, is the labels (Mondo and Orphanet) in this table; otherwise it will take quite long for Nicole to review them. Any chance we can make that happen?
Yep, I can make that happen. I was going to try and squeeze that in today, but I wasn't sure how important that was. Sounds like it's important enough. I'll ditch my oak-wrangling for now and do it the SPARQL way this time. So yeah, I should have that uploaded to my PR as an updated CSV in a few hours. Guess I'll upload it here as well.
Here are the updated files w/ labels: mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo - v2.csv mondo-orphanet-omim_mappings - v2.csv
Thank you Joe! Labels look awesome. Something is not quite as I expected: I am looking for example at this row:
OMIM_id | Mondo_id | Mondo_label | Orphanet_id_fromOrphanet | Orphanet_label |
---|---|---|---|---|
OMIM:617396 | MONDO:0054561 | anauxetic dysplasia 2 | Orphanet:93347 | Anauxetic dysplasia |
It is true that this link between MONDO and Orphanet does not exist currently - but the Orphanet class is already linked to MONDO:0011773!
So what we need is:
A list of all MONDO ids that are not linked to Orphanet, joined on OMIM with a list of all Orphanet ids that are not linked to MONDO.
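The request above amounts to two anti-joins followed by a join on OMIM. A rough Python sketch, assuming each mapping set is a list of id pairs; the function name `candidate_mappings` and all identifiers in the sample data are hypothetical placeholders.

```python
def candidate_mappings(mondo_omim, ordo_omim, mondo_ordo):
    """Return (mondo, omim, ordo) triples where neither the Mondo id nor the
    Orphanet id takes part in any existing Mondo-Orphanet link."""
    linked_mondo = {mondo for mondo, _ in mondo_ordo}
    linked_ordo = {ordo for _, ordo in mondo_ordo}

    # Index the unlinked Orphanet ids by the OMIM id they map to.
    omim_to_ordo = {}
    for ordo, omim in ordo_omim:
        if ordo not in linked_ordo:
            omim_to_ordo.setdefault(omim, []).append(ordo)

    # Join the unlinked Mondo ids to that index on OMIM.
    return sorted(
        (mondo, omim, ordo)
        for mondo, omim in mondo_omim
        if mondo not in linked_mondo
        for ordo in omim_to_ordo.get(omim, [])
    )


# Placeholder data: Orphanet:10 is already linked to another Mondo class,
# so the MONDO:X candidate is dropped; MONDO:Z / Orphanet:20 survives.
mondo_omim = [("MONDO:X", "OMIM:1"), ("MONDO:Z", "OMIM:2")]
ordo_omim = [("Orphanet:10", "OMIM:1"), ("Orphanet:20", "OMIM:2")]
mondo_ordo = [("MONDO:Y", "Orphanet:10")]

candidates = candidate_mappings(mondo_omim, ordo_omim, mondo_ordo)
```

Which link types count as "linked" (only skos:exactMatch, or any xref) is a choice made when building `mondo_ordo`, and it changes the size of the result considerably.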
I expect a much smaller number of classes tbh, if any!
@matentzn Hey Nico, I think you might be confused. If not, then I am confused about something and having a hard time understanding, sorry.
I looked at the example you gave, and this is one of them for which Mondo does not have an equivalence link to any Orphanet class.
I included some more examples in the table below.
I also went to check your example, MONDO:0009277, in mondo.owl. What I found is that there is an xref between this class and Orphanet classes. However, they are not of type skos:exactMatch. They are of type oboInOwl:hasDbXref. So I think mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo - v2.csv is correct in that respect, because you asked me to only discard any mappings that were not skos:exactMatch. Am I correct?
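To make the distinction concrete, here is a small Python sketch that separates oboInOwl:hasDbXref values from skos:exactMatch targets for one class. The embedded snippet is a trimmed copy of the MONDO:0009277 axioms quoted in this thread, wrapped in an rdf:RDF root so it parses on its own. The function name is hypothetical, and the "orpha" substring test for Orphanet exact-match IRIs is an assumption about how such IRIs would look.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

SNIPPET = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0009277">
    <oboInOwl:hasDbXref>OMIM:231300</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>Orphanet:98976</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>Orphanet:98977</oboInOwl:hasDbXref>
    <skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_11211"/>
    <skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/NCIT_C148260"/>
    <skos:exactMatch rdf:resource="https://omim.org/entry/231300"/>
  </owl:Class>
</rdf:RDF>"""


def xrefs_and_exact_matches(rdf_xml, class_iri):
    """Return (hasDbXref values, exactMatch IRIs) for one owl:Class."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{NS['rdf']}}}about"
    resource = f"{{{NS['rdf']}}}resource"
    for cls in root.findall("owl:Class", NS):
        if cls.get(about) == class_iri:
            xrefs = [e.text for e in cls.findall("oboInOwl:hasDbXref", NS)]
            exact = [e.get(resource) for e in cls.findall("skos:exactMatch", NS)]
            return xrefs, exact
    return [], []


xrefs, exact = xrefs_and_exact_matches(
    SNIPPET, "http://purl.obolibrary.org/obo/MONDO_0009277")
orphanet_xrefs = [x for x in xrefs if x.startswith("Orphanet:")]
# Assumption: an Orphanet exact match would carry "orpha" somewhere in its IRI.
orphanet_exact = [iri for iri in exact if "orpha" in iri.lower()]
```

For this class the Orphanet ids show up only in `orphanet_xrefs`, while `orphanet_exact` stays empty, which is the situation described above.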
What I thought you were asking on the call was simply to remove all of the rows where this was not true (i.e. where ofInterest is FALSE). I can do that too. I also just noticed that I have duplicate rows. So the updates I need to make are: remove the duplicate rows, remove the rows where ofInterest is FALSE, and then I can also just delete the ofInterest column.
Please let me know if we are on the same page or if there is still another issue with this table that I am not understanding.
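The cleanup steps mentioned here (drop rows where ofInterest is FALSE, drop duplicate rows, drop the ofInterest column) could look roughly like this in Python. It is only a sketch: `clean_rows` is a hypothetical helper, the rows are assumed to be dicts as produced by `csv.DictReader`, and the FALSE row uses placeholder ids.

```python
def clean_rows(rows):
    """Keep rows with ofInterest == "TRUE", drop exact duplicates,
    and remove the ofInterest column from what is kept."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["ofInterest"] != "TRUE":
            continue
        slim = {k: v for k, v in row.items() if k != "ofInterest"}
        fingerprint = tuple(sorted(slim.items()))
        if fingerprint in seen:
            continue  # identical row already kept
        seen.add(fingerprint)
        cleaned.append(slim)
    return cleaned


rows = [
    {"OMIM_id": "OMIM:601410", "Mondo_id": "MONDO:0011073",
     "Orphanet_id_fromOrphanet": "Orphanet:99886", "ofInterest": "TRUE"},
    # Exact duplicate of the row above.
    {"OMIM_id": "OMIM:601410", "Mondo_id": "MONDO:0011073",
     "Orphanet_id_fromOrphanet": "Orphanet:99886", "ofInterest": "TRUE"},
    # Placeholder row (hypothetical ids) that is not of interest.
    {"OMIM_id": "OMIM:0", "Mondo_id": "MONDO:0",
     "Orphanet_id_fromOrphanet": "Orphanet:0", "ofInterest": "FALSE"},
]

cleaned = clean_rows(rows)
```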
OMIM_id | Mondo_id | Mondo_label | Orphanet_id_fromOrphanet | Orphanet_id_fromMondo | Orphanet_label | ofInterest |
---|---|---|---|---|---|---|
OMIM:601410 | MONDO:0011073 | diabetes mellitus, transient neonatal, 1 | Orphanet:99886 | | Transient neonatal diabetes mellitus | TRUE |
OMIM:606176 | MONDO:0100165 | permanent neonatal diabetes mellitus 1 | Orphanet:99885 | | Isolated permanent neonatal diabetes mellitus | TRUE |
OMIM:618573 | MONDO:0032819 | hypothyroidism, congenital, nongoitrous, 7 | Orphanet:99832 | | Resistance to thyrotropin-releasing hormone syndrome | TRUE |
OMIM:608161 | MONDO:0024561 | vitelliform macular dystrophy 3 | Orphanet:99000 | | Adult-onset foveomacular vitelliform dystrophy | TRUE |
OMIM:231300 | MONDO:0009277 | glaucoma 3A | Orphanet:98976 | | Congenital glaucoma | TRUE |
OMIM:229300 | MONDO:0100340 | Friedreich ataxia 1 | Orphanet:95 | | Friedreich ataxia | TRUE |
OMIM:162091 | MONDO:0024517 | schwannomatosis 1 | Orphanet:93921 | | Schwannomatosis | TRUE |
OMIM:617396 | MONDO:0054561 | anauxetic dysplasia 2 | Orphanet:93347 | | Anauxetic dysplasia | TRUE |
MONDO:0009277 from mondo.owl, with only axioms of interest included:
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0009277">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0020366"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0004020"/>
<owl:someValuesFrom rdf:resource="http://identifiers.org/hgnc/2597"/>
</owl:Restriction>
</rdfs:subClassOf>
<obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An autosomal recessive form of congenital glaucoma caused by mutation(s) in the CYP1B1 gene, encoding cytochrome P450 1B1.</obo:IAO_0000115>
<mondo:excluded_subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0006788"/>
<oboInOwl:hasBroadSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">buphthalmos</oboInOwl:hasBroadSynonym>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DOID:11211</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD9:743.21</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NCIT:C148260</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98976</oboInOwl:hasDbXref>
<oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98977</oboInOwl:hasDbXref>
<oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Primary Congenital glaucoma 3A</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3, primary congenital, type a</oboInOwl:hasExactSynonym>
<oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">simple buphthalmos</oboInOwl:hasExactSynonym>
<oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GLC3A</oboInOwl:hasRelatedSynonym>
<oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3, primary congenital, A</oboInOwl:hasRelatedSynonym>
<oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, congenital</oboInOwl:hasRelatedSynonym>
<oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, primary open angle, adult-onset</oboInOwl:hasRelatedSynonym>
<oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, primary open angle, juvenile-onset</oboInOwl:hasRelatedSynonym>
<oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:0009277</oboInOwl:id>
<rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Editor note: check DO placement</rdfs:comment>
<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3A</rdfs:label>
<skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_11211"/>
<skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/NCIT_C148260"/>
<skos:exactMatch rdf:resource="https://omim.org/entry/231300"/>
</owl:Class>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0009277"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98976</owl:annotatedTarget>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:subClassOf</oboInOwl:source>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:source>
</owl:Axiom>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0009277"/>
<owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
<owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98977</owl:annotatedTarget>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:relatedTo</oboInOwl:source>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:superClassOf</oboInOwl:source>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:source>
</owl:Axiom>
@nicolevasilevsky can you check two of Joe's examples and see if they are new mappings as we would expect?
That was a fast response!
yes, this looks good to me @joeflack4
Yay! Thanks for checking.
OK, remaining action item, then close:
I created a google doc here. I will work on this today.
note to self, I emailed Orphanet about the mapping between Orphanet:95494 Combined pituitary hormone deficiencies, genetic forms and OMIM:613038 PITUITARY HORMONE DEFICIENCY, COMBINED OR ISOLATED, 1; CPHD1.
In OLS, it says these terms are an exact mapping, but it seems that different genes are implicated in each disease.
Response from Orphanet:
I have reviewed the mapping between Orphanet:95494 Combined pituitary hormone deficiencies, genetic forms and OMIM:613038 PITUITARY HORMONE DEFICIENCY, COMBINED OR ISOLATED, 1; CPHD1 and you are right this is not an Exact mapping but rather BTNT (ORPHA code's Broader Term maps to a Narrower Term). I will therefore proceed with correcting this error. Thank you very much for your insight!
Oh, nice catch! Let me know (here or at a meeting) if you think there's something I need to do on the ingest side of things to handle this exception case.
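One possible way an ingest could treat Orphanet's mapping-relation codes, sketched in Python under stated assumptions. The translation of BTNT to skos:narrowMatch follows from the definition quoted above (the ORPHA term is the broader one, so its match target is narrower), but the table, the function names, and the idea of filtering on the predicate are illustrative and not how any existing ingest is known to work. Only the two codes that appear in this thread are included.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Orphanet relation code -> SKOS predicate, read as (Orphanet term, predicate, OMIM term).
RELATION_TO_PREDICATE = {
    "E": SKOS + "exactMatch",
    # BTNT: the ORPHA term is broader, so the OMIM term is its narrow match.
    "BTNT": SKOS + "narrowMatch",
}


def to_predicate(code):
    """Translate a relation code; unknown codes yield None so they can be reviewed."""
    return RELATION_TO_PREDICATE.get(code)


def exact_only(mappings):
    """Keep only (subject, object) pairs whose relation is an exact match."""
    return [(subj, obj) for subj, code, obj in mappings
            if to_predicate(code) == SKOS + "exactMatch"]


mappings = [
    ("Orphanet:95494", "BTNT", "OMIM:613038"),  # relation as corrected by Orphanet
    ("Orphanet:93921", "E", "OMIM:162091"),     # relation as listed in the Orpha data
]

exact_pairs = exact_only(mappings)
```

With a filter like this, a mapping that Orphanet downgrades from E to BTNT would drop out of the exact-match candidates on the next run without any special casing.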
@joeflack4 I added this to the QC call agenda. I have some questions about this that would probably be easiest to discuss on the call. Thanks!
@nicolevasilevsky - I have a question about the mapping of Orphanet:93921 to MONDO:0024517. The Orpha data points to multiple exact matches:
- UMLS:C1335929 (E) = "Schwannomatosis" (maps SNOMED CT 781641005, same string; parent to "Schwannomatosis 1" in UMLS hierarchy)
- MeSH:C536641 (E) = "Schwannomatosis" (it does have a syn. of the type 1 sub-type, but UMLS splits this code across multiple CUIs; the primary record is on UMLS:C1335929)
- UMLS:C2931480 (E) = "Neurofibromatosis, Type 3, mixed central and peripheral"
- UMLS:C0917817 (E) = "Neurofibromatosis 3"
- OMIM:162091 (E) = "SCHWANNOMATOSIS 1"

And the Orphanet data has BTNT for two additional OMIM records: OMIM:162260, OMIM:615670. Overall, as I read the data and based on string-matching, the OrphaID looks like a more appropriate match to MONDO:0008075 "neurofibromatosis type 3" (it's even a synonym on the OrphaID in question).
Similar question about the mapping of Orphanet:99000 to OMIM:608161 & MONDO:0024561. The Orpha record maps to multiple causative genes, but the MIM record is ONLY mapping to peripherin 2. I think the Orpha record should map to a broader concept, possibly: MONDO:0011979
I agree this mapping is not correct. However, I am unsure if Orphanet:99000 is an exact match with the OMIMPS (MONDO:0011979) because Orphanet includes 4 genes and OMIM includes 5. I'll bring this up on the Mondo curation call
I think this ticket needs some strong shepherding. I will reassign it to you @nicolevasilevsky - I don't think it is the most important to deal with, but if we can make slow progress on adding these mappings into Mondo, that would be great!
I have been working on this and will continue to do so, slowly, slowly :)
related https://github.com/monarch-initiative/mondo/issues/4579
There are ORDO terms that need to be synced in Mondo. There are several terms that xref OMIM and are likely already in Mondo - this task is to automate the addition of Orphanet xrefs to existing Mondo terms.
ordo_slurp.txt