monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

create candidate mappings for ORDO based on OMIM equiv xrefs #167

Open nicolevasilevsky opened 2 years ago

nicolevasilevsky commented 2 years ago

related https://github.com/monarch-initiative/mondo/issues/4579

There are ORDO terms that need to be synced in Mondo. Several of these terms xref OMIM and are likely already in Mondo - this task is to automate the addition of Orphanet xrefs to existing Mondo terms.

ordo_slurp.txt

matentzn commented 2 years ago

@hrshdhgd you do not have to actually do anything specific here, but you should understand how our work with boomer and disease mapping commons will automagically solve this issue.

  1. Disease mapping commons contains Mondo mappings, but also includes mappings from other sources
  2. In particular, it contains ORDO mappings (we should make sure these are exact and extracted correctly)
  3. The ORDO and Mondo mapping sets are both fed to boomer during the boomer run
  4. The resulting boomer output should contain all exact links between Mondo and ORDO terms if an intermediate OMIM was present.
matentzn commented 2 years ago

Requirement:

https://github.com/monarch-initiative/mondo-ingest/issues/23

nicolevasilevsky commented 2 years ago

Action item: @joeflack4 take the Orphanet - OMIM mapping that has been created and compare it with the Mondo - Orphanet mapping. I.e., we need to compare all the mappings where an OMIM term is mapped to an Orphanet (ORDO) term but the Orphanet term is not mapped to Mondo.

The goal is to determine where we don't have existing Orphanet mappings in Mondo and slurp them into Mondo.

(We can talk about this on the QC call)

matentzn commented 2 years ago

Extract MONDO:OMIM (rename OMIM to MONDO_OMIM) OUTER JOIN MONDO:ORDO (on MONDO) OUTER JOIN ORDO:OMIM (on ORDO) (rename OMIM to ORDO_OMIM)
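
The join described above can be sketched in plain Python. This is only a rough illustration, not the analysis code actually used: the `outer_join` helper, the column names, and the placeholder IDs (`MONDO:m1`, `ORDO:r1`, and so on) are assumptions made up for the example.

```python
def outer_join(left, right, key):
    """Full outer join of two lists of dicts on `key`; missing cells become None."""
    all_cols = {c for row in left for c in row} | {c for row in right for c in row}
    joined, matched_right = [], set()
    for l in left:
        # A missing (None) key never matches anything.
        hits = [i for i, r in enumerate(right)
                if l.get(key) is not None and r.get(key) == l[key]]
        if hits:
            for i in hits:
                matched_right.add(i)
                joined.append({**dict.fromkeys(all_cols), **right[i], **l})
        else:
            joined.append({**dict.fromkeys(all_cols), **l})
    # Keep right-hand rows that matched nothing on the left.
    for i, r in enumerate(right):
        if i not in matched_right:
            joined.append({**dict.fromkeys(all_cols), **r})
    return joined


# Toy mapping sets with placeholder IDs (not real Mondo/ORDO/OMIM identifiers).
mondo_omim = [{"MONDO": "MONDO:m1", "MONDO_OMIM": "OMIM:o1"},
              {"MONDO": "MONDO:m2", "MONDO_OMIM": "OMIM:o2"}]
mondo_ordo = [{"MONDO": "MONDO:m1", "ORDO": "ORDO:r1"}]
ordo_omim = [{"ORDO": "ORDO:r1", "ORDO_OMIM": "OMIM:o1"},
             {"ORDO": "ORDO:r2", "ORDO_OMIM": "OMIM:o2"}]

# MONDO:OMIM OUTER JOIN MONDO:ORDO (on MONDO) OUTER JOIN ORDO:OMIM (on ORDO)
table = outer_join(outer_join(mondo_omim, mondo_ordo, "MONDO"), ordo_omim, "ORDO")

for row in table:
    print(row["MONDO"], row["MONDO_OMIM"], row["ORDO"], row["ORDO_OMIM"])
```

Rows where `MONDO` is empty but `ORDO_OMIM` is filled are the ORDO terms that reach an OMIM ID without any Mondo link, which is the population this issue is about.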

matentzn commented 2 years ago

@joeflack4 So we have concluded that we can obtain no new MONDO-ORDO mappings from the OMIM-ORDO mappings, right? If so, we can close this.

joeflack4 commented 2 years ago

@matentzn Oh no, that is incorrect. mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo.csv

I was able to locate such new mappings. Details regarding that are in my mondo-analysis PR: https://github.com/monarch-initiative/mondo-analysis/pull/36

matentzn commented 2 years ago

How can I get new MONDO-Orphanet mappings from this table? (Mappings that were not previously there)?

joeflack4 commented 2 years ago

I sorted it so that the new mappings are at the top of the list. For example:

| OMIM_id | Mondo_id | Orphanet_id_fromOrphanet | Orphanet_id_fromMondo | ofInterest |
| --- | --- | --- | --- | --- |
| OMIM:601410 | MONDO:0011073 | Orphanet:99886 | | TRUE |
| OMIM:606176 | MONDO:0100165 | Orphanet:99885 | | TRUE |

There are ~200 instances of these. I think this is what you were asking me to do yesterday, correct? Unless my original mapping file was indeed too out of date and these Mondo::Orphanet mappings are already in Mondo, these should be new mappings.

matentzn commented 2 years ago

Awesome - the last thing we need, and then this is done from your perspective, is the labels (Mondo and Orphanet) in this table - otherwise it will take quite long for Nicole to review them. Any chance we can make that happen?

joeflack4 commented 2 years ago

Yep, I can make that happen. I was going to try and squeeze that in today, but I wasn't sure how important that was. Sounds like it's important enough. I'll ditch my oak-wrangling for now and do it the SPARQL way this time. So yeah, I should have that uploaded to my PR as an updated CSV in a few hours. Guess I'll upload it here as well.

joeflack4 commented 2 years ago

Here are the updated files w/ labels: "mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo - v2.csv" and "mondo-orphanet-omim_mappings - v2.csv".

matentzn commented 2 years ago

Thank you Joe! Labels look awesome. Something is not quite as I expected: I am looking for example at this row:

OMIM:617396 MONDO:0054561 anauxetic dysplasia 2 Orphanet:93347 Anauxetic dysplasia

It is true that this link between MONDO and Orphanet does not exist currently - but the Orphanet class is already linked to MONDO:0011773!

So what we need is:

A list of all MONDO ids that are not linked to Orphanet, joined on OMIM with a list of all Orphanet ids that are not linked to MONDO.

I expect a much smaller number of classes tbh, if any!
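
A minimal sketch of that stricter filter, assuming the three mapping sets are available as simple (subject, object) pairs. The function name and the placeholder IDs are hypothetical; the toy data only mimics the situation above, where an Orphanet class is already linked to a different Mondo class and so must not be proposed again.

```python
def candidate_mappings(mondo_omim, mondo_ordo, ordo_omim):
    """Return (mondo, ordo, omim) triples where neither side has an existing Mondo-ORDO link."""
    mondo_with_ordo = {mondo for mondo, _ in mondo_ordo}
    ordo_with_mondo = {ordo for _, ordo in mondo_ordo}

    # Index Mondo ids that have no Orphanet link at all, keyed by OMIM id.
    unlinked_mondo_by_omim = {}
    for mondo, omim in mondo_omim:
        if mondo not in mondo_with_ordo:
            unlinked_mondo_by_omim.setdefault(omim, set()).add(mondo)

    # Join on OMIM with Orphanet ids that have no Mondo link at all.
    candidates = set()
    for ordo, omim in ordo_omim:
        if ordo not in ordo_with_mondo:
            for mondo in unlinked_mondo_by_omim.get(omim, ()):
                candidates.add((mondo, ordo, omim))
    return sorted(candidates)


# Placeholder IDs, not real identifiers.
mondo_omim = [("MONDO:m1", "OMIM:o1"), ("MONDO:m2", "OMIM:o2")]
mondo_ordo = [("MONDO:m9", "ORDO:r1")]  # ORDO:r1 is already linked to another Mondo class
ordo_omim = [("ORDO:r1", "OMIM:o1"), ("ORDO:r2", "OMIM:o2")]

print(candidate_mappings(mondo_omim, mondo_ordo, ordo_omim))
```

Here `MONDO:m1` shares an OMIM ID with `ORDO:r1`, but the pair is dropped because `ORDO:r1` already has a Mondo link; only the `MONDO:m2` / `ORDO:r2` pair survives.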

joeflack4 commented 2 years ago

@matentzn Hey Nico, I think you might be confused. If not, then I am confused about something and having a hard time understanding, sorry.

I looked at the example you gave, and this is one of them for which Mondo does not have an equivalence link to any Orphanet class.

I included some more examples in the table below.

I also went to check your example, MONDO:0009277, in mondo.owl. What I found is that there is an xref between this class and Orphanet classes. However, they are not of type skos:exactMatch. They are of type oboInOwl:hasDbXref. So I think mondo-orphanet-omim_mappings - OMIM-ORDO mappings not in Mondo - v2.csv is correct in that respect, because you asked me to only discard any mappings that were not skos:exactMatch. Am I correct?

What I thought you were asking on the call was simply to remove all of the rows where this was not true (i.e. where ofInterest is FALSE). I can do that too. I also just noticed that I have duplicate rows. So the updates that I need to make are:

Please let me know if we are on the same page or if there is still another issue with this table that I am not understanding.

Filtered subset of CSV

| OMIM_id | Mondo_id | Mondo_label | Orphanet_id_fromOrphanet | Orphanet_id_fromMondo | Orphanet_label | ofInterest |
| --- | --- | --- | --- | --- | --- | --- |
| OMIM:601410 | MONDO:0011073 | diabetes mellitus, transient neonatal, 1 | Orphanet:99886 | | Transient neonatal diabetes mellitus | TRUE |
| OMIM:606176 | MONDO:0100165 | permanent neonatal diabetes mellitus 1 | Orphanet:99885 | | Isolated permanent neonatal diabetes mellitus | TRUE |
| OMIM:618573 | MONDO:0032819 | hypothyroidism, congenital, nongoitrous, 7 | Orphanet:99832 | | Resistance to thyrotropin-releasing hormone syndrome | TRUE |
| OMIM:608161 | MONDO:0024561 | vitelliform macular dystrophy 3 | Orphanet:99000 | | Adult-onset foveomacular vitelliform dystrophy | TRUE |
| OMIM:231300 | MONDO:0009277 | glaucoma 3A | Orphanet:98976 | | Congenital glaucoma | TRUE |
| OMIM:229300 | MONDO:0100340 | Friedreich ataxia 1 | Orphanet:95 | | Friedreich ataxia | TRUE |
| OMIM:162091 | MONDO:0024517 | schwannomatosis 1 | Orphanet:93921 | | Schwannomatosis | TRUE |
| OMIM:617396 | MONDO:0054561 | anauxetic dysplasia 2 | Orphanet:93347 | | Anauxetic dysplasia | TRUE |

MONDO:0009277 from mondo.owl, with only axioms of interest included

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0009277">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0020366"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0004020"/>
                <owl:someValuesFrom rdf:resource="http://identifiers.org/hgnc/2597"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">An autosomal recessive form of congenital glaucoma caused by mutation(s) in the CYP1B1 gene, encoding cytochrome P450 1B1.</obo:IAO_0000115>
        <mondo:excluded_subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0006788"/>
        <oboInOwl:hasBroadSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">buphthalmos</oboInOwl:hasBroadSynonym>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DOID:11211</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD9:743.21</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NCIT:C148260</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98976</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98977</oboInOwl:hasDbXref>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Primary Congenital glaucoma 3A</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3, primary congenital, type a</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">simple buphthalmos</oboInOwl:hasExactSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">GLC3A</oboInOwl:hasRelatedSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3, primary congenital, A</oboInOwl:hasRelatedSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, congenital</oboInOwl:hasRelatedSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, primary open angle, adult-onset</oboInOwl:hasRelatedSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma, primary open angle, juvenile-onset</oboInOwl:hasRelatedSynonym>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:0009277</oboInOwl:id>
        <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Editor note: check DO placement</rdfs:comment>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">glaucoma 3A</rdfs:label>
        <skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_11211"/>
        <skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/NCIT_C148260"/>
        <skos:exactMatch rdf:resource="https://omim.org/entry/231300"/>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0009277"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98976</owl:annotatedTarget>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:subClassOf</oboInOwl:source>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:source>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0009277"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Orphanet:98977</owl:annotatedTarget>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:relatedTo</oboInOwl:source>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MONDO:superClassOf</oboInOwl:source>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">OMIM:231300</oboInOwl:source>
    </owl:Axiom>
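
The distinction drawn above between `oboInOwl:hasDbXref` and `skos:exactMatch` can be checked mechanically. The following is a minimal sketch using only the Python standard library on a trimmed-down copy of the class above; the function name is made up for illustration, and a real pipeline would more likely use an RDF library or SPARQL than raw XML parsing.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}

# Trimmed excerpt of MONDO:0009277 as quoted in this thread.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0009277">
    <oboInOwl:hasDbXref>OMIM:231300</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>Orphanet:98976</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>Orphanet:98977</oboInOwl:hasDbXref>
    <skos:exactMatch rdf:resource="http://purl.obolibrary.org/obo/DOID_11211"/>
    <skos:exactMatch rdf:resource="https://omim.org/entry/231300"/>
  </owl:Class>
</rdf:RDF>"""


def xrefs_and_exact_matches(rdf_xml):
    """Map each class IRI to its hasDbXref literals and skos:exactMatch IRIs."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    result = {}
    for cls in root.findall("owl:Class", NS):
        result[cls.get(about)] = {
            "xrefs": [e.text for e in cls.findall("oboInOwl:hasDbXref", NS)],
            "exact": [e.get(resource) for e in cls.findall("skos:exactMatch", NS)],
        }
    return result


info = xrefs_and_exact_matches(RDF_XML)["http://purl.obolibrary.org/obo/MONDO_0009277"]
orphanet_xrefs = [x for x in info["xrefs"] if x.startswith("Orphanet:")]
orphanet_exact = [iri for iri in info["exact"] if "orpha" in iri.lower()]
print(orphanet_xrefs, orphanet_exact)
```

For this excerpt the class carries Orphanet xrefs but no Orphanet `skos:exactMatch`, which is why a filter restricted to exact matches treats it as having no Orphanet mapping.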
matentzn commented 2 years ago

@nicolevasilevsky can you check two of Joe's examples and see if they are new mappings, as we would expect?

joeflack4 commented 2 years ago

That was a fast response!

nicolevasilevsky commented 2 years ago

yes, this looks good to me @joeflack4

joeflack4 commented 2 years ago

Yay! Thanks for checking.

matentzn commented 2 years ago

OK, remaining action item, then close:

nicolevasilevsky commented 2 years ago

I created a google doc here. I will work on this today.

nicolevasilevsky commented 2 years ago

note to self, I emailed Orphanet about the mapping between Orphanet:95494 Combined pituitary hormone deficiencies, genetic forms and OMIM:613038 PITUITARY HORMONE DEFICIENCY, COMBINED OR ISOLATED, 1; CPHD1.

In OLS, it says these terms are an exact mapping, but it seems that different genes are implicated in each disease.

nicolevasilevsky commented 2 years ago

Response from Orphanet:

I have reviewed the mapping between Orphanet:95494 Combined pituitary hormone deficiencies, genetic forms and OMIM:613038 PITUITARY HORMONE DEFICIENCY, COMBINED OR ISOLATED, 1; CPHD1 and you are right this is not an Exact mapping but rather BTNT (ORPHA code's Broader Term maps to a Narrower Term). I will therefore proceed with correcting this error. Thank you very much for your insight!

joeflack4 commented 2 years ago

Oh, nice catch! Let me know (here or at a meeting) if you think there's something I need to do on the ingest side of things to handle this exception case.

nicolevasilevsky commented 2 years ago

@joeflack4 I added this to the QC call agenda. I have some questions about this that would probably be easiest to discuss on the call. Thanks!

kanems commented 1 year ago

@nicolevasilevsky - I have a question about the mapping of Orphanet:93921 to MONDO:0024517. The Orpha data points to multiple exact matches:

- UMLS:C1335929 (E) = "Schwannomatosis" (maps SNOMED CT 781641005, same string; parent to "Schwannomatosis 1" in UMLS hierarchy)
- MeSH:C536641 (E) = "Schwannomatosis" (it does have a syn. of the type 1 sub-type, but UMLS splits this code across multiple CUIs; the primary record is on UMLS:C1335929)
- UMLS:C2931480 (E) = "Neurofibromatosis, Type 3, mixed central and peripheral"
- UMLS:C0917817 (E) = "Neurofibromatosis 3"
- OMIM:162091 (E) = "SCHWANNOMATOSIS 1"

And the Orphanet data has BTNT for two additional OMIM records: OMIM:162260 and OMIM:615670. Overall, as I read the data and based on string-matching, the OrphaID looks like a more appropriate match to MONDO:0008075 "neurofibromatosis type 3" (it's even a synonym on the OrphaID in question).
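
One way to keep such cases out of the exact-match candidates is to split the Orphanet-to-OMIM records by their relation code before any join. This is a small sketch only: the tuple layout and the function name are assumptions, and the Orphanet:95494 row is written as it would read after the correction Orphanet described earlier in this thread.

```python
# Relation codes as used in this thread: "E" = exact,
# "BTNT" = ORPHA code's broader term maps to a narrower term.
ORPHANET_OMIM = [
    ("Orphanet:93921", "OMIM:162091", "E"),
    ("Orphanet:93921", "OMIM:162260", "BTNT"),
    ("Orphanet:93921", "OMIM:615670", "BTNT"),
    ("Orphanet:95494", "OMIM:613038", "BTNT"),  # assumed state after Orphanet's correction
]


def split_by_relation(mappings):
    """Separate exact (E) mappings from everything else, which goes to manual review."""
    exact, needs_review = [], []
    for orpha_id, omim_id, code in mappings:
        (exact if code == "E" else needs_review).append((orpha_id, omim_id))
    return exact, needs_review


exact, needs_review = split_by_relation(ORPHANET_OMIM)
print("exact:", exact)
print("needs review:", needs_review)
```

Note that a code of E does not settle the question on its own: the Orphanet:93921 row is marked exact in the source data and is still being questioned here, so the exact list is a starting point for curation rather than a final answer.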

kanems commented 1 year ago

Similar question about the mapping of Orphanet:99000 to OMIM:608161 & MONDO:0024561. The Orpha record maps to multiple causative genes, but the MIM record is ONLY mapping to peripherin 2. I think the Orpha record should map to a broader concept, possibly: MONDO:0011979

nicolevasilevsky commented 1 year ago

> nicolevasilevsky - I have a question about the mapping of Orphanet:93921 to MONDO:0024517. The Orpha data points to multiple exact matches: UMLS:C1335929 (E) = "Schwannomatosis" (maps SNOMED CT 781641005, same string; parent to "Schwannomatosis 1" in UMLS hierarchy) MeSH:C536641 (E) = "Schwannomatosis" (it does have a syn. of the type 1 sub-type, but UMLS splits this code across multiple CUIs, the primary record is on UMLS:C1335929) UMLS:C2931480 (E) = "Neurofibromatosis, Type 3, mixed central and peripheral" UMLS:C0917817 (E) = "Neurofibromatosis 3" OMIM:162091 (E) = "SCHWANNOMATOSIS 1"
>
> And the Orphanet data has BTNT for two additional OMIM records: OMIM:162260, OMIM:615670. Overall, as I read the data and based on string-matching, the OrphaID looks like a more appropriate match to MONDO:0008075 "neurofibromatosis type 3" (it's even a synonym on the OrphaID in question)

nicolevasilevsky commented 1 year ago

99000

> Similar question about the mapping of Orphanet:99000 to OMIM:608161 & MONDO:0024561. The Orpha record maps to multiple causative genes, but the MIM record is ONLY mapping to peripherin 2. I think the Orpha record should map to a broader concept, possibly: MONDO:0011979

I agree this mapping is not correct. However, I am unsure if Orphanet:99000 is an exact match with the OMIMPS (MONDO:0011979) because Orphanet includes 4 genes and OMIM includes 5. I'll bring this up on the Mondo curation call

matentzn commented 1 year ago

I think this ticket needs some strong shepherding. I will reassign it to you @nicolevasilevsky - I don't think it is the most important thing to deal with, but if we can make slow, steady progress on adding these mappings into Mondo, that would be great!

nicolevasilevsky commented 1 year ago

I have been working on this and will continue to do so, slowly, slowly :)