monarch-initiative / mondo

Mondo Disease Ontology
http://obofoundry.org/ontology/mondo
Creative Commons Attribution 4.0 International

Create synchronization pipeline for disease-gene in Mondo #7229

Open sabrinatoro opened 9 months ago

sabrinatoro commented 9 months ago
matentzn commented 9 months ago

This PR, https://github.com/monarch-initiative/omim/pull/107, will add a new release artefact to the omim ingest containing all the MONDO->HGNC gene associations via

MONDO:Disease-exactMatch->OMIM:Disease--['has basis in germline mutation of']-->OMIM:Gene-->HGNC:Gene.
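The chain above can be read as three lookups composed in order. The following is only a minimal sketch of that idea, not the code from the PR: the function name, the dictionary layout, and every ID in it are hypothetical placeholders.

```python
# Hypothetical toy mappings. All IDs are placeholders, not real records.
mondo_to_omim_disease = {            # MONDO disease -exactMatch-> OMIM disease
    "MONDO:0000001": "OMIM:100001",
    "MONDO:0000002": "OMIM:100002",
}
omim_disease_to_omim_genes = {       # 'has basis in germline mutation of'
    "OMIM:100001": ["OMIM:600001"],
    "OMIM:100002": ["OMIM:600002"],
}
omim_gene_to_hgnc = {                # OMIM gene -> HGNC gene
    "OMIM:600001": "HGNC:1",
    # OMIM:600002 deliberately has no HGNC mapping in this toy data
}


def mondo_to_hgnc(mondo_to_disease, disease_to_genes, gene_to_hgnc):
    """Compose the three hops into (MONDO, HGNC) pairs, skipping broken chains."""
    pairs = []
    for mondo_id, omim_disease in mondo_to_disease.items():
        for omim_gene in disease_to_genes.get(omim_disease, []):
            hgnc_id = gene_to_hgnc.get(omim_gene)
            if hgnc_id is not None:
                pairs.append((mondo_id, hgnc_id))
    return pairs


associations = mondo_to_hgnc(
    mondo_to_omim_disease, omim_disease_to_omim_genes, omim_gene_to_hgnc
)
```

In this sketch a Mondo term yields an association only when every hop resolves, so MONDO:0000002 drops out because its OMIM gene has no HGNC mapping.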

@twhetzel should maybe spend some time reviewing my choice of only including "evidence code 3" from morbidmap. I don't know exactly what that means (ask @joeflack4), but the evidence string is:

"Evidence: (3) The molecular basis for the disorder is known; a mutation has been found in the gene."

To review cases like this, @twhetzel and I are deploying omim.owl from the Mondo ingest on the Monarch OLS, so that we can see more clearly what is going on.

Next steps:

matentzn commented 9 months ago

BTW, we deployed the Mondo version of OMIM now here: https://ols.monarchinitiative.org/ontologies/omim/terms?iri=https%3A%2F%2Fomim.org%2Fentry%2F100100

joeflack4 commented 9 months ago

@matentzn @twhetzel I don't know why Nico only included "evidence code 3", and I can't think of anything else I might know other than what comes from the comments section in morbidmap.txt provided by OMIM:

1 - The disorder is placed on the map based on its association with a gene, but the underlying defect is not known.
2 - The disorder has been placed on the map by linkage or other statistical method; no mutation has been found.
3 - The molecular basis for the disorder is known; a mutation has been found in the gene.
4 - A contiguous gene deletion or duplication syndrome, multiple genes are deleted or duplicated causing the phenotype.
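For illustration, restricting to code 3 could look like the sketch below. It assumes the code appears as a single digit in parentheses at the end of each phenotype string. That layout, the function names, and the sample strings are all assumptions made for this example and are not taken from the ingest code.

```python
import re

# Assumption: the phenotype text ends with the evidence code in parentheses.
EVIDENCE_CODE = re.compile(r"\((\d)\)\s*$")


def evidence_code(phenotype):
    """Return the trailing evidence code as an int, or None if absent."""
    match = EVIDENCE_CODE.search(phenotype)
    return int(match.group(1)) if match else None


def keep_code_3(phenotypes):
    """Keep only entries whose molecular basis is known (code 3)."""
    return [p for p in phenotypes if evidence_code(p) == 3]


# Made-up sample strings, not real morbidmap rows.
sample = [
    "Example disorder A, 100001 (3)",
    "Example disorder B, 100002 (2)",
    "Example disorder C (1)",
    "Example deletion syndrome, 100004 (4)",
]
kept = keep_code_3(sample)
```

Entries without a recognizable trailing code return None and are excluded rather than guessed at.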

matentzn commented 9 months ago

It seemed to me that only case 3 fulfills @sabrinatoro's conditions above (the definition of this ticket). Maybe I am wrong.

joeflack4 commented 9 months ago

@matentzn Ah OK. I should've read the full ticket. Hmm, yes, I think only (3) meets all of @sabrinatoro's requirements.

twhetzel commented 9 months ago

"3 - The molecular basis for the disorder is known; a mutation has been found in the gene." This one seems fine, although I am not sure it guarantees a 1-1 relation between OMIM entry and gene.
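One way to test the 1-1 question is to group the associations in both directions and list anything with more than one partner. This is a rough sketch over hypothetical (disease, gene) pairs. It does not assume any particular column layout of the released file, and the IDs are placeholders.

```python
from collections import defaultdict


def non_one_to_one(pairs):
    """Return diseases with several genes and genes with several diseases."""
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for disease, gene in pairs:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)
    multi_gene = {d: sorted(g) for d, g in genes_by_disease.items() if len(g) > 1}
    multi_disease = {g: sorted(d) for g, d in diseases_by_gene.items() if len(d) > 1}
    return multi_gene, multi_disease


# Placeholder pairs, not real associations.
toy_pairs = [
    ("OMIM:100001", "HGNC:1"),
    ("OMIM:100001", "HGNC:2"),  # one disease, two genes
    ("OMIM:100002", "HGNC:3"),
    ("OMIM:100003", "HGNC:3"),  # one gene, two diseases
]
multi_gene, multi_disease = non_one_to_one(toy_pairs)
```

If both dictionaries come back empty for the real data, the relation is 1-1 in that release. Otherwise the listed entries are the ones to look at first.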

Evidence codes 1 and 2 may be relevant, but we would need expert input on that. Agreed that 4 is not relevant.

@matentzn where does it say "disease-to-gene"? I saw https://github.com/monarch-initiative/omim/pull/107 and the flipped mappings, but I am not sure if there is a file with that to look at. It does sound a bit odd, but I can see some arguments for doing it that way based on some of the existing RO relations.

joeflack4 commented 9 months ago

@twhetzel You may find it useful to glance at this. When you sign up for OMIM data downloads, this is one of the main files (mim2gene.txt). There is a "MIM Entry Type" column, and I think the ones we're interested in are "phenotype" and maybe "predominantly phenotypes" (maybe there are more). "Phenotype" is sometimes used interchangeably with "disease", especially in the OMIM (and OMIA, I assume) context.
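As a rough sketch of that filter, the snippet below picks out entries by "MIM Entry Type". It assumes a tab-separated file with a plain header row and a "MIM Number" column. The header handling, the second column name, and the sample rows are assumptions for illustration, so adjust them to the actual download.

```python
import csv
import io

# Entry types named above. There may be more worth including.
WANTED_TYPES = {"phenotype", "predominantly phenotypes"}


def phenotype_mims(handle, type_column="MIM Entry Type", id_column="MIM Number"):
    """Return MIM numbers whose entry type is one of WANTED_TYPES."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader if row[type_column] in WANTED_TYPES]


# Placeholder rows, not real OMIM data.
sample_tsv = (
    "MIM Number\tMIM Entry Type\n"
    "100001\tphenotype\n"
    "600001\tgene\n"
    "100002\tpredominantly phenotypes\n"
)
selected = phenotype_mims(io.StringIO(sample_tsv))
```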

mim2gene.tsv.zip (FYI, it's an old copy)

twhetzel commented 8 months ago

@matentzn do you need more information from anyone for this ticket?

matentzn commented 8 months ago

Next step is: Curator review of

https://github.com/monarch-initiative/omim/releases/download/2024-03-24/mondo_genes.csv

I personally do not know exactly how to review this, but @sabrinatoro may be able to help. I would stick this in Google docs, then start looking at a few examples and taking notes.

joeflack4 commented 5 months ago

Related issues: