monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

OMIM ingest: what is/should be the source of truth for gene 2 disease in OMIM #708

Open matentzn opened 2 years ago

matentzn commented 2 years ago

@kevinschaper is the OMIM ingest done? How did we decide between OMIM morbidmap and MedGen gene2disease?

kevinschaper commented 2 years ago

Kent finished it before he left, and I'm not that familiar with the details, but we can definitely revisit any decisions in there. It's an oddball right now because it creates nodes; I assume it should be an edge-only ingest, and we should update the edges in the mapping step to catch human genes (though I'm less sure about the NucleicAcidEntity nodes).
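To make the "edge-only ingest plus mapping step" idea concrete, here is a minimal sketch in plain Python. It is not the actual Monarch/Koza code: the `G2DEdge` record, the `ingest_edges` and `map_subjects` helpers, the placeholder predicate string, and the IDs in the usage example are all hypothetical. The point is only that the ingest emits associations without creating any nodes, and a separate step rewrites gene IDs to the preferred human gene IDs.

```python
from dataclasses import dataclass, replace

# Hypothetical, simplified edge record (not Koza's or Biolink's real classes).
@dataclass(frozen=True)
class G2DEdge:
    subject: str    # gene ID as written in the source file
    predicate: str  # placeholder predicate string (assumption)
    object: str     # disease ID, e.g. an OMIM CURIE
    source: str

def ingest_edges(rows, predicate="biolink:gene_associated_with_condition"):
    """Edge-only ingest: emit one edge per (gene, disease) row, create no nodes."""
    for gene_id, disease_id in rows:
        yield G2DEdge(gene_id, predicate, disease_id, "OMIM")

def map_subjects(edges, gene_map):
    """Mapping step: rewrite gene IDs to preferred human gene IDs.

    Edges whose subject has no entry in gene_map are passed through unchanged.
    """
    for edge in edges:
        yield replace(edge, subject=gene_map.get(edge.subject, edge.subject))

# Usage with hypothetical IDs and a hypothetical mapping table.
rows = [("OMIM:111111", "OMIM:222222"), ("OMIM:333333", "OMIM:444444")]
gene_map = {"OMIM:111111": "NCBIGene:999999"}
edges = list(map_subjects(ingest_edges(rows), gene_map))
```

Keeping node creation out of the ingest means gene and disease nodes come only from their authoritative sources, and unmapped subjects stay visible for later review instead of becoming stray nodes.
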

matentzn commented 2 years ago

OK, this may not be a priority right now, but we should consolidate the various OMIM ingests we have across Monarch a bit, at least conceptually.

matentzn commented 2 years ago

I am particularly concerned about the source files being used. @cmungall told me that there are multiple ways to obtain the OMIM g2ds, for example from MedGen or directly from morbidmap, and no one really knows which is best. (Should we try several and diff them to see what the differences are? Not sure.)
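One way to "try multiple and diff" is to reduce each source to a set of (gene, disease) ID pairs and compare the sets. The sketch below assumes both sources have already been normalized to the same ID namespaces (for example NCBIGene for genes and OMIM for diseases); without that step, identical associations written with different IDs would show up as differences. The `diff_g2d` helper and the example pairs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (gene ID, disease ID), assumed already normalized

def diff_g2d(a: Iterable[Pair], b: Iterable[Pair]) -> Dict[str, Set[Pair]]:
    """Compare two gene-to-disease sources as sets of (gene, disease) pairs."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": set_a & set_b,  # associations present in both sources
        "only_a": set_a - set_b,  # associations only in the first source
        "only_b": set_b - set_a,  # associations only in the second source
    }

# Hypothetical pairs standing in for morbidmap-derived and MedGen-derived data.
morbidmap = {("NCBIGene:1", "OMIM:100"), ("NCBIGene:2", "OMIM:200")}
medgen = {("NCBIGene:1", "OMIM:100"), ("NCBIGene:3", "OMIM:300")}
result = diff_g2d(morbidmap, medgen)
print({k: len(v) for k, v in result.items()})  # {'shared': 1, 'only_a': 1, 'only_b': 1}
```

The `only_a` and `only_b` sets are the interesting part: each entry is a concrete association to look up by hand when deciding which source to trust.
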

matentzn commented 1 year ago

This is not really a ticket I can take on effectively. I can advise, but it's probably better to assign someone else.

monicacecilia commented 5 months ago

is the OMIM ingest done?

@matentzn - Yes, it is.


we should consolidate the various OMIM ingests we have across Monarch a bit, at least conceptually.

@matentzn - I agree, and I would like us to pick this conversation back up if that hasn't already happened. Could we please restart it during the data call on 2024-02-01?


@cmungall @kevinschaper 👀👆🏽

matentzn commented 5 months ago

If I think about this correctly:

  1. Since our push over the summer, g2ds have been coming from the HPOA pipeline. These include the OMIM g2ds (@kevinschaper, right?).
  2. OMIM IDs, with occasional links to genes where necessary for "defining the disease", come from https://github.com/monarch-initiative/omim, which I think is also OK.
  3. We are exploring (or rather, should be exploring) moving our g2d ingest uniformly to GenCC (https://thegencc.org/).

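If the OMIM g2ds now arrive through HPOA, pulling them out is mostly a filter on the disease ID prefix. The sketch below assumes HPOA's `genes_to_disease.txt` is tab-separated with a header row that includes `ncbi_gene_id` and `disease_id` columns; that layout is an assumption and should be checked against the actual release. The helper name and the sample rows are hypothetical.

```python
import csv
import io

def omim_g2d_from_hpoa(handle):
    """Yield (gene, disease) pairs whose disease_id is an OMIM CURIE.

    Assumes a tab-separated file with a header row containing at least the
    columns ncbi_gene_id and disease_id (assumed layout; verify it).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["disease_id"].startswith("OMIM:"):
            yield row["ncbi_gene_id"], row["disease_id"]

# Hypothetical sample in the assumed layout; the real file is read the same way.
SAMPLE = (
    "ncbi_gene_id\tgene_symbol\tassociation_type\tdisease_id\tsource\n"
    "NCBIGene:1\tGENE1\tMENDELIAN\tOMIM:100\texample\n"
    "NCBIGene:2\tGENE2\tMENDELIAN\tORPHA:200\texample\n"
)
pairs = list(omim_g2d_from_hpoa(io.StringIO(SAMPLE)))
print(pairs)  # [('NCBIGene:1', 'OMIM:100')]
```

The resulting pairs are in the same shape as a morbidmap- or GenCC-derived set, so they can be compared directly with any other source.
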
sagehrke commented 5 months ago

Related to https://github.com/monarch-initiative/monarch-app/issues/707 @madanucd

madanucd commented 3 months ago

The current flow for ingesting OMIM G2D associations follows a fixed pathway: the data originates in OMIM, passes through MedGen, then through HPOA-G2D, and is finally integrated into the Monarch Knowledge Graph (KG).

An assessment of G2D associations across these sources, as of March 2024, shows complete coverage. The relationships are depicted as an UpSet plot and as a Venn diagram, which show that HPOA-G2D contains all associations from MedGen, and MedGen in turn contains all associations from OMIM.
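The nesting claim (OMIM inside MedGen inside HPOA-G2D) can be checked directly with set comparisons, and the bars of an UpSet plot are just counts of items grouped by the exact combination of sources that contain them. A minimal sketch, assuming each source has been reduced to a set of normalized (gene, disease) pairs; the `upset_counts` helper and the example sets are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Set

def upset_counts(sources: Dict[str, Set[Hashable]]) -> Counter:
    """Count items by the exact combination of sources that contain them.

    Each key is a frozenset of source names; each value is the height of
    the corresponding bar in an UpSet plot.
    """
    counts = Counter()
    all_items = set().union(*sources.values())
    for item in all_items:
        owners = frozenset(name for name, items in sources.items() if item in items)
        counts[owners] += 1
    return counts

# Hypothetical, fully nested sets mirroring the reported pattern.
omim = {("G1", "D1"), ("G2", "D2")}
medgen = omim | {("G3", "D3")}
hpoa = medgen | {("G4", "D4")}

nested = omim <= medgen <= hpoa  # chained subset check
counts = upset_counts({"OMIM": omim, "MedGen": medgen, "HPOA-G2D": hpoa})
```

When the sources are strictly nested, only bars for "all three", "MedGen and HPOA-G2D", and "HPOA-G2D only" can be non-zero; any bar such as "OMIM only" would be an association that was lost on the way to the KG.
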


However, a closer look at MedGen's sources reveals a caveat. Although MedGen captures OMIM's G2D associations, its downloadable files (as of March 2024) attribute nearly all of them to intermediary sources such as GeneMap, GeneReviews, NCBI curation, and GeneTests; OMIM itself is listed as a source for only a handful of edges. Because of this indirect pathway, the data should be checked periodically to confirm that all of OMIM's associations are still accounted for.


| MedGen sources | G2D edges |
| -- | -- |
| GeneMap | 6763 |
| GeneMap; GeneReviews | 379 |
| NCBI curation | 8 |
| GeneMap; NCBI curation | 8 |
| GeneReviews | 182 |
| GeneReviews; NCBI curation | 7 |
| GeneMap; NCBI curation; OMIM | 1 |
| GeneMap; OMIM | 2 |
| GeneTests | 4 |
| OMIM | 2 |
| GeneMap; GeneTests | 2 |
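A periodic check like the one suggested above can start from a tally of the MedGen source column. The sketch below copies the rows of the table above and counts, for each individual source, how many edges cite it (an edge citing several sources is counted once per source). The `edges_per_source` helper is hypothetical; in practice the rows would come from the MedGen download rather than being hard-coded.

```python
from collections import Counter

# Rows of the MedGen source table: (semicolon-separated sources, edge count).
MEDGEN_SOURCES = [
    ("GeneMap", 6763),
    ("GeneMap; GeneReviews", 379),
    ("NCBI curation", 8),
    ("GeneMap; NCBI curation", 8),
    ("GeneReviews", 182),
    ("GeneReviews; NCBI curation", 7),
    ("GeneMap; NCBI curation; OMIM", 1),
    ("GeneMap; OMIM", 2),
    ("GeneTests", 4),
    ("OMIM", 2),
    ("GeneMap; GeneTests", 2),
]

def edges_per_source(rows):
    """Count edges citing each individual source (an edge may cite several)."""
    tally = Counter()
    for sources, n in rows:
        for source in sources.split("; "):
            tally[source] += n
    return tally

total = sum(n for _, n in MEDGEN_SOURCES)
tally = edges_per_source(MEDGEN_SOURCES)
print(total)          # 7358
print(tally["OMIM"])  # 5
```

For the March 2024 numbers this gives 7,358 edges in total, of which only 5 list OMIM itself as a source, which is why the pathway has to be verified rather than assumed.
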

sagehrke commented 3 months ago

@julesjacobsen @cmungall FYI 👀 ⬆️

matentzn commented 3 months ago

Very nice analysis.