Open matentzn opened 2 years ago
Kent finished it before he left and I'm not really that familiar with the details, but we can definitely revisit any decisions in there. It's an oddball right now because it's creating nodes, and I assume it should be an edge-only ingest and we should update the edges in the mapping step to catch human genes (though I'm less sure about the NucleicAcidEntity nodes?).
I am particularly concerned about the source files being used. @cmungall told me that there are multiple ways the OMIM G2Ds can be obtained, e.g. from MedGen, directly from morbidmap, etc., and no one knows what the best solution really is. (Should we try multiple and diff them to see what the difference is? Not sure.)
This is not really a ticket I can take on effectively. I can advise, but it's probably better to assign someone else.
is the OMIM ingest done?
@matentzn - Yes, it is.
we should consolidate the various OMIM ingests we have across Monarch a bit, at least conceptually.
@matentzn - I agree, and I would like us to pick this conversation back up if that hasn't already been done. Could we please restart this convo during the data call on 2024-02-01?
@cmungall @kevinschaper 👀👆🏽
If I think about this correctly:
Related to https://github.com/monarch-initiative/monarch-app/issues/707 @madanucd
The current flow for ingesting G2D associations from OMIM follows a fixed pathway: data originates from OMIM, passes through MedGen, proceeds to HPOA-G2D, and is finally integrated into the Monarch Knowledge Graph (KG).
An assessment of G2D associations among these sources, as of March 2024, shows full coverage along this chain: HPOA-G2D contains all associations from MedGen, and MedGen in turn contains all associations from OMIM. The relationships are depicted in an UpSet plot or Venn diagram.
However, a closer look at MedGen's sources shows that, while MedGen captures the G2D associations from OMIM, its downloadable files (as of March 2024) obtain them through intermediary sources rather than directly. This indirect pathway calls for periodic verification that OMIM's contributions are fully accounted for in the dataset.
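The containment check described above (and the "try multiple and diff" idea raised earlier in the thread) could be sketched roughly as follows. This is only a minimal sketch, not the actual ingest code: it assumes each source has already been reduced to (gene, disease) pairs, and the function names and the toy identifiers below are hypothetical, made up for illustration.

```python
def load_pairs(rows):
    """Normalize an iterable of (gene_id, disease_id) rows into a set of pairs."""
    return {(gene.strip(), disease.strip()) for gene, disease in rows}


def compare_sources(sources):
    """For every ordered pair of sources (a, b), collect associations in a but not in b."""
    missing = {}
    for name_a, pairs_a in sources.items():
        for name_b, pairs_b in sources.items():
            if name_a != name_b:
                missing[(name_a, name_b)] = pairs_a - pairs_b
    return missing


# Toy data only: these identifiers are placeholders, not real OMIM/MedGen/HPOA records.
omim = load_pairs([("GENE:1", "DISEASE:100"), ("GENE:2", "DISEASE:200")])
medgen = omim | load_pairs([("GENE:3", "DISEASE:300")])
hpoa_g2d = medgen | load_pairs([("GENE:4", "DISEASE:400")])

sources = {"omim": omim, "medgen": medgen, "hpoa_g2d": hpoa_g2d}
missing = compare_sources(sources)

for (name_a, name_b), pairs in sorted(missing.items()):
    print(f"in {name_a} but not in {name_b}: {len(pairs)}")
```

If the chain OMIM ⊆ MedGen ⊆ HPOA-G2D holds, the `("omim", "medgen")` and `("medgen", "hpoa_g2d")` entries are empty; anything non-empty there is an association that got lost along the way and is worth looking at by hand. Rerunning this on each new release would cover the periodic verification mentioned above.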
@kevinschaper is the OMIM ingest done? How did we make the decision of using OMIM morbidmap vs medgen gene2disease?