monarch-initiative / omim

Data ingest pipeline for OMIM.

Refactor OMIM gene handling #107

Closed matentzn closed 4 months ago

matentzn commented 5 months ago

The PR refactors the way the OMIM ingest represents genes. This means, in particular that

It furthermore:

  1. Standardises the OMIMPS prefix to what it usually is in Mondo
  2. Adds a few more convenience mappings (Mondo and HGNC) to the output. I want to load the file into Monarch OLS for transparency, so that Mondo curators can check what the original OMIM representation in the Monarch ingest looks like when debugging.
  3. Removes some redundant lines from the ETL script.

The main reason for this PR is to work towards https://github.com/monarch-initiative/mondo/issues/7229, which requires d2g to exist alongside the current g2d.
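For readers unfamiliar with the shorthand, here is a minimal sketch of how the two views relate, assuming g2d means a gene-to-disease mapping and d2g its disease-to-gene inverse. The function name `invert_g2d` and the identifiers are hypothetical placeholders, not names or data from this repository, and this is not necessarily how the PR builds d2g.

```python
from collections import defaultdict


def invert_g2d(g2d):
    """Invert a gene -> diseases mapping into a disease -> genes mapping.

    Hypothetical helper for illustration only; not part of the OMIM ingest.
    """
    d2g = defaultdict(set)
    for gene, diseases in g2d.items():
        for disease in diseases:
            d2g[disease].add(gene)
    # Sort keys and values so the output is deterministic.
    return {disease: sorted(genes) for disease, genes in sorted(d2g.items())}


# Placeholder identifiers, not real OMIM or HGNC records.
g2d = {
    "GENE:1": ["DISEASE:A", "DISEASE:B"],
    "GENE:2": ["DISEASE:A"],
}
d2g = invert_g2d(g2d)
```

Both views carry the same associations; the difference is only which entity serves as the subject of each record.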

matentzn commented 4 months ago

> Do you have a high level summary for this PR? Like something we're pushing for right now? Or is it a bundle of different gene related stuff we've been meaning to do for a while?

What, in addition to the PR description, would you like to know? The main issue was to deal with https://github.com/monarch-initiative/mondo/issues/7229, but when I looked at the code I saw a few opportunities to update the mondo ingest on other matters :P

joeflack4 commented 4 months ago

@matentzn RE: discussion on:

> Do you have a high level summary for this PR?

I should really only ever make issue "threads". Otherwise there's no way to tell that these comments are related or to mark them resolved.

Your OP summary was helpful, but the title of https://github.com/monarch-initiative/mondo/issues/7229, "Create synchronization pipeline for disease-gene in Mondo #7229", and Sabrina's bulleted description were much clearer to me as a high-level summary.

matentzn commented 4 months ago

> Your OP summary was helpful, but the title of https://github.com/monarch-initiative/mondo/issues/7229, "Create synchronization pipeline for disease-gene in Mondo #7229", and Sabrina's bulleted description were much clearer to me as a high-level summary.

Updated the comment! Thanks