monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

HGNC robot template #559

Closed joeflack4 closed 1 week ago

joeflack4 commented 3 weeks ago

Addresses sub-tasks in:

Related:

Overview

Updates mondo_genes.csv to be a proper ROBOT template and ties it into the pipeline for externally managed content.
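For readers unfamiliar with the format, here is a minimal sketch of what "a proper ROBOT template" means: a TSV whose first row holds human-readable column names and whose second row holds ROBOT template strings. The column names, the placeholder IDs `HGNC:0000` and `OMIM:000000`, and the use of `RO:0004003` for 'has material basis in germline mutation in' are assumptions for illustration; the actual file in this PR may be laid out differently.

```python
import csv
import io

# Hypothetical column names; the real template in this PR may differ.
HEADER = ["mondo_id", "hgnc_id", "source"]

# Second row: ROBOT template strings.
# "ID" identifies the class, "SC ... some %" adds a subclass-of restriction,
# and ">A ..." annotates the axiom produced by the previous column.
# RO:0004003 is assumed to be 'has material basis in germline mutation in'.
TEMPLATE = ["ID", "SC RO:0004003 some %", ">A oboInOwl:source SPLIT=|"]


def build_robot_template(rows):
    """Return ROBOT template TSV text for a list of disease-gene rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerow(TEMPLATE)
    for row in rows:
        # Multiple sources are pipe-separated to match SPLIT=| above.
        writer.writerow([row["mondo_id"], row["hgnc_id"], "|".join(row["sources"])])
    return buf.getvalue()


# Placeholder gene and OMIM IDs, not real identifiers.
example = build_robot_template(
    [{"mondo_id": "MONDO:0000208", "hgnc_id": "HGNC:0000", "sources": ["OMIM:000000"]}]
)
print(example)
```

Keeping the source in its own axiom-annotation column means the provenance value can be changed later without touching the disease-gene relationship itself.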

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

QC

Was the full pipeline run on this branch before submitting this PR, using sh run.sh make build-mondo-ingest (after docker pull obolibrary/odkfull:dev), and did it complete without errors?

Build PR:

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?


CC: @souzadevinicius Thought this would be a good one for you to review

twhetzel commented 2 weeks ago

@matentzn In the file https://github.com/monarch-initiative/mondo-ingest/blob/hgnc-template/src/ontology/external/mondo_genes.robot.tsv for MONDO_0000208 I see the source is the OMIM record. This seems like a better option than what is currently in Mondo as the source, e.g. MONDO:mim2gene_medgen.

However,

sabrinatoro commented 2 weeks ago

MONDO:mim2gene_medgen

I do not know what this MONDO:mim2gene_medgen source refers to. It looks like a way to say that we are getting this information from the omim-gene file that somehow involves medgen? @nicolevasilevsky do you remember anything about this?

I think it is OK to remove the MONDO:mim2gene_medgen sources and replace them with the OMIM identifier. However, it would not hurt to keep something like MONDO:mim2gene as a source to indicate that this annotation was made via a specific pipeline (similar to the "MONDO:MEDGE" source on the UMLS xref; see the image below for illustration).

We would therefore have the source be:

Screenshot 2024-06-13 at 1 03 39 PM

twhetzel commented 2 weeks ago

MONDO:mim2gene_medgen is documented on the Entities page as "This indicates the gene relationship came from MedGen." @joeflack4 can you remind me whether these mappings originally came from the MedGen mappings file? @sabrinatoro do you want a different annotation used for the source still given the definition and pending Joe's answer for the question above?

sabrinatoro commented 2 weeks ago

@sabrinatoro do you want a different annotation used for the source still given the definition and pending Joe's answer for the question above?

I feel like I don't have enough information to give a clear answer, but I will try. Where is the information coming from?

My GUESS is that we were using the gene annotation from MedGen at one point, and MedGen got this gene-to-disease annotation from OMIM (i.e., the MONDO:mim2gene_medgen source). It makes sense that we would switch to getting this information directly from OMIM now.

, then there is no point in keeping it. I am assuming (again assuming, please someone confirm)

twhetzel commented 2 weeks ago

Looking at the Monarch omim repo, I see references to the OMIM API and this download OMIM page, so I guess the data in this HGNC ROBOT template comes only from OMIM. That's the first thing for @joeflack4 or @matentzn to confirm.

If the data in this HGNC ROBOT file is only from OMIM, then we can go with Sabrina's comment: if we get the information directly from OMIM, then we can use something like "MONDO:OMIM" (or whatever source we have to say that something comes from OMIM; I don't have a strong opinion about what to name it, but I can make a name up). (from Trish - MONDO:OMIM fits the pattern I see for GARD and NORD so +1 from me)
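To make the proposal concrete, here is a rough sketch of the source swap being discussed: drop MONDO:mim2gene_medgen, then add the OMIM identifier plus a pipeline-level source. The function name `update_sources` and the placeholder `OMIM:000000` are hypothetical, and `MONDO:OMIM` is only the name floated in this thread, not a settled decision.

```python
LEGACY_SOURCE = "MONDO:mim2gene_medgen"
# Name proposed in this thread; not final.
PIPELINE_SOURCE = "MONDO:OMIM"


def update_sources(sources, omim_id):
    """Replace the legacy source with the OMIM record and a pipeline tag.

    Other existing sources are kept in their original order, and nothing
    is added twice.
    """
    kept = [s for s in sources if s != LEGACY_SOURCE]
    for new_source in (omim_id, PIPELINE_SOURCE):
        if new_source not in kept:
            kept.append(new_source)
    return kept


# Placeholder OMIM ID, not a real record.
print(update_sources(["MONDO:mim2gene_medgen"], "OMIM:000000"))
```

Whether this rewrite should also apply to the annotations already in Mondo depends on the open question about where that existing data came from.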

The last thing I do not know is where the data currently in Mondo with 'has material basis in germline mutation in' came from (see the example below) and, more importantly, whether we need to do anything about it. For example, MONDO:0000208 has 'has material basis in germline mutation in' some TRMT10A with source MONDO:mim2gene_medgen. I don't know whether the HGNC template data in this PR is in addition to or intended to replace the existing data, or whether both the data in this ROBOT template and the existing data are from the same source and therefore should have the same source annotation. @matentzn do you know the answer to this?