monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Improve gene-protein ID mapping in GO ingest #758

Open kshefchek opened 5 years ago

kshefchek commented 5 years ago

We're missing the GO annotations for EPM2A, see https://monarchinitiative.org/gene/HGNC:3413 vs. http://amigo.geneontology.org/amigo/gene_product/UniProtKB:B3EWF7

cc @realmarcin

kshefchek commented 5 years ago

It looks like this is a UniProt ID to gene ID mapping issue. This protein is mapped to an HGNC identifier but not to an NCBI Gene ID. We rely on a mapping file that contains only the Entrez (NCBI Gene) mappings, not the organism-specific ones, so proteins like this one are dropped.

We need a better strategy for ID mapping in general for this ingest. For human data, does it make sense to use the API and get the HGNC ID?
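
To make the idea concrete, here is a minimal sketch of a fallback lookup: try the Entrez mapping first, then fall back to an organism-specific table (HGNC for human). This is not dipper's actual code. The function name `resolve_gene_id`, the two dictionaries, and their contents are hypothetical stand-ins for whatever mapping files or API responses the ingest would really use.

```python
from typing import Dict, Optional

# Hypothetical in-memory tables for illustration only. A real ingest would
# populate these from mapping files or an API rather than hard-coding them.
UNIPROT_TO_NCBIGENE: Dict[str, str] = {
    "P04637": "NCBIGene:7157",  # TP53, has an Entrez mapping
}
UNIPROT_TO_HGNC: Dict[str, str] = {
    "P04637": "HGNC:11998",
    "B3EWF7": "HGNC:3413",  # EPM2A protein from this issue, no Entrez mapping
}

HUMAN_TAXON = "NCBITaxon:9606"


def resolve_gene_id(uniprot_id: str, taxon: str = HUMAN_TAXON) -> Optional[str]:
    """Return a gene CURIE for a UniProt accession, or None if unmapped.

    Order of preference (an assumption, not a project decision):
    1. the Entrez (NCBI Gene) mapping,
    2. for human only, the organism-specific HGNC mapping.
    """
    gene_id = UNIPROT_TO_NCBIGENE.get(uniprot_id)
    if gene_id is not None:
        return gene_id
    if taxon == HUMAN_TAXON:
        return UNIPROT_TO_HGNC.get(uniprot_id)
    return None


if __name__ == "__main__":
    for accession in ("P04637", "B3EWF7", "Q00000"):
        print(accession, resolve_gene_id(accession))
```

With a fallback like this, B3EWF7 would resolve to HGNC:3413 instead of being skipped. Whether HGNC should be the fallback or the preferred ID for human data is still the open question.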

cc @deepakunni3