monarch-initiative / monarch-gene-mapping

Code for mapping source namespaces to preferred namespacing
Add UniprotKB to preferred gene mappings #3

Closed kevinschaper closed 1 year ago

kevinschaper commented 2 years ago

We have 500k dangling edges from the GO annotation edges file that use UniProtKB prefixes and need to be mapped to the correct gene ID for each species. This might bring the big mapping file back, if there is no other source.

The "big" Uniprot mapping file is at https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/. Some (but not all) of the species of interest are in taxon-specific files at https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/. Alas, this subdirectory doesn't include the following species:

Resolving these mappings could leverage NCBI gene mappings.

Other issue tickets are defined elsewhere in this repository for non-NCBI model organism gene identifier mappings to UniProtKB.
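
As a rough illustration of the NCBI-based approach mentioned above, here is a minimal sketch that builds a UniProtKB-to-NCBIGene lookup from idmapping-style rows. It assumes the three-column, tab-separated layout of UniProt's `idmapping.dat` (accession, ID type, external ID) and that NCBI gene rows carry the ID type `GeneID`. The function name, the `NCBIGene:` output prefix, and the sample rows are invented for illustration and are not taken from this repository.

```python
import io


def uniprot_to_ncbigene(lines, id_type="GeneID"):
    """Build {"UniProtKB:<acc>": "NCBIGene:<id>"} from idmapping-style rows.

    Assumes each row is: accession <TAB> ID type <TAB> external ID.
    """
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        accession, row_type, external_id = parts
        if row_type == id_type:
            # Keep the first gene seen per accession; a real pipeline would
            # need an explicit policy for one-to-many mappings.
            mapping.setdefault(f"UniProtKB:{accession}", f"NCBIGene:{external_id}")
    return mapping


# Invented placeholder rows, not real UniProt data.
sample = io.StringIO(
    "ACC0001\tUniProtKB-ID\tEXAMPLE_ONE\n"
    "ACC0001\tGeneID\t1111\n"
    "ACC0002\tGeneID\t2222\n"
)
example_map = uniprot_to_ncbigene(sample)
```

Because the function only iterates over lines, the same code could be pointed at an open file handle for either the big file or a taxon-specific one.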

kevinschaper commented 2 years ago

I had completely forgotten, but we actually already have UniProtKB mappings for HGNC, and I'm realizing that, strategically, that's what we want: we'll map a UniProtKB ID to a gene if HGNC says so.

I've also just realized that we should get a ton of UniProtKB mappings from the BGI files, and for some reason we were choosing to only get NCBI mappings before.
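
To make the "map it if the authority says so" idea concrete, here is a small sketch of resolving a UniProtKB ID against several mapping sources in priority order. The ordering (HGNC, then BGI, then NCBI), the function name, and all identifiers in the sample dictionaries are assumptions for illustration only; the actual precedence used in this repository may differ.

```python
def resolve_gene(uniprot_id, sources):
    """Return the gene ID from the first source that maps uniprot_id, else None."""
    for _source_name, mapping in sources:
        gene_id = mapping.get(uniprot_id)
        if gene_id is not None:
            return gene_id
    return None  # the edge would stay dangling


# Hypothetical lookups; every identifier below is a made-up placeholder.
hgnc_map = {"UniProtKB:ACC0001": "HGNC:0001"}
bgi_map = {"UniProtKB:ACC0002": "MGI:0002"}
ncbi_map = {
    "UniProtKB:ACC0001": "NCBIGene:1111",
    "UniProtKB:ACC0003": "NCBIGene:3333",
}

# Assumed priority: species authority files first, NCBI as the fallback.
sources = [("hgnc", hgnc_map), ("bgi", bgi_map), ("ncbi", ncbi_map)]

resolved = {
    uid: resolve_gene(uid, sources)
    for uid in ("UniProtKB:ACC0001", "UniProtKB:ACC0002",
                "UniProtKB:ACC0003", "UniProtKB:ACC0004")
}
```

Keeping the sources as an ordered list means adding another preferred namespace is a one-line change, and any ID that resolves to `None` can be reported as still dangling.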

putmantime commented 2 years ago

Expand this mapping to include UniProt mappings to all preferred namespaces

RichardBruskiewich commented 1 year ago

Resolved by https://github.com/monarch-initiative/monarch-gene-mapping/pull/27