openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Stable Ensembl ID #173

Open erichu8 opened 7 years ago

erichu8 commented 7 years ago

Hi, I'd like to use your library, but I keep getting errors for retired ENSG IDs. How should I proceed when I get errors like this one?

ensembl.gene_name_of_gene_id("ENSG00000129277")

""" ValueError: No results found for query:

        SELECT distinct gene_name
        FROM gene
        WHERE gene_id = ?

with parameters: ['ENSG00000129277'] """

Thanks!

EH

iskandr commented 7 years ago

Hi @erichu8,

What would you like pyensembl to do with retired IDs? If the gene doesn't exist in the database then I'm not sure what should be returned.

iskandr commented 7 years ago

It seems that ENSG00000129277 hasn't been in Ensembl since the switch from GRCh37 to GRCh38. Maybe you should be using EnsemblRelease(75)?
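A rough sketch of that fallback, in case it helps: it catches the ValueError shown in the traceback above and tries a list of releases in order. The helper names (`gene_name_or_none`, `lookup_across_releases`) and the `make_genome` parameter are made up for illustration and are not part of pyensembl; only `EnsemblRelease` and `gene_name_of_gene_id` come from the library, and each release's data is assumed to be installed locally.

```python
def gene_name_or_none(genome, gene_id):
    """Return the gene name, or None if the ID is absent from this release."""
    try:
        return genome.gene_name_of_gene_id(gene_id)
    except ValueError:
        # Same error as in the report above: "No results found for query".
        return None


def lookup_across_releases(gene_id, releases, make_genome=None):
    """Try each release in order; return (release, gene_name) for the first hit.

    make_genome is a hypothetical hook so the lookup can be exercised without
    pyensembl; by default it is pyensembl.EnsemblRelease.
    """
    if make_genome is None:
        # Imported lazily so the helpers can be loaded without pyensembl.
        from pyensembl import EnsemblRelease
        make_genome = EnsemblRelease
    for release in releases:
        name = gene_name_or_none(make_genome(release), gene_id)
        if name is not None:
            return release, name
    return None


# Usage (needs pyensembl plus downloaded data for each release listed;
# 87 is only an illustrative newer release, 75 is the one suggested above):
# lookup_across_releases("ENSG00000129277", [87, 75])
```

This only reports which release still knows the ID. It does not map a retired ID to its successor.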

erichu8 commented 7 years ago

Hi Alex, thanks for responding so fast. Ideally I'd like my entire group to start using pyensembl; for that to happen, retired ID handling should be as easy as it is in the Ensembl Perl (sic!) API. For this specific test data you are right, I'm better off with GRCh37. Do you think retired ID handling is something that could be pulled easily into pyensembl? The equivalent we use (all the time) is the ID History Converter, which takes a gene specified in ANY Ensembl version and converts it to the latest assembly so that we can re-run the pipeline.

Thanks, EH

erichu8 commented 7 years ago

@iskandr

For instance, this is what the web utility returns for that specific gene; Perl does a very similar thing.

http://grch37.ensembl.org/Homo_sapiens/Tools/IDMapper/Results?db=core;tl=IdEf5zVttW5RnfCj-2134514

Thanks! EH

dhimmel commented 4 years ago

"The equivalent we use (all the time) is the ID History Converter, which takes a gene specified in ANY Ensembl version and converts it to the latest assembly so that we can re-run the pipeline."

I'm looking into using pyensembl. Being able to update a set of ensembl IDs to their version at the specified release would be a really handy feature. Does pyensembl have this ability?

@erichu8 can that online IDMapper tool (documented at http://grch37.ensembl.org/Help/View?id=560) be used via an API, or does it require some manual steps?
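One possible programmatic route, offered as a sketch rather than a confirmed answer: the Ensembl REST API has an `archive/id` lookup that reports the latest known version of a stable ID. The server URL, the response field names (`latest`, `release`, `is_current`) and the helper names below are assumptions on my part, and this is not the IDMapper tool itself.

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def archive_url(stable_id, server=REST_SERVER):
    """Build the URL for an archive/id lookup of one stable ID."""
    return f"{server}/archive/id/{stable_id}?content-type=application/json"


def fetch_archive_record(stable_id, server=REST_SERVER, timeout=30):
    """Fetch the archive record for a stable ID (needs network access)."""
    url = archive_url(stable_id, server)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(record):
    """Pick out the fields of interest; the field names are assumptions."""
    return {
        "latest": record.get("latest"),
        "release": record.get("release"),
        # The flag is assumed to arrive as "1"/"0" (string or integer).
        "is_current": str(record.get("is_current")) == "1",
    }


# Usage (performs an HTTP request):
# summarize(fetch_archive_record("ENSG00000129277"))
```

If the record says the ID is not current, the last release that contained it could then be loaded with EnsemblRelease in pyensembl.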