openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

How to get protein annotation for an input genomic position #244

Closed damianosmel closed 3 years ago

damianosmel commented 3 years ago

Dear developers,

Thank you for creating this tool! I have already used it to automate some refinements of the PVS1 evidence criterion, following the work of Abou Tayoun et al. (2018).

However, for some steps I need to know the protein (product) annotation at the genomic position of the input variant. The docs do not describe a protein module, if one is available.

So I would like to ask: is there a way to retrieve the protein features of the region that an input genomic variant position translates to? If so, could you please give a short example using pyensembl?

LuukHenk commented 3 years ago

I found a way to access protein sequences using the Ensembl protein ID:

from pyensembl import EnsemblRelease

data = EnsemblRelease()
protein_ids = data.protein_ids()
data.protein_sequence(protein_ids[0])
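
To connect a genomic position to a residue in those protein sequences, one possible approach is sketched below. It assumes `genome` is a pyensembl `EnsemblRelease` (or any object with the same `transcripts_at_locus` method) and relies on the `Transcript` attributes `complete`, `spliced_offset`, `first_start_codon_spliced_offset`, `protein_sequence` and `protein_id`. The function name `protein_positions_at_locus` is made up for this illustration and is not part of pyensembl.

```python
def protein_positions_at_locus(genome, contig, position):
    """Map a genomic position to 1-based residue positions on overlapping proteins.

    `genome` is assumed to behave like a pyensembl EnsemblRelease.
    Returns a list of (transcript_id, protein_id, aa_position, amino_acid).
    """
    results = []
    for transcript in genome.transcripts_at_locus(contig, position):
        # Only transcripts with a start codon, a stop codon and a coding
        # sequence whose length is a multiple of 3 are considered.
        if not transcript.complete:
            continue
        try:
            # Offset of the position within the spliced mRNA (strand-aware).
            mrna_offset = transcript.spliced_offset(position)
        except ValueError:
            # The position does not fall on any exon of this transcript.
            continue
        cds_offset = mrna_offset - transcript.first_start_codon_spliced_offset
        if cds_offset < 0:
            # The position lies in the 5' UTR.
            continue
        protein = transcript.protein_sequence
        aa_index = cds_offset // 3
        if aa_index >= len(protein):
            # The position lies in the stop codon or the 3' UTR.
            continue
        results.append(
            (transcript.transcript_id, transcript.protein_id,
             aa_index + 1, protein[aa_index])
        )
    return results
```

With pyensembl installed and the release data downloaded, this could be called as `protein_positions_at_locus(EnsemblRelease(95), 4, 1807189)`. It only reports the reference residue; it does not predict what a variant does to it.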

julia326 commented 3 years ago

Hi @damianosmel, sorry for the epic delay on this, but glad to hear this tool is useful! We have a related tool called Varcode for predicting the coding effect of a variant, and you can use it like this:

from pyensembl import EnsemblRelease
from varcode import Variant

ensembl_grch38 = EnsemblRelease(95, species='human')
Variant(contig=4, start=1807189, ref='C', alt='G', ensembl=ensembl_grch38).effects()

Let me know if that doesn't answer your question though.
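
The collection returned by `.effects()` can be iterated, so a per-transcript summary might look like the sketch below. It assumes each effect exposes `short_description` and, for transcript-level effects, a `transcript` attribute with a `transcript_id`. The helper name `describe_effects` is hypothetical and not part of Varcode.

```python
def describe_effects(effects):
    """Summarize an iterable of Varcode-style effect objects as plain dicts."""
    rows = []
    for effect in effects:
        # Effects outside any transcript (e.g. intergenic) may lack `transcript`.
        transcript = getattr(effect, "transcript", None)
        rows.append({
            "effect_type": type(effect).__name__,
            "transcript_id": (
                transcript.transcript_id if transcript is not None else None
            ),
            "description": effect.short_description,
        })
    return rows
```

Passing the result of the `Variant(...).effects()` call above to `describe_effects` would give one row per predicted effect, which is easier to filter or tabulate than the printed collection.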

damianosmel commented 3 years ago

Thank you @LuukHenk! I arrived at similar logic by reading the pyensembl docs.

damianosmel commented 3 years ago

Thank you for the detailed answer, @julia326. Varcode also seems like a very helpful contribution; thanks to all the developers! From a quick look at its repository, it seems to be an alternative to VEP or ANNOVAR, so I will check out this functionality.

As I have made this issue more of a "how-to", I will close it. Thank you!