openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Retrieve exon sequences? #246

Open pavzz94 opened 3 years ago

pavzz94 commented 3 years ago

Hi,

I have written some code that finds all the protein-coding transcript variants of a gene and identifies which exons are common to all of them. Is there a way to retrieve the sequence of an exon by its ID?

This is my script's output:

Exon(exon_id='ENSE00003673942', gene_id='ENSG00000144848', gene_name='ATG3', contig='3', start=112544057, end=112544106, strand='-')

Alternatively, is there a way to retrieve the sequence using the start / end loci?
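To make the coordinate-based route concrete, here is a minimal sketch of what I have in mind. It assumes the forward-strand sequence of the contig is already available as a plain Python string (for example, loaded from a genome FASTA file) and that start / end are 1-based and inclusive, as in the Exon output above. The helper name `exon_sequence` is hypothetical and not part of pyensembl, and the contig here is a toy string rather than real chromosome data.

```python
# Hypothetical helper, not a pyensembl function.
# Assumes contig_seq is the forward-strand sequence of the whole contig.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def exon_sequence(contig_seq, start, end, strand):
    """Return the exon sequence for 1-based, inclusive genomic coordinates.

    For strand '-', the slice is reverse-complemented so the result reads
    in the direction of transcription.
    """
    if start < 1 or end > len(contig_seq) or start > end:
        raise ValueError("coordinates fall outside the contig")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Convert 1-based inclusive coordinates to a Python slice.
    forward = contig_seq[start - 1:end]
    if strand == "-":
        return forward.translate(_COMPLEMENT)[::-1]
    return forward


# Toy contig standing in for a real chromosome sequence.
toy_contig = "AACCGGTTAC"
print(exon_sequence(toy_contig, 2, 5, "+"))  # ACCG
print(exon_sequence(toy_contig, 2, 5, "-"))  # CGGT
```

With a real genome, `contig_seq` would be the sequence of contig '3' and the call would use the exon's own start, end, and strand values.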

The ultimate goal is to annotate a GenBank file with the common exons for downstream analysis. The sequence coordinates in the GenBank file start from position 0 at the start of the gene, so the exon's genomic start location won't work directly on my file. That is why I want the sequence itself, so I can find and annotate it manually. I'm open to alternative suggestions on how to go about this!
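One alternative to searching by sequence would be to convert the genomic coordinates into gene-relative offsets. The sketch below assumes the GenBank record holds the gene in the direction of transcription, with position 0 at the gene's first transcribed base, and that the genomic coordinates are 1-based and inclusive. The function name `exon_offsets_in_gene` is hypothetical, and the gene bounds in the example are placeholders, not the real ATG3 coordinates.

```python
# Hypothetical helper for illustration; not part of pyensembl.
def exon_offsets_in_gene(exon_start, exon_end, gene_start, gene_end, strand):
    """Map genomic exon coordinates to a 0-based, end-exclusive interval.

    Inputs are 1-based, inclusive genomic coordinates. The returned
    (start, end) pair is measured from the gene's first transcribed base,
    so it can be used directly as a Python slice on the gene sequence.
    """
    if not (gene_start <= exon_start <= exon_end <= gene_end):
        raise ValueError("exon must lie within the gene bounds")
    if strand == "+":
        offset = exon_start - gene_start
    elif strand == "-":
        # On the minus strand the gene begins at its highest coordinate.
        offset = gene_end - exon_end
    else:
        raise ValueError("strand must be '+' or '-'")
    length = exon_end - exon_start + 1
    return offset, offset + length


# Exon coordinates from the output above; gene bounds are made-up placeholders.
placeholder_gene_start = 112532000
placeholder_gene_end = 112562000
print(exon_offsets_in_gene(112544057, 112544106,
                           placeholder_gene_start, placeholder_gene_end, "-"))
# (17894, 17944)
```

If the GenBank record instead stores the gene in forward-strand orientation, the '+' branch would apply regardless of the gene's strand.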

Thanks in advance!