Hi,
I have written some code that finds all the protein-coding transcript variants of a gene and identifies which exons are common to all of them. Is there a way to retrieve the sequence of an exon by its ID?
This is my script's output:
Exon(exon_id='ENSE00003673942', gene_id='ENSG00000144848', gene_name='ATG3', contig='3', start=112544057, end=112544106, strand='-')
Alternatively, is there a way to retrieve the sequence using the start and end coordinates?
The ultimate goal is to annotate a GenBank file with the common exons for downstream analysis. The sequence annotations in the GenBank file start at position 0, which is the start of the gene, so the genomic start location of the exon won't work directly on my GenBank file. That is why I want the sequence: so I can find it in the record and annotate it manually. I'm open to alternative suggestions on how to go about this!
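To illustrate the coordinate mismatch, here is a minimal sketch of the conversion I think would be needed. It assumes the GenBank record covers exactly the gene span and is written on the gene's own strand (so reverse-complemented for a minus-strand gene like this one). The function name `exon_to_gene_offsets` is my own, and the gene start/end values are placeholders for illustration, not the real ATG3 coordinates.

```python
def exon_to_gene_offsets(exon_start, exon_end, gene_start, gene_end, strand):
    """Convert 1-based, inclusive genomic coordinates of an exon into a
    0-based, half-open (start, end) interval relative to the first base
    of the gene, read along the gene's own strand."""
    if not (gene_start <= exon_start <= exon_end <= gene_end):
        raise ValueError("exon must lie entirely within the gene")
    if strand == "+":
        # Gene position 0 is the lowest genomic coordinate.
        rel_start = exon_start - gene_start
        rel_end = exon_end - gene_start + 1
    elif strand == "-":
        # Gene position 0 is the highest genomic coordinate, so distances
        # are measured backwards from gene_end.
        rel_start = gene_end - exon_end
        rel_end = gene_end - exon_start + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return rel_start, rel_end


# Exon coordinates are taken from the script output above.
# gene_start and gene_end are PLACEHOLDER values, not the real gene bounds.
offsets = exon_to_gene_offsets(
    exon_start=112544057,
    exon_end=112544106,
    gene_start=112530000,  # placeholder
    gene_end=112545000,    # placeholder
    strand="-",
)
print(offsets)
```

If that assumption about the record holds, the resulting pair could be used directly as a 0-based feature location, without having to search for the exon sequence in the record.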
Thanks in advance!