openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

[Feature Request] Accessing sequences in arbitrary ranges via Genome class #236

Open gokceneraslan opened 4 years ago

gokceneraslan commented 4 years ago

It would be great to have a generic .get_sequence(self, contig, position, end, strand) function in the Genome class that extracts the sequence of an arbitrary range of the genome. This is similar to the getSeq function in the BSgenome package in R:

getSeq(Celegans, 'chrI',  start=100, end=200, strand='+')

.transcript_sequence() and .protein_sequence() can also use this function.

Right now, it's not possible to get the full sequence of a gene including UTRs and introns. A range-based function like this would make that easy, for example.
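
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not pyensembl code: get_sequence is written as a standalone function over a plain dict of contig sequences (a stand-in for whatever the Genome class would load from the reference FASTA), and it assumes 1-based inclusive coordinates as in Ensembl and getSeq, with the reverse complement returned for the '-' strand. I used start instead of position for the first coordinate.

```python
# Hypothetical sketch of the requested behaviour; not part of pyensembl.
# `contigs` is an assumed stand-in for the reference genome: a dict that maps
# contig names to their full sequences.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def get_sequence(contigs, contig, start, end, strand="+"):
    """Return the sequence of contig[start..end], 1-based and inclusive.

    For strand '-', the reverse complement of the range is returned.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    seq = contigs[contig]
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range is outside the contig")
    # Convert 1-based inclusive coordinates to a 0-based half-open slice.
    subseq = seq[start - 1:end]
    if strand == "-":
        # Complement each base, then reverse the string.
        subseq = subseq.translate(_COMPLEMENT)[::-1]
    return subseq


toy_genome = {"chrI": "ACGTTGCAAC"}
print(get_sequence(toy_genome, "chrI", 2, 5, "+"))  # CGTT
print(get_sequence(toy_genome, "chrI", 2, 5, "-"))  # AACG
```

With something like this on Genome, passing a gene's contig, start, end and strand would return the full gene sequence, introns and UTRs included.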