openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Nucleotide and protein sequences much larger with Python 2 #147

Closed iskandr closed 8 years ago

iskandr commented 8 years ago

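The new FASTA parser decodes nucleotide and protein sequences into unicode objects, which in Python 2.7 take 16 or 32 bits per character (narrow or wide Unicode build, respectively) instead of the 8 bits per character of a plain byte string. It's probably worth exploring the in-memory efficiency of Bio.Seq as an alternative.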
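
For a rough sense of the overhead, here is a minimal sketch that compares the payload size of the same sequence stored at 1, 2, and 4 bytes per character. It does not touch the pyensembl parser; the `payload_bytes` helper and the example sequence are hypothetical, and the UTF-16/UTF-32 encodings only stand in for how Python 2.7 narrow and wide builds store unicode objects internally.

```python
def payload_bytes(sequence, encoding):
    """Return the number of bytes needed to store `sequence` in `encoding`."""
    return len(sequence.encode(encoding))


# Hypothetical 4,000-base nucleotide sequence (illustration only).
sequence = u"ACGT" * 1000

# 1 byte per character, comparable to a Python 2 byte string (str).
ascii_size = payload_bytes(sequence, "ascii")
# 2 bytes per character, approximating unicode on a narrow Python 2 build.
ucs2_size = payload_bytes(sequence, "utf-16-le")
# 4 bytes per character, approximating unicode on a wide Python 2 build.
ucs4_size = payload_bytes(sequence, "utf-32-le")

print(ascii_size, ucs2_size, ucs4_size)
```

Since nucleotide and protein alphabets are pure ASCII, the sequence data itself would be 2x to 4x larger as unicode than as a byte string, before counting per-object overhead.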