Closed fungs closed 5 years ago
Yeah, this is expected. The 1.73 release will have a fix for #125, but there is a workaround. I was waiting on the 1.73 release, but biopython is a large project and @peterjc needs quite a bit of time and testing between releases. You can use the current development version of biopython, which is installable with:
pip install -e git+https://github.com/biopython/biopython.git#egg=biopython
You could also use an older pyfaidx version (https://github.com/mdshw5/pyfaidx/releases/tag/v0.5.4.2), but you may run into a RecursionError using the current biopython BGZF code.
The preferred solution is to wait (a few days/weeks) for the biopython 1.73 release, which seems imminent.
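In the meantime, if you are unsure which biopython version an environment actually picks up, here is a minimal sketch using only the standard library (Python 3.8+). The helper names `installed_version` and `at_least` are made up for this example, and the comparison only looks at the leading numeric components of the version string, so treat it as a rough check rather than a full version parser.

```python
from importlib import metadata


def installed_version(dist_name="biopython"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def at_least(version, minimum=(1, 73)):
    """Compare the leading numeric components of a version string to a minimum."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(piece))
    return tuple(parts) >= minimum


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("biopython is not installed in this environment")
    else:
        print(f"biopython {found}; 1.73 or newer: {at_least(found)}")
```

Running this inside the Conda or pip environment in question shows which version is really importable there, independent of what the package manager reports.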
Imminent yes, but unlikely to be this week as I'm off site for meetings until Friday.
It's ok for me, I will give it a try with the dev version of BioPython. The error message is confusing though, because I was trying to update BioPython through Conda and pip and wondered why I was always getting an old version :)
A related question since you seem to be the right guys to ask: FASTA+FAI support seems to be widespread but is there other software implementing the BGZIP version for random access of compressed FASTA? I was hoping for compatible implementations in, for instance, Seqan, but found nothing.
The only Python API to BGZF compressed and indexed FASTA files that I know of is pysam. Since it wraps the reference BGZF code in htslib, you can fetch regions using its FastaFile object. It doesn't appear that indexing is exposed through pysam, so you must use samtools to generate an index first. I'll also point out that pyfaidx does not generate or utilize the .gzi file, which htslib (and pysam) require, and I would like to incorporate this information in the future for more efficient (I think?) queries. If anyone wants to help, see #126.
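For reference, a rough sketch of what fetching a region through pysam could look like. It assumes pysam is installed and that the index files were already created next to the FASTA with samtools. The helper name `fetch_region` and the file and contig names in the usage comment are hypothetical.

```python
def fetch_region(fasta_path, contig, start, end):
    """Fetch the 0-based, half-open interval [start, end) from an indexed FASTA."""
    import pysam  # third-party; imported lazily so the helper can be defined without it

    # Assumes the index files already exist next to fasta_path,
    # e.g. generated beforehand with `samtools faidx`.
    fasta = pysam.FastaFile(fasta_path)
    try:
        return fasta.fetch(reference=contig, start=start, end=end)
    finally:
        fasta.close()


# Hypothetical usage, with made-up file and contig names:
# first_50_bases = fetch_region("genome.fa.gz", "chr1", 0, 50)
```

Note that pysam coordinates are 0-based and half-open, so `start=0, end=50` asks for the first 50 bases.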
Biopython supports random access to BGZF compressed sequence files including FASTA (for pulling out entire sequence records), but does not currently support the FAI index format. See Bio.SeqIO.index (in memory index) and Bio.SeqIO.index_db (reusable SQLite3 based index on disk). See also https://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
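As a small illustration of the record-level access described above, here is a sketch that writes a toy BGZF compressed FASTA with Bio.bgzf and then looks up one record through Bio.SeqIO.index. It assumes Biopython is installed. The function name `demo_bgzf_index`, the file name, and the two records are invented for the example.

```python
import os
import tempfile


def demo_bgzf_index():
    """Write a tiny BGZF compressed FASTA, index it, and pull out one record."""
    from Bio import SeqIO, bgzf  # third-party (Biopython), imported lazily

    path = os.path.join(tempfile.mkdtemp(), "toy.fasta.bgz")  # hypothetical file name

    # Toy records invented for this example.
    handle = bgzf.BgzfWriter(path, "wb")
    try:
        handle.write(b">alpha\nACGTACGT\n>beta\nGGGTTTCC\n")
    finally:
        handle.close()

    # SeqIO.index builds an in-memory table of record offsets keyed by record ID;
    # the whole record is parsed only when it is looked up.
    records = SeqIO.index(path, "fasta")
    try:
        return str(records["beta"].seq)
    finally:
        records.close()
```

Swapping `SeqIO.index` for `SeqIO.index_db` (which takes an additional index file name) would keep the offsets in an SQLite file on disk so they can be reused between runs.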
I’m closing this issue as biopython 1.73 has been released.
I'm trying to access a bgzipped FASTA file. Although the message suggests that there is a BioPython version 1.73, this release does not exist yet. Is this the expected behavior?