Closed KwatMDPhD closed 5 years ago
This is just a corrupted BGZF file. Are you sure that bgzip actually overwrote the existing file named Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz? As for forking the repo, I'm not sure what you want to do, but BGZF support does in fact work: it has a whole test suite (https://github.com/mdshw5/pyfaidx/blob/master/tests/test_Fasta_bgzip.py) that's passing, and it raises errors for class methods that are not supported when using BGZF. I'll try to reproduce the error you're seeing, and if I can, go from there.
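One way to check whether a `.fa.gz` file really is BGZF rather than plain gzip is to inspect its first block header. The sketch below is not part of pyfaidx; `looks_like_bgzf` is a hypothetical helper name. It relies on the BGZF layout from the SAM specification: a gzip header with the FEXTRA flag set and an extra subfield tagged `BC` of length 2. It only inspects the first block, so it cannot detect corruption further into the file.

```python
import struct


def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF block header (hypothetical helper)."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # gzip magic (1f 8b), deflate method (08), and the FEXTRA flag bit (04)
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields looking for the BGZF one: 'B', 'C', length 2
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False
```

A file written by plain `gzip` normally has no such subfield, so the function returns `False` for it.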
I just tested again and you're right that BGZF works.
Thanks for the message. Perhaps the warnings for BGZF can be dropped?
Yeah I pulled your changes and version 5.0.1 does not raise this warning.
I just figured out the source of my error. When I run samtools faidx file.fa.gz and then try to use the same file.fa.gz file with pyfaidx, I get the bug I described.
FYI I had the same issue, and to fix it I did:
touch Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
faidx Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz 11:1-1 # create new index and return garbage coordinate
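The `touch` step above presumably works because it makes the FASTA file newer than its existing `.fai`, so the next `faidx` call writes a fresh index; that reading is an assumption, not something confirmed in this thread. A minimal Python sketch of the same idea follows. The helper names `index_is_older` and `touch` are hypothetical, and the `.fai` suffix is assumed to sit next to the FASTA file.

```python
import os


def index_is_older(fasta_path, index_path=None):
    """Return True if the index was last modified before the FASTA file."""
    index_path = index_path or fasta_path + ".fai"  # assumed index location
    return os.path.getmtime(index_path) < os.path.getmtime(fasta_path)


def touch(path):
    """Set the file's access and modification times to now, like `touch`."""
    os.utime(path, None)
```

Calling `touch(fasta_path)` and then checking `index_is_older(fasta_path)` shows whether the existing index would now be considered out of date.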
I stopped using pyfaidx. I now call command-line samtools and parse its output, a more general approach.
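For readers who want to try that approach, here is a rough sketch of calling `samtools faidx` from Python and parsing the FASTA text it prints. It assumes `samtools` is on the `PATH`; `parse_fasta` and `samtools_fetch` are hypothetical helper names, not the commenter's actual code.

```python
import subprocess


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def samtools_fetch(fasta_path, region):
    """Run `samtools faidx <fasta> <region>` and return the parsed records."""
    result = subprocess.run(
        ["samtools", "faidx", fasta_path, region],
        check=True,           # raise CalledProcessError if samtools fails
        capture_output=True,  # requires Python 3.7+
        text=True,
    )
    return parse_fasta(result.stdout)
```

Each call starts a new process, which is the overhead mentioned in the next comment.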
pysam has FastaFile, whose fetch method seems to do the trick, and it avoids the overhead of spawning a process for the command-line tool.
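A small sketch of that pysam route, assuming pysam is installed. `region_to_fetch_args` and `fetch_region` are hypothetical helpers; the conversion reflects that `FastaFile.fetch` takes 0-based, half-open `start`/`end` values, while samtools-style region strings are 1-based and inclusive.

```python
def region_to_fetch_args(region):
    """Convert '11:1-100' (1-based, inclusive) to ('11', 0, 100) (0-based, half-open)."""
    contig, _, span = region.rpartition(":")
    start, _, end = span.partition("-")
    return contig, int(start) - 1, int(end)


def fetch_region(fasta_path, region):
    """Return the sequence for a samtools-style region using pysam.FastaFile."""
    import pysam  # third-party; imported lazily so the helper above works without it

    contig, start, end = region_to_fetch_args(region)
    with pysam.FastaFile(fasta_path) as fasta:
        return fasta.fetch(contig, start, end)
```

For example, `fetch_region("file.fa.gz", "11:1-1")` would ask for the first base of contig `11`.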
I came here from PyFasta which required me to duplicate references and now says to use pyfaidx.
I see. Also, if you change programming languages, you have to find a replacement for PyFasta or for any other library. I went through that pain and ended up using plain command-line samtools and parsing its output.
I think you accidentally reopened this issue. It's fixed.
I downloaded ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz.
I converted gzip to bgzip:
Then I used pyfaidx:
Finally, I got an error:
I can fork the repo and disable the bgzf support. What do you think?