lmdu / pyfastx

a python package for fast random access to sequences from plain and gzipped FASTA/Q files
https://pyfastx.readthedocs.io
MIT License

SystemError: <class 'Fasta'> returned a result with an error set; on loading gzipped fasta #63

Open jolo2486 opened 1 year ago

jolo2486 commented 1 year ago

When loading a gzipped FASTA file, for example this one: hgdownload.cse.ucsc.edu/goldenPath/dp3/bigZips/dp3.fa.gz

I get:

fasta = pyfastx.Fasta('./data/genomes/dp3.fa.gz')
RuntimeError: get seq count and length error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[61], line 1
    fasta = pyfastx.Fasta('./data/genomes/dp3.fa.gz')

SystemError: <class 'Fasta'> returned a result with an error set

I am in a conda environment, and installed pyfastx 0.9.1 using pip.

lmdu commented 1 year ago

First, delete the previously generated index file dp3.fa.gz.fxi, and then use pyfastx.Fasta to rebuild the index. If this does not work, please let me know.
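The suggestion above can be sketched as follows. This is a minimal illustration, not code from the pyfastx project: the helper names remove_stale_index and reindex are hypothetical, and the sketch assumes the index sits next to the FASTA file with ".fxi" appended to its name, as in dp3.fa.gz.fxi.

```python
from pathlib import Path


def remove_stale_index(fasta_path):
    """Delete the .fxi index next to fasta_path, if one exists.

    Returns True if an index file was removed, False otherwise.
    Assumes the index is named <fasta file name> + ".fxi".
    """
    index_path = Path(str(fasta_path) + ".fxi")
    if index_path.exists():
        index_path.unlink()
        return True
    return False


def reindex(fasta_path):
    """Remove any old index, then let pyfastx build a fresh one."""
    import pyfastx  # imported lazily so the helper above works without it

    remove_stale_index(fasta_path)
    # Constructing Fasta with default arguments builds the index.
    return pyfastx.Fasta(str(fasta_path))


# Example call (path taken from the report above):
# fasta = reindex("./data/genomes/dp3.fa.gz")
```

If the error still appears after a clean rebuild, a stale index is probably not the cause.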

maximilianmordig commented 1 year ago

Still fails.

The issue persists even after deleting the index file (the .fxi file). The file can be fetched with:

curl https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz -O

It seems to be related to the index, though:

import pyfastx; pyfastx.Fasta("chm13v2.0.fa.gz", build_index=False)

works, but

import pyfastx; pyfastx.Fasta("chm13v2.0.fa.gz")

does not.

I am using pyfastx version 1.1.0
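Until the indexing failure is resolved, one possible workaround is to try the indexed mode first and fall back to build_index=False when it fails. The sketch below is an assumption-laden illustration: open_fasta and its fasta_factory parameter are hypothetical names, and the caught exception types are simply the two seen in the traceback above. Without an index, random access is not available; the object is only meant to be iterated, which per the pyfastx documentation yields (name, sequence) pairs.

```python
def open_fasta(path, fasta_factory=None):
    """Open a FASTA file, falling back to non-indexed mode on failure.

    Returns (fasta_object, indexed), where indexed is False when the
    fallback was used. fasta_factory is a hypothetical hook that
    defaults to pyfastx.Fasta.
    """
    if fasta_factory is None:
        import pyfastx  # imported lazily; only needed for the default

        fasta_factory = pyfastx.Fasta
    try:
        # Default mode: builds or loads the .fxi index.
        return fasta_factory(path), True
    except (RuntimeError, SystemError):
        # The two error types reported in this thread.
        return fasta_factory(path, build_index=False), False


# Example call (file name taken from the report above):
# fasta, indexed = open_fasta("chm13v2.0.fa.gz")
# if not indexed:
#     for name, seq in fasta:
#         print(name, len(seq))
```

This only sidesteps the problem for sequential reads; it does not explain why index building fails on these files.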