lmdu / pyfastx

a python package for fast random access to sequences from plain and gzipped FASTA/Q files
https://pyfastx.readthedocs.io
MIT License

Unable to index gzipped fasta file by name: KeyError: 'chrY does not exist in fasta file' #62

Open jolo2486 opened 1 year ago

jolo2486 commented 1 year ago

I have loaded a few example gzipped fasta files, e.g. the following one: hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz

I can load it, iterate and so on, and:

fasta = pyfastx.Fasta('./data/genomes/dm6.fa.gz')
fasta[0]
Out[49]: <Sequence> chr2L with length of 23513712

but

fasta['chr2L']
KeyError: 'chr2L does not exist in fasta file'

Also:

keys = fasta.keys()
keys[0]
Out[57]: 'chr2L'

but

fasta[keys[0]]
KeyError: 'chr2L does not exist in fasta file'

I sincerely hope that I have not misunderstood anything; I went by what is listed in the docs:

>>> # get sequence like dictionary
>>> s1 = fa['JZ822577.1']
>>> s1
<Sequence> JZ822577.1 with length of 333

I am in a conda environment, and installed pyfastx 0.9.1 using pip.

lmdu commented 1 year ago

Try the latest version, 1.0.0. We have fixed this issue there.
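
For anyone landing here later, a minimal sketch of how one might confirm which pyfastx version is installed and check that name lookup agrees with index lookup after upgrading. The helper names (`installed_version`, `version_tuple`, `name_lookup_works`) are hypothetical and not part of pyfastx. The sketch relies only on behaviour shown in this thread (`fasta[0]`, `fasta['chr2L']`) plus the `name` attribute of the returned sequence object, which is an assumption here.

```python
import re
from importlib import metadata


def installed_version(package="pyfastx"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.9.1' into (0, 9, 1), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def name_lookup_works(fasta, n=1):
    """Check that the first `n` records can be fetched by name as well as by index.

    `fasta` is any object that supports fasta[int] and fasta[str] and whose
    records expose a `.name` attribute (assumed to hold for pyfastx.Fasta).
    """
    for i in range(n):
        record = fasta[i]
        try:
            by_name = fasta[record.name]
        except KeyError:
            # This is the failure reported in this issue.
            return False
        if by_name.name != record.name:
            return False
    return True


if __name__ == "__main__":
    version = installed_version()
    print("pyfastx installed:", version)
    if version is not None and version_tuple(version) < (1, 0, 0):
        print("Older than 1.0.0; consider: pip install --upgrade pyfastx")
```

After upgrading, calling `name_lookup_works(pyfastx.Fasta('./data/genomes/dm6.fa.gz'))` should return `True` if the fix applies to your file. If an index file built by the older version sits next to the FASTA, it may be worth rebuilding it, though that is a guess and not something confirmed in this thread.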