mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Unhandled exception when opening header only file #159

Closed openpaul closed 4 years ago

openpaul commented 4 years ago

I just came across an unhandled exception in an edge case. If you try to read a fasta file that contains only headers and no sequences, Fasta fails:

>seq_1

>seq_2

from pyfaidx import Fasta
Fasta("test.fa")

results in

RuntimeError: Unhandled exception during fasta indexing at entry seq_2
Please report this issue at https://github.com/mdshw5/pyfaidx/issues [(3, 1), (4, 1)]

I think this is fine, but as the error told me to report this, I did.

openpaul commented 4 years ago

If I try catching the runtime error I get a more helpful error message:

-------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-10-3100b03c0ce4> in <module>
      1 try:
----> 2     Fasta("a.fa")
      3 except RuntimeError:
      4     print("no")

~/.local/lib/python3.7/site-packages/pyfaidx/__init__.py in __init__(self, filename, default_seq, key_function, as_raw, strict_bounds, read_ahead, mutable, split_char, filt_function, one_based_attributes, read_long_names, duplicate_action, sequence_always_upper, rebuild, build_index)
    994             sequence_always_upper=sequence_always_upper,
    995             rebuild=rebuild,
--> 996             build_index=build_index)
    997         self.keys = self.faidx.index.keys
    998         if not self.mutable:

~/.local/lib/python3.7/site-packages/pyfaidx/__init__.py in __init__(self, filename, default_seq, key_function, as_raw, strict_bounds, read_ahead, mutable, split_char, duplicate_action, filt_function, one_based_attributes, read_long_names, sequence_always_upper, rebuild, build_index)
    412                 if os.path.exists(self.indexname) and getmtime(
    413                         self.indexname) >= getmtime(self.filename):
--> 414                     self.read_fai()
    415                 elif os.path.exists(self.indexname) and getmtime(
    416                         self.indexname) < getmtime(

~/.local/lib/python3.7/site-packages/pyfaidx/__init__.py in read_fai(self)
    462                     rlen, offset, lenc, lenb = map(int,
    463                                                    (rlen, offset, lenc, lenb))
--> 464                     newlines = int(ceil(rlen / lenc) * (lenb - lenc))
    465                     bend = offset + newlines + rlen
    466                     rec = IndexRecord(rlen, offset, lenc, lenb, bend,

ZeroDivisionError: division by zero

This narrows it down to the function read_fai.
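
For illustration, the failing arithmetic from the traceback can be reproduced in isolation. The sketch below assumes that an empty sequence is recorded in the .fai index with a line length (lenc) of 0, which is what makes `rlen / lenc` divide by zero. The helper name `record_end` and the guard are hypothetical and are not necessarily how pyfaidx itself handles this case.

```python
from math import ceil


def record_end(rlen, offset, lenc, lenb):
    """Return the byte offset just past a record's sequence.

    Arguments mirror the .fai fields used in the traceback:
    rlen   - number of bases in the sequence
    offset - byte offset of the first base
    lenc   - bases per line
    lenb   - bytes per line (bases plus line terminator)
    """
    if lenc == 0:
        # Hypothetical guard: an empty sequence has no bases and no
        # line breaks, so it ends where it starts.
        return offset
    # Same expression as in the traceback: count the newline bytes
    # contributed by each sequence line, then add the bases.
    newlines = int(ceil(rlen / lenc) * (lenb - lenc))
    return offset + newlines + rlen


# A 10-base record wrapped at 5 bases per line, starting at byte 6.
print(record_end(10, 6, 5, 6))
# A header-only record: without the guard this would divide by zero.
print(record_end(0, 7, 0, 0))
```

Without the `lenc == 0` branch, the second call raises the same ZeroDivisionError shown above.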

mdshw5 commented 4 years ago

Thanks for reporting this. I stuck that error message in there since I wasn't sure if I was missing any edge cases. Congratulations on being the 🥇 to find one! I think that, while this is a weird file, there should be nothing wrong with a file full of empty sequences, and so I'll plan on handling this. Could you attach the file to this issue so I can look at the exact contents?

openpaul commented 4 years ago

Yeah, I think handling it would be the sane thing to do. Thank you. Here are the faa and fai files:

emptyfaa.zip

mdshw5 commented 4 years ago

Thanks for the file @openpaul, but I have to apologize for not realizing this issue was fixed in #155, which has been released in all versions > 0.5.6 (https://github.com/mdshw5/pyfaidx/releases/tag/v0.5.6). Can you confirm that updating your version fixes this issue for you?
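
One way to check whether the installed version is newer than 0.5.6 is sketched below. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names `version_tuple`, `is_newer_than`, and `installed_pyfaidx_version` are made up for this example, and the comparison only looks at the leading numeric parts of the version string, so treat it as a rough check rather than a full version parser.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn "0.5.9.1" into (0, 5, 9, 1), stopping at the first
    component that does not start with a digit (e.g. "dev0")."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_newer_than(installed, threshold="0.5.6"):
    """True if `installed` is strictly greater than `threshold`."""
    return version_tuple(installed) > version_tuple(threshold)


def installed_pyfaidx_version():
    """Return the installed pyfaidx version string, or None if absent."""
    try:
        return metadata.version("pyfaidx")
    except metadata.PackageNotFoundError:
        return None


found = installed_pyfaidx_version()
if found is None:
    print("pyfaidx is not installed")
else:
    print(found, "newer than 0.5.6:", is_newer_than(found))
```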

mdshw5 commented 4 years ago

I can see that this issue is fixed in the current release, so will mark this as closed unless otherwise noted:

$ python3 -m virtualenv venv
$ source venv/bin/activate
$ pip install pyfaidx
$ ls
empty_prot.faa      empty_prot.faa.fai  venv
$ rm empty_prot.faa.fai 
$ faidx empty_prot.faa 
>seq_1
>seq_2

openpaul commented 4 years ago

Reopened it as a new issue, as it's not really related.