Closed by Benjamin-Lee 6 years ago
Well, it's not a FASTA file without the description line. Are we talking about a file that starts with a semicolon (like this example)? In that case I could see adding support for FASTA comments.
If we're talking about files that contain only sequence, with no comments or identifiers, I doubt there's an indexing strategy for them, since a multi-FASTA file would have no record separator between entries.
If you can provide a bit more detail about how you'd like this supported we can go from there. Thanks!
Ideally, it would parse it as normal. That being said, I understand if you don't think that supporting improperly formatted FASTA files is within the scope of this project, or even advisable (we recently realized that Biopython doesn't support it either and fails silently). If so, could we maybe add a specific warning if no > is found rather than a generic error?
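To make the request concrete, here is a minimal sketch of the kind of check I have in mind. It is only an illustration, not how pyfaidx or Biopython work internally: the exception class `MissingFastaHeaderError` and the function `check_has_description_line` are hypothetical names made up for this example.

```python
class MissingFastaHeaderError(ValueError):
    """Hypothetical exception for this sketch; not part of pyfaidx or Biopython."""


def check_has_description_line(path):
    """Return True if any line starts with '>', else raise a specific error."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                return True
    # No description line anywhere in the file: say so explicitly
    # instead of letting a generic parsing error surface later.
    raise MissingFastaHeaderError(
        f"{path}: no line starting with '>' was found; "
        "this does not look like a FASTA file"
    )
```

Calling this before indexing would turn a confusing failure into a message that names the actual problem.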
Adding better exceptions would definitely be great. Can I have an example of the file format in question?
Sure! The exact file in question can be viewed here.
Basically instead of:
> description
ATGGACAGTA...
GATAGATACC...
it was getting passed:
ATGGACAGTA...
GATAGATACC...
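As a stopgap on the user side, a file like this can be given a placeholder description line before it is indexed. The sketch below uses only the standard library and assumes the whole file is a single record; the function name `add_placeholder_header` and the default name `unnamed_sequence` are made up for this example.

```python
import shutil


def add_placeholder_header(src, dst, name="unnamed_sequence"):
    """Copy src to dst, prepending '>name' if src has no description line.

    Returns True if a placeholder header was added, False otherwise.
    """
    # First pass: look for any existing description line.
    with open(src) as handle:
        has_header = any(line.startswith(">") for line in handle)

    # Second pass: stream the content across, adding a header if needed.
    with open(src) as handle, open(dst, "w") as out:
        if not has_header:
            out.write(f">{name}\n")
        shutil.copyfileobj(handle, out)
    return not has_header
```

The resulting file should then be readable by tools that expect a description line, for example by passing `dst` to `pyfaidx.Fasta`, with the caveat that everything in it is treated as one sequence.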
I've added a case for handling files with no valid description lines and pushed a new release (https://github.com/mdshw5/pyfaidx/releases/tag/v0.5.5.1) that should be on PyPI in a few minutes.
Thanks a ton!
Ok this is a weird one: I'm at a hackathon and they handed us "mystery genomes" that were FASTA files with the comment line removed. I tried to use pyfaidx (through squiggle) and got this error: