mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Header longer than actual fasta header #173

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hello, I have a list of FASTA headers such as

Typhlosyrinx_TRINITY_DN28042_c4_g4_i1.p1

and in the fasta file the header is

>Typhlosyrinx_TRINITY_DN28042_c4_g4_i1 gener3

As you can see, the actual header doesn't have the ".p1", but it does have extra words.

faidx reftrans.fa list
warning:  list not found in file

Is there a workaround with pyfaidx? I know I could just remove the ".p1" from the ID list, but as I am not sure that all the sequences and IDs are consistently formatted, I am worried that removing the ".p1" might have unwanted side effects.
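
One possible way to avoid blindly editing the ID list is to resolve each ID against the names actually present in the index: keep an ID when it matches exactly, and only strip a trailing ".p1"-style suffix when the shortened name exists. The sketch below is a minimal illustration of that idea, not part of pyfaidx; `resolve_ids` is a hypothetical helper, and it assumes the index keys are the header text up to the first whitespace (which I believe is pyfaidx's default behaviour). The warning above also suggests that `list` may have been read as a sequence name rather than as a file of IDs, though that is an inference from the message.

```python
def resolve_ids(wanted, keys, sep="."):
    """Map each requested ID to a name present in the FASTA index.

    Hypothetical helper: an exact match wins; otherwise trailing
    `sep`-delimited suffixes are stripped one at a time until a name
    matches. IDs that never match map to None, so nothing is lost silently.
    """
    keys = set(keys)
    resolved = {}
    for name in wanted:
        candidate = name
        while candidate not in keys and sep in candidate:
            candidate = candidate.rsplit(sep, 1)[0]
        resolved[name] = candidate if candidate in keys else None
    return resolved


# Assumed index keys: the header up to the first whitespace.
# With pyfaidx these could come from Fasta("reftrans.fa").keys().
keys = ["Typhlosyrinx_TRINITY_DN28042_c4_g4_i1"]
wanted = ["Typhlosyrinx_TRINITY_DN28042_c4_g4_i1.p1", "missing_id.p1"]

for requested, found in resolve_ids(wanted, keys).items():
    print(requested, "->", found)
```

IDs that resolve to `None` can be reviewed by hand, which addresses the worry about inconsistent formatting: a suffix is only dropped when doing so produces a name that really exists in the file.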

Thanks for your insight

mdshw5 commented 3 years ago

Hey @Axze-rgb. Sorry for the delay. I'm not exactly sure what's going on here but if you can send the FASTA file, or just the problematic portion, I will try to reproduce it.

mdshw5 commented 3 years ago

@Axze-rgb Please do let me know if you can send me the problematic file so I can reproduce this issue. For now I'll close this.