thegenemyers / FASTK

A fast K-mer counter for high-fidelity shotgun datasets

Won't open a .fa file #1

Closed: rsharris closed this issue 3 years ago

rsharris commented 3 years ago

Synopsis: I think the suffix[] table at line 138 in io.c has a typo in the last item in the second row.

Details:

Looking to use this package to quickly identify half-covered scaffolds (or scaffold segments) in an assembly, having seen the two presentations in VGP conference calls.

Working with a mar/31 clone of the repo.

For a first test, I gave it a small random apple.fa file. When I tried 'FastK apple.fa', it reported "FastK: Cannot open apple.fa as a .cram|[bs]am|f{ast}[aq][.gz]|db|dam file."

That report comes from Fetch_File() in io.c. Examining that, it's not completely clear to me what the purpose of the suffix[] and extend[] tables is (maybe they forgive the user for giving an incomplete file extension?). But the loop tries the different extensions in order, records the first success in i, and then assigns ftype as FASTQ or FASTA based on whether i is odd or even (for i >= 5), so the intent appears to be that the table should alternate between fastq and fasta extensions. At the end of the second line of suffix[], however, ".fq" appears twice. I infer the second of those should be ".fa", and when I make that change FastK is able to open apple.fa without complaint.
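To make the reasoning above concrete, here is a minimal sketch of the index-parity logic as I understand it. It is not the actual code from io.c: the table contents, the enum, and the names classify(), ends_with(), suffix_buggy and suffix_fixed are all assumptions for illustration. The sketch also omits any compressed (.gz) variants and matches on the file name's suffix instead of attempting to open files the way Fetch_File() does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical file-type codes; not FastK's actual definitions. */
enum ftype { UNKNOWN = -1, OTHER = 0, FASTA = 1, FASTQ = 2 };

#define N_OTHER  5   /* entries before the fastq/fasta run (the "i >= 5" test) */
#define N_SUFFIX 9

/* Assumed table layout with the reported typo: ".fq" appears twice. */
static const char *suffix_buggy[N_SUFFIX] = {
    ".cram", ".bam", ".sam", ".db", ".dam",
    ".fastq", ".fasta", ".fq", ".fq"
};

/* Same assumed layout with the last entry changed to ".fa". */
static const char *suffix_fixed[N_SUFFIX] = {
    ".cram", ".bam", ".sam", ".db", ".dam",
    ".fastq", ".fasta", ".fq", ".fa"
};

/* Return 1 if name ends with suf, 0 otherwise. */
static int ends_with(const char *name, const char *suf)
{
    size_t n = strlen(name), s = strlen(suf);
    return n >= s && strcmp(name + (n - s), suf) == 0;
}

/* Find the first matching suffix, then pick the type from the index:
   below N_OTHER is a non-FASTA/FASTQ format, odd is FASTQ, even is FASTA. */
static enum ftype classify(const char *name, const char **suffix)
{
    int i;

    for (i = 0; i < N_SUFFIX; i++)
        if (ends_with(name, suffix[i]))
            break;
    if (i == N_SUFFIX)
        return UNKNOWN;      /* nothing matched: the "Cannot open" case */
    if (i < N_OTHER)
        return OTHER;
    return (i % 2 == 1) ? FASTQ : FASTA;
}
```

With the duplicated ".fq", classify("apple.fa", suffix_buggy) finds no match at all, which corresponds to the error I saw. With the corrected table the same name lands on an even index and is treated as FASTA. Because the type is derived purely from index parity, the table has to keep alternating fastq and fasta entries for this scheme to work.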

thegenemyers commented 3 years ago

Right you are. Sorry, I always use .fasta's and so hadn't tested .fa's.
Made the edit and committed, so please download again and feel free to let me know if there are any other problems. Thanks, Gene

On 3/31/21, 11:35 PM, Bob Harris wrote:

Synopsis: I think the suffix[] table at line 138 in io.c has a typo in the last item in the second row.

rsharris commented 3 years ago

Yep, looked like a simple typo and the kind of thing that could escape undetected by a single user. Thanks.