refresh-bio / KMC

Fast and frugal disk based k-mer counter

Error: Wrong input file! #138

Closed amitj-i closed 2 years ago

amitj-i commented 5 years ago

Hi, I'm using KMC 3.0 to count k-mers in my data, but I get this error: Stage 1: 100%Error: Wrong input file! My data is in the attachment. Is there something wrong? Thank you!

573.13611.zip

marekkokot commented 5 years ago

Am I right that it works now, since you have closed the issue?

zyj1729 commented 2 years ago

May I ask how you solved the issue? I have the same problem.

marekkokot commented 2 years ago

Hi,

could you send me your input file(s), the exact command line you are using and the exact kmc version?

zyj1729 commented 2 years ago

I fixed the issue by using -fm instead of -fa. I was running KMC ver. 3.0.0 (2017-01-28) on some PacBio HiFi reads in fasta files. With -fa, I got Error: Wrong input file! on one file and Stage 1: 75%Error: Wrong input file! on another. I saw that people with similar issues had fixed them by using -fm, although the instructions say -fm is for multiple fastq, not fasta.
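For reference, the two invocations differ only in the format switch. Below is a minimal sketch that assembles the command line, assuming the usual KMC argument order (options, then input file, output prefix, working directory) and a `kmc` binary on the PATH. The helper name `build_kmc_command` and the file names are hypothetical, chosen only for illustration.

```python
from typing import List


def build_kmc_command(input_path: str, output_prefix: str, work_dir: str,
                      k: int = 25, multi_fasta: bool = True) -> List[str]:
    """Assemble a KMC command line (hypothetical helper, not part of KMC).

    Assumes the argument order: kmc [options] <input> <output> <working_dir>.
    """
    # -fm selects multi-fasta input, -fa selects plain fasta input.
    fmt = "-fm" if multi_fasta else "-fa"
    return ["kmc", f"-k{k}", fmt, input_path, output_prefix, work_dir]


# Hypothetical file names; replace them with real paths.
cmd = build_kmc_command("reads.fasta", "reads_k25", "kmc_tmp")
print(" ".join(cmd))  # kmc -k25 -fm reads.fasta reads_k25 kmc_tmp
```

The list form can be passed to `subprocess.run(cmd, check=True)` to actually run the tool; the working directory has to exist beforehand.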

But does -fm treat fasta files as fastq, for example by skipping every other read in the fasta file? Should I worry about missing counts in the result? Thanks.

marekkokot commented 2 years ago

I would consider updating KMC to a newer version, since some issues related to long reads have been fixed. There is a good chance that your issue was fixed as well. If you decide to update KMC and the issue is still there, please let me know.

The instructions say -fm is for multi-fasta (fasta where a sequence may be split across multiple lines; I'm not aware of any multi-fastq format). Every fasta file is a special case of a multi-fasta file, so KMC can count k-mers in fasta files with the -fm switch and the results are correct (if, however, you notice any errors in the results, please let me know). One may wonder why on Earth there are both -fa and -fm if -fm would be sufficient. There are two reasons: historically KMC could not handle multi-fasta files and support was added later, and -fm may be slightly slower than -fa. So -fa is recommended for fasta files, although the results will be the same as with -fm. At some point I advised using -fm when these strange bugs appeared. I hope the updated KMC will just work with -fa; if not, I would really appreciate it if you let me know, since I would really like to get rid of these bugs.
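To illustrate why a plain fasta file is a special case of multi-fasta, here is a minimal Python sketch. It is not KMC's implementation: it counts plain (non-canonical) k-mers with no count filtering, and the function names `read_fasta` and `count_kmers` and the toy reads are made up for the example. The point is only that a parser which joins the lines between headers gives identical counts whether or not the sequences are wrapped.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(lines: Iterable[str], k: int) -> Counter:
    """Count every k-mer of every record (plain k-mers, not canonical)."""
    counts = Counter()
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# The same two toy reads, once one line per sequence and once wrapped.
single_line = [">read1", "ACGTACGTAC", ">read2", "TTGACA"]
wrapped = [">read1", "ACGT", "ACGT", "AC", ">read2", "TTG", "ACA"]

print(count_kmers(single_line, 3) == count_kmers(wrapped, 3))  # True
```

Note that k-mers spanning a line break inside one record (such as the second ACG in read1) are still counted, because the lines are joined before the sequence is scanned.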

marekkokot commented 2 years ago

I'm assuming that updating to a newer version fixed the issue. If it still occurs, feel free to reopen.