refresh-bio / KMC

Fast and frugal disk based k-mer counter

Seems that only one-line FASTA files are supported #239

Closed by shenwei356 3 months ago

shenwei356 commented 3 months ago

Dear authors,

I tried to count k-mers with KMC v3.2.4 for a few files, but got an error:

$ kmc -k31 -m10 -fa -ci1 @t.txt t.res t.tmp
Error: some error while reading fasta file, please contact authors (kmc_core/fastq_reader.cpp: 703)

$ cat t.txt
../testall/GCA_000043285.1.fna.gz
../testall/GCA_000378865.1.fna.gz
../testall/GCA_000378885.1.fna.gz
../testall/GCA_000378905.1.fna.gz
../testall/GCA_000378025.1.fna.gz
../testall/GCA_000380225.1.fna.gz
../testall/GCA_000380705.1.fna.gz
../testall/GCA_000381565.1.fna.gz
../testall/GCA_000381585.1.fna.gz
../testall/GCA_000375685.1.fna.gz

After reading the code, I guessed it was due to the multi-line FASTA format, so I tried converting the files to single-line format.

cat t.txt | rush 'seqkit seq {} -w 0 -o t/{%}'
ls t/* > t2.txt

And it worked:

kmc -k31 -m10 -fa -ci1 @t2.txt t.res t.tmp

********************
Stage 1: 100%
Stage 2: 100%
1st stage: 0.7174s
2nd stage: 2.2163s
Total    : 2.9337s
Tmp size : 28MB
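
The conversion step above relies on seqkit and rush. As a rough, dependency-free sketch of the same unwrapping (joining the wrapped sequence lines of each record into one line), something like the following awk filter might do. The function name `unwrap_fasta` is hypothetical and not part of KMC or seqkit, and the sketch assumes plain `>`-headed FASTA records on standard input.

```shell
# Hypothetical helper: join wrapped FASTA sequence lines into one line per record.
# Reads FASTA on stdin, writes single-line FASTA on stdout.
unwrap_fasta() {
    awk '
        # Header line: flush the sequence collected so far, then print the header.
        /^>/ { if (seq != "") print seq; print; seq = ""; next }
        # Sequence line: append to the current record.
        { seq = seq $0 }
        # End of input: flush the last record.
        END { if (seq != "") print seq }
    '
}

# Example use on one of the gzipped inputs (output path is illustrative):
#   gzip -dc ../testall/GCA_000043285.1.fna.gz | unwrap_fasta > t/GCA_000043285.1.fna
```

This is only meant for quick checks; seqkit handles far more edge cases in real data.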

Did I miss an option? I checked the help and found no clues. I also searched the issues, and nobody has mentioned this.

Best, Wei

marekkokot commented 3 months ago

Hi, use -fm instead of -fa. Let me know if it helps.

shenwei356 commented 3 months ago

Ah, it works!!!

Sorry, I missed that!

-f<a/q/m/bam/kmc> - input in FASTA format (-fa), FASTQ format (-fq), 
                    multi FASTA (-fm) or BAM (-fbam) or KMC(-fkmc); default: FASTQ
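
Putting the thread together, the original invocation with only the input-format flag switched from -fa to -fm would presumably look like the sketch below. The guard around the call is an addition for illustration: it assumes `kmc` may not be on the PATH or that `t.txt` may be missing, and in that case just prints the command instead of running it.

```shell
# Same arguments as in the report, with -fa replaced by -fm (multi FASTA).
kmc_args="-k31 -m10 -fm -ci1 @t.txt t.res t.tmp"

# Illustrative guard (not from the thread): only run when kmc and the file list exist.
if command -v kmc >/dev/null 2>&1 && [ -f t.txt ]; then
    # Unquoted on purpose so the argument string splits into separate words.
    kmc $kmc_args
else
    echo "would run: kmc $kmc_args"
fi
```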