refresh-bio / KMC

Fast and frugal disk-based k-mer counter

Empty output #223

Open: mesti90 opened this issue 1 year ago

mesti90 commented 1 year ago

I'd like to understand why KMC produces empty output for several genomes.

For example, I downloaded https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/067/135/GCF_001067135.1_ASM106713v1/GCF_001067135.1_ASM106713v1_genomic.fna.gz and saved it as GCF_001067135.1.fna.gz.

I ran the following command:

kmc -t60 -k75 -f -fm NCBI_genomes/GCF_001067135.1.fna Klebsiella_kmc/GCF_001067135.1 .

Stdout:

Stage 1: 100%
Stage 2: 100%
1st stage: 0.316763s
2nd stage: 0.140104s
Total    : 0.456867s
Tmp size : 4MB

Stats:
   No. of k-mers below min. threshold :      5403715
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :      5403715
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :      5403715
   Total no. of sequences             :          240
   Total no. of super-k-mers          :       149446

Output: two files, GCF_001067135.1.kmc_pre (1.3 MB) and GCF_001067135.1.kmc_suf (8 bytes). I then dumped the database with:

kmc_dump GCF_001067135.1 GCF_001067135.1.kmers

The dump file is empty. What is the reason, and what should I do to get meaningful output?

marekkokot commented 1 year ago

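Try adding -ci1 to your command; -f is probably unnecessary. -ci<x> removes all k-mers with counts below x. The default is 2, which discards k-mers that are most likely the result of sequencing errors. That makes sense for reads, but not for an assembled genome, where a k-mer typically occurs only once. Your stats show exactly this: the total number of k-mers equals the number of unique k-mers (5403715), so every k-mer occurs once and all of them fall below the threshold (No. of k-mers below min. threshold : 5403715). Let me know if it works with -ci1.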
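To make the threshold logic concrete, here is a minimal Python sketch. It is not KMC's implementation, and count_kmers, canonical and filter_counts are hypothetical names. It assumes canonical counting (KMC's default), skips any window containing a non-ACGT base, and treats -ci/-cx as keeping counts in the inclusive range ci..cx. It shows why an assembled genome in which every k-mer is a singleton yields an empty result under the default minimum of 2.

```python
from collections import Counter

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(sequences, k):
    """Count canonical k-mers; windows containing non-ACGT bases are skipped (assumption)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def filter_counts(counts, ci=2, cx=10**9):
    """Keep k-mers whose count c satisfies ci <= c <= cx (hypothetical analogue of -ci/-cx)."""
    return {kmer: c for kmer, c in counts.items() if ci <= c <= cx}


if __name__ == "__main__":
    genome = ["ACGGT"]  # toy "assembly": every 3-mer occurs once
    counts = count_kmers(genome, k=3)
    # Total equals unique, so every k-mer is a singleton.
    print(sum(counts.values()), len(counts))   # 3 3
    print(filter_counts(counts))               # {} (default ci=2 drops everything)
    print(len(filter_counts(counts, ci=1)))    # 3
```

With reads, the same k-mer is usually seen many times because of sequencing depth, so a minimum of 2 mainly removes error k-mers. With a single assembled genome there is no such redundancy, so -ci1 is the appropriate setting.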

mesti90 commented 1 year ago

@marekkokot Thank you, it works now