refresh-bio / KMC

Fast and frugal disk based k-mer counter

Dump with empty output #234

Open MagpiePKU opened 1 month ago

MagpiePKU commented 1 month ago

Hi,

We are trying to list all k-mers from a pair of FASTQ files using:

kmc \
  -k31 -m128 -ci1 -cs4294967295 -cx1e13 -t36 \
  -jOutput.json \
  @files.lst \
  test.kmer.output \
  `pwd`/kmc_tmp_dir

The output is:

{
    "1st_stage": "17.3042s",
    "2nd_stage": "11.8641s",
    "Total": "29.1683s",
    "Tmp_size": "2529MB",
    "Stats": {
        "#k-mers_below_min_threshold": 0,
        "#k-mers_above_max_threshold": 55569656,
        "#Unique_k-mers": 203791207,
        "#Unique_counted_k-mers": 148221551,
        "#Total no. of k-mers": 2290090695,
        "#Total_reads": 41894946,
        "#Total_super-k-mers": 220071788
    }
}

The database files produced by kmc were not empty:

-rw-r--r-- 1 wing eulerbioinfo       9437 May 27 05:27 test.kmer.output.kff
-rw-r--r-- 1 wing eulerbioinfo   68157532 May 27 05:34 test.kmer.output.kmc_pre
-rw-r--r-- 1 wing eulerbioinfo 1037550865 May 27 05:34 test.kmer.output.kmc_suf

However, when we run kmc_tools -t36 -v transform test.kmer.output dump test.kmer.txt, the output is empty.

Thanks a lot.

marekkokot commented 1 month ago

Hmm, there is something strange here indeed. It seems the output should not be empty - could you share your input files, because I cannot reproduce this? What version are you using?

There is one thing that I think will solve your issue. -cx does not accept scientific notation, so your -cx becomes 1 instead of the intended 1e13. In most cases the default cx (1e9) should be just fine, but if you want to increase it, please use a plain number instead of scientific notation.
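As a rough sketch of that suggestion, the snippet below expands 1e13 into a plain integer and prints the original command with the rewritten -cx value. The plain_int helper is hypothetical (it is not part of KMC), and whether KMC accepts a cutoff this large is an assumption. The reported stats look consistent with -cx being read as 1, since 55569656 k-mers were excluded as above the max threshold.

```shell
#!/bin/sh
# Hypothetical helper (not part of KMC): expand a value such as 1e13
# into a plain integer string. awk uses doubles, so this is only exact
# for values up to about 9e15.
plain_int() {
    awk -v x="$1" 'BEGIN { printf "%.0f\n", x }'
}

CX=$(plain_int 1e13)

# Print the command instead of running it, so it can be reviewed first.
# All other options are copied unchanged from the original report.
echo kmc -k31 -m128 -ci1 -cs4294967295 "-cx${CX}" -t36 \
    -jOutput.json @files.lst test.kmer.output "$(pwd)/kmc_tmp_dir"
```

Typing the digits by hand (-cx10000000000000) works just as well; the helper only avoids miscounting zeros.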

Let me know if it helps.

MagpiePKU commented 1 month ago

Thanks a lot. It is a huge FASTQ file straight from the sequencer, so I will see if I can reproduce the problem with fewer reads and then get back to you. I used the newest release (3.2.4) binaries.
