tlemane / kmtricks

modular k-mer count matrix and Bloom filter construction for large read collections
GNU Affero General Public License v3.0

"--kmer-size" not work #8

Closed liaoherui closed 3 years ago

liaoherui commented 3 years ago

Hi, thanks for your amazing tool, it is really helpful! However, I ran into a problem when running the command below:

kmtricks.py run --file fof.txt --run-dir ./count_run --kmer-size 31 --nb-cores 8 --nb-partitions 4 --count-abundance-min 0 --recurrence-min 1 --mode ascii --lz4

The content of 'fof.txt' is shown below.

[image]

The k-mer size in the output file is still 20. Is there anything wrong with my command or with the file 'fof.txt'?

tlemane commented 3 years ago

Hi,

Thank you for reporting this! The k-mers were truncated when writing the matrix in text format (0a4b05fbef79a43dcb1fcc485b79498b457cd8b8). This is now fixed in the latest release, 0.0.5 (available on conda).
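
For anyone who wants to confirm that a text-format matrix carries full-length k-mers, here is a minimal Python sketch. It is not part of kmtricks. It assumes the matrix has already been decompressed, and that each row starts with the k-mer followed by whitespace-separated counts; that layout is an assumption and may differ between versions. The function name `kmer_length_histogram` is hypothetical.

```python
from collections import Counter


def kmer_length_histogram(lines):
    """Count how often each k-mer length occurs in a text matrix.

    Assumption (not confirmed by the kmtricks docs): every non-empty row
    begins with the k-mer, followed by whitespace-separated counts.
    """
    lengths = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        lengths[len(fields[0])] += 1
    return lengths


# Toy rows: two 31-mers and one k-mer truncated to 20 characters.
full = "ACGT" * 7 + "ACG"   # 31 characters
truncated = "ACGT" * 5      # 20 characters
rows = [f"{full} 3 0 1", f"{full} 1 1 0", f"{truncated} 2 2 2"]

histogram = kmer_length_histogram(rows)
for length, count in sorted(histogram.items()):
    print(f"k-mer length {length}: {count} row(s)")
```

With `--kmer-size 31`, every row should report length 31. Rows of length 20 would point to the truncation described above. To check a real file, pass an open file handle instead of the toy list.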

liaoherui commented 3 years ago

Hi,

Thanks for your prompt reply! This problem is fixed now!

I will run more tests on kmtricks. So far, I have tested it on 280+ highly similar bacterial strain genomes with 8 threads and 10 partitions, and it took about 8 minutes to construct the matrix. Next, I want to test it on 10^3 or even 10^4 bacterial strain genomes. I am not sure whether it can be applied to datasets that large...