tlemane / kmtricks

modular k-mer count matrix and Bloom filter construction for large read collections
GNU Affero General Public License v3.0

kmtricks uses all RAM #27

Open szachn-u opened 1 year ago

szachn-u commented 1 year ago

Hello,

I launched the following kmtricks command on a file of files (fof) listing ~37,000 FASTQ files:

kmtricks pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20

It runs nicely, but at some point it uses all the available RAM (I have 128 GB) and the script gets killed:

./analysis_10x_MDAMB468_kmtricks.sh: line 30: 21185 Killed "$kmtricks" pipeline --file "$list_fq" --run-dir "$outDir/out" --kmer-size 31 --hard-min 1 --mode kmer:count:bin --until count --cpr -t 20

Is there a way to limit RAM usage?

Thanks a lot!

tlemane commented 10 months ago

Hello, sorry for my late reply.

Unfortunately, you are encountering a known kmtricks problem: memory usage grows with the number of input files, regardless of their sizes, which is not the intended behavior. I have started investigating, and this will be fixed in a future release.

It seems you are using kmtricks only to count k-mers, without building the matrix (because of --until count), right? In that case, you can simply split your fof into several smaller fofs and process each one in a separate run. If you need a matrix, you can use the same strategy and then combine all the runs using kmtricks combine. See https://github.com/tlemane/kmtricks/wiki/combine.
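A minimal sketch of this splitting workaround, assuming bash, GNU coreutils split, and a fof with one sample per line (the usual kmtricks fof layout). The helper names split_fof and run_chunks, the chunk size, the fof_chunk_/run_ naming, and the DRY_RUN switch are hypothetical choices made for illustration; the kmtricks flags are copied unchanged from the command in the original post. Merging the per-chunk runs with kmtricks combine is not shown; see the wiki page above for that step.

```shell
# Sketch (bash), hypothetical helpers: run kmtricks on several smaller fofs
# instead of one fof listing all samples.

# Split fof "$1" into chunks of "$2" lines named "$3"000, "$3"001, ...
# Uses GNU coreutils split: -l lines per chunk, -d numeric suffixes,
# -a 3 three-digit suffixes (enough for up to 1000 chunks).
split_fof() {
  local fof=$1 chunk_size=$2 prefix=$3
  split -l "$chunk_size" -d -a 3 -- "$fof" "$prefix"
}

# Run one kmtricks pipeline per chunk, each in its own run directory
# "$2"/run_NNN. The kmtricks flags are the ones from the original command.
# With DRY_RUN=1 (hypothetical switch) the commands are printed, not run.
run_chunks() {
  local prefix=$1 out_dir=$2 chunk id
  local -a cmd
  for chunk in "$prefix"[0-9][0-9][0-9]; do
    [[ -e $chunk ]] || continue      # glob matched nothing
    id=${chunk#"$prefix"}            # numeric suffix, e.g. 000
    cmd=("${kmtricks:-kmtricks}" pipeline --file "$chunk"
         --run-dir "$out_dir/run_$id" --kmer-size 31 --hard-min 1
         --mode kmer:count:bin --until count --cpr -t 20)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1        # stop at the first failed run
    fi
  done
}

# Example usage (chunk size 1000 is an arbitrary, hypothetical choice):
#   split_fof "$list_fq" 1000 "$outDir/fof_chunk_"
#   run_chunks "$outDir/fof_chunk_" "$outDir"
```

Keeping each fof small bounds the number of files a single kmtricks run has to handle, which is what drives the memory growth described above.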

Hope this helps,
Teo

MorillonLab commented 10 months ago

Hi Teo, thanks for the answer, I'll try it.
Ugo