jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

kmer2read_distr killed at STEP 3 (READING DATABASE.KRAKEN FILE) #187


savytskanatalia commented 2 years ago

Hello. I am trying to build a Bracken DB for my custom Kraken2 DB, similar to maxikraken2_1903_140GB ( https://lomanlab.github.io/mockcommunity/mc_databases.html ). Since I could not find any Bracken DB distribution for the maxikraken2 DB, I set out to regenerate it to obtain the files Bracken needs. The previous step leaves me with a database.kraken of 729 GB, while the machine has only 377 GB of RAM, so whenever I run kmer2read_distr --seqid2taxid the job gets killed, I suspect because it runs out of RAM. Is there any way to split the database, or to avoid loading it fully into RAM while building the DB? Example:

/opt/bracken/src/kmer2read_distr --seqid2taxid Krakenstein2/seqid2taxid.map --taxonomy Krakenstein2/taxonomy --kraken Krakenstein2/database.kraken --output Krakenstein2/Krakenstein_150mers.kraken -k 35 -l 150 -t 32

>>STEP 0: PARSING COMMAND LINE ARGUMENTS
    Taxonomy nodes file: Krakenstein2/taxonomy/nodes.dmp
    Seqid file:          Krakenstein2/seqid2taxid.map
    Num Threads:         32
    Kmer Length:         35
    Read Length:         150
>>STEP 1: READING SEQID2TAXID MAP
    26618342 total sequences read
>>STEP 2: READING NODES.DMP FILE
    2416809 total nodes read
>>STEP 3: READING DATABASE.KRAKEN FILE
    13513290 sequences read...
Killed

And the corresponding message from the OOM killer:

dmesg -T | egrep -i 'killed process'

  [Mon May  9 19:29:08 2022] Out of memory: Killed process 2370765 (kmer2read_distr) total-vm:397024724kB, anon-rss:389071168kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:777036kB oom_score_adj:0
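
Since STEP 3 appears to load the whole database.kraken into memory, the file size is a rough lower bound on the RAM that step needs. A minimal pre-flight check one could run before launching kmer2read_distr (a sketch; the path and the loads-everything assumption are mine, not from the Bracken docs):

# Compare the size of database.kraken against currently available RAM.
DB=Krakenstein2/database.kraken
need_kb=$(du -k "$DB" | cut -f1)
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "database.kraken: ${need_kb} kB, MemAvailable: ${avail_kb} kB"
# If the file alone exceeds available memory, STEP 3 will likely be OOM-killed.
[ "$need_kb" -gt "$avail_kb" ] && echo "WARNING: likely to be OOM-killed at STEP 3"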
Smedard commented 2 years ago

Hello @savytskanatalia! I ran into a similar problem with my custom Kraken2 DB. Did you manage to find any workaround for this issue?

savytskanatalia commented 2 years ago

@Smedard my workaround was moving the data and restarting the process on a machine with 1 TB of RAM... It is still running Step 4 of the DB generation, though.
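
If a bigger machine is not an option, chunking might work in principle: database.kraken is Kraken output and appears to be line-oriented, with one record per sequence, so one could split it, run kmer2read_distr on each chunk, and concatenate the outputs. An untested sketch reusing the flags from the command above (the assumption that records are processed independently has not been verified against the Bracken source):

# Split the kraken file into line-aligned chunks (chunk size is arbitrary).
split -l 1000000 Krakenstein2/database.kraken Krakenstein2/db_chunk_
# Process each chunk separately; only one chunk is in memory at a time.
for chunk in Krakenstein2/db_chunk_*; do
    /opt/bracken/src/kmer2read_distr --seqid2taxid Krakenstein2/seqid2taxid.map \
        --taxonomy Krakenstein2/taxonomy --kraken "$chunk" \
        --output "${chunk}.150mers" -k 35 -l 150 -t 32
done
# Concatenate the per-chunk outputs into the final 150mers file.
cat Krakenstein2/db_chunk_*.150mers > Krakenstein2/Krakenstein_150mers.kraken

Note that STEP 1 (seqid2taxid map) and STEP 2 (nodes.dmp) would still be loaded in full on every chunk run, but those are far smaller than database.kraken.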