ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

resume assembly #193

Closed: jdmontenegro closed this issue 4 years ago

jdmontenegro commented 4 years ago

Dear Sir/Madam,

Thank you for such a great tool. I have recently started using it on some plant genomes, and the results look very promising. I have run into a problem, though. I have a large plant genome (~4 Gb), and the only large node in my cluster with enough memory to run the assembly has a walltime limit of 96 hours. I have around 20 hours left on the clock and the alignments file is only around 8 GB so far. I am asking for a walltime extension, but if I do not get it, is it possible to resume the assembly from where it was interrupted, or would I need to restart it from scratch?

Thank you.

Juan D. Montenegro

ruanjue commented 4 years ago

wtdbg2 can resume an assembly from two breakpoints: i) loading alignments; ii) loading nodes. So, if the process finishes the alignment step, you can resume the assembly from there. Another trick: you can use --dump-kbm to add a new breakpoint just after k-mer indexing, which can then be resumed with --load-kbm.
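The resume logic described above can be sketched as a small Python helper that picks the latest usable breakpoint and appends the matching load flag to the original command line. This is a minimal sketch under stated assumptions: the flag names --load-nodes and --load-alignments are assumed from the two breakpoints named in the reply (check them with `wtdbg2 --help`), --load-kbm comes from the reply itself, and the helper `resume_args`, its `completed` argument, and the dump file names are hypothetical. Following the reply, a breakpoint is only used once its step has finished, so an interrupted alignment step falls back to the k-mer index dump (which exists only if the original run used --dump-kbm).

```python
from typing import Dict, List, Set

# Breakpoints in pipeline order, latest first. The flag names are
# assumptions to be checked against `wtdbg2 --help`.
BREAKPOINTS = [
    ("nodes", "--load-nodes"),
    ("alignments", "--load-alignments"),
    ("kbm", "--load-kbm"),
]


def resume_args(base: List[str], files: Dict[str, str], completed: Set[str]) -> List[str]:
    """Return the base command plus a load flag for the latest finished breakpoint.

    base      -- the original wtdbg2 command (reads, threads, output prefix, ...)
    files     -- hypothetical map from breakpoint name to its dump file
    completed -- breakpoints whose step is known to have finished
    """
    for name, flag in BREAKPOINTS:
        if name in completed and name in files:
            return base + [flag, files[name]]
    # No usable breakpoint: the assembly has to be rerun from scratch.
    return list(base)


if __name__ == "__main__":
    base = ["wtdbg2", "-t", "32", "-i", "reads.fa.gz", "-o", "plant"]
    files = {"kbm": "plant.kbm", "alignments": "plant.alignments.gz"}
    # The alignment step was interrupted, so only the k-mer index dump is usable.
    print(" ".join(resume_args(base, files, {"kbm"})))
```

Keeping the original arguments in `base` and only appending a load flag reflects the idea that a resumed run is the same assembly picking up at a later stage, not a differently configured one.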

jdmontenegro commented 4 years ago

Thank you, Ruanjue. I am testing it now. Hopefully, with the additional breakpoint, I can get it assembled despite the walltime limits. Cheers,

marcodelapierre commented 4 years ago

Hi @ruanjue ,

Do the options --dump-kbm and --load-kbm work together with --kbm-parts? I mean, can one increase the number of kbm parts to get multiple kbm dumps, and then use --load-kbm on those multiple dumps?

I am asking this having in mind both reduced memory footprint and more granular control over checkpoint/restart.

Thanks in advance, Marco

ruanjue commented 4 years ago

--dump-kbm only dumps the last part of the kbm index. With --kbm-parts 2, the first part will be ignored.
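To make the consequence concrete, here is a small Python model (not wtdbg2 code) of which reads are covered by a --dump-kbm file when --kbm-parts is greater than 1. It assumes, purely for illustration, that reads are split into N contiguous parts of near-equal read count; wtdbg2's actual partitioning may differ. The functions `split_parts` and `dumped_part` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_parts(reads: Sequence[T], n_parts: int) -> List[List[T]]:
    """Split reads into n_parts contiguous, near-equal chunks (assumed scheme)."""
    if n_parts < 1:
        raise ValueError("n_parts must be >= 1")
    size, extra = divmod(len(reads), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts take one additional read each.
        end = start + size + (1 if i < extra else 0)
        parts.append(list(reads[start:end]))
        start = end
    return parts


def dumped_part(reads: Sequence[T], n_parts: int) -> List[T]:
    """Reads whose index would end up in the dump: only the last part,
    matching the reply that --dump-kbm ignores the earlier parts."""
    return split_parts(reads, n_parts)[-1]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(5)]
    print(dumped_part(reads, 1))  # one part: the dump covers every read
    print(dumped_part(reads, 2))  # two parts: only the second part is dumped
```

Under this model, reloading the dump from a multi-part run with --load-kbm would restore the index for only a subset of the reads, which suggests the dump/load breakpoint is only a complete checkpoint when --kbm-parts is 1.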

marcodelapierre commented 4 years ago

Thanks for the clarification, Jue.