rpetit3 / dragonflye

:dragon: :fly: Assemble bacterial isolate genomes from Nanopore reads
GNU General Public License v3.0

Batch option for Medaka #11

Closed: Spiltz closed this issue 2 years ago

Spiltz commented 2 years ago

Hi!

I'm running into issues with the medaka polishing step: it runs out of GPU memory. The Medaka manual states that passing a batch size option (-b) to medaka_consensus helps limit GPU memory usage.

I tried this by editing the bin file, and it works. Could there be a way to pass a batch size option through to medaka when calling dragonflye?

Thanks!

MV

rpetit3 commented 2 years ago

Yeah, I think I can get it added. Unfortunately, I won't be able to test it (no GPU to test on).

Are you thinking of something like --batch_size INT at runtime?

Spiltz commented 2 years ago

Yes, exactly. The medaka call would look something like this:

run_cmd("medaka_consensus -i $FINAL_READS -d $POLISHED_FASTA -o $polish_dir -m $model -b $batch_size -t $cpus 2>&1", "polishing - medaka ($i of $medaka)");

Default batch size is 200.
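
To make the --batch_size idea concrete, here is a minimal sketch in Python. It is an illustration only, not dragonflye's actual Perl code. The function name build_medaka_cmd_with_batch and its parameters are hypothetical. Only the medaka_consensus flags (-i, -d, -o, -m, -b, -t) are taken from the call above. It appends -b only when a batch size was supplied, so medaka keeps its own default otherwise.

```python
def build_medaka_cmd_with_batch(reads, draft, outdir, model, cpus, batch_size=None):
    """Build a medaka_consensus argument list (hypothetical helper).

    -b is appended only when batch_size is given, so medaka's own
    default applies when the user does not ask for a specific value.
    """
    cmd = [
        "medaka_consensus",
        "-i", reads,    # input reads
        "-d", draft,    # draft assembly to polish
        "-o", outdir,   # output directory
        "-m", model,    # medaka model name
    ]
    if batch_size is not None:
        # Reject values that cannot be a valid batch size.
        if batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        cmd += ["-b", str(batch_size)]
    cmd += ["-t", str(cpus)]
    return cmd
```

A smaller batch size would then be requested with, for example, build_medaka_cmd_with_batch("reads.fq", "draft.fa", "polish", "MODEL", 4, batch_size=50), where "MODEL" stands in for whichever medaka model is in use.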

rpetit3 commented 2 years ago

Do you think a --medaka_opts option might be more useful, so that you can change any other medaka parameters as well?

Spiltz commented 2 years ago

Sure, that adds quite a lot of flexibility. It also avoids having to introduce code that checks whether --batch_size is non-empty.
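
The pass-through approach discussed here can be sketched in Python as follows. This is a hedged illustration, not the code from the dragonflye commit. The function name build_medaka_cmd_with_opts and the medaka_opts parameter are assumptions that mirror the proposed --medaka_opts option. The extra options arrive as one string and are split into separate arguments with shlex.split, so no option-specific checks are needed.

```python
import shlex


def build_medaka_cmd_with_opts(reads, draft, outdir, model, cpus, medaka_opts=""):
    """Build a medaka_consensus argument list (hypothetical helper).

    medaka_opts is a free-form string of extra options (for example
    "-b 50") that is appended to the command unchanged.
    """
    cmd = [
        "medaka_consensus",
        "-i", reads,
        "-d", draft,
        "-o", outdir,
        "-m", model,
        "-t", str(cpus),
    ]
    # shlex.split honours shell-style quoting and returns [] for an
    # empty string, so no separate "is it set?" check is required.
    cmd += shlex.split(medaka_opts)
    return cmd
```

Calling build_medaka_cmd_with_opts(..., medaka_opts="-b 50") appends "-b" and "50" to the argument list, while leaving medaka_opts empty reproduces the unmodified call.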

rpetit3 commented 2 years ago

Awesome! I'll get this added for you. Will update soon.

rpetit3 commented 2 years ago

Hopefully this does it for you: https://github.com/rpetit3/dragonflye/commit/79686e6c6103996d6a7313b4587956cdaeb67f46

If tests pass, I'll submit a new release.

rpetit3 commented 2 years ago

This should be available in v1.0.13 (https://github.com/rpetit3/dragonflye/releases/tag/v1.0.13). I'll get it updated on Bioconda soon.

Thank you very much for the suggestion, @Spiltz!

Please feel free to reopen if it doesn't work as expected!