Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of viral pathogens of concern, especially SARS-CoV-2
First would be to remove the --classified-out option. I don't believe classified reads are used downstream, correct? If they are, ignore this suggestion.
Second would be to redirect the STDOUT to /dev/null. The node/VM running the task doesn't need to print kraken2's verbose STDOUT or save it to the log, so this saves A LOT of runtime and a good bit of disk space.
I recently saw a kraken2 log that was ~66MB. That's too big 🥴
Without >/dev/null:
$ time kraken2 --classified-out 3000112549.cseqs.fq --threads 4 --db /kraken2-db 3000112549_S11_L001_R1_001.fastq.gz \
--report 3000112549_kraken2_report.txt
<tons of STDOUT removed>
6486078 sequences (236.88 Mbp) processed in 139.402s (2791.7 Kseq/m, 101.96 Mbp/m).
6074768 sequences classified (93.66%)
411310 sequences unclassified (6.34%)
real 2m27.289s
user 0m49.805s
sys 0m41.093s
With >/dev/null, with --classified-out: an order of magnitude faster.
With >/dev/null, without --classified-out: removing --classified-out actually seems to make things a little bit slower, but there's still no sense in writing the file if we don't intend to use it.
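Putting the first two suggestions together, the call could look roughly like the sketch below. The wrapper name `kraken2_report_only` and its argument order are made up for illustration and are not taken from task_taxonID.wdl. It assumes the report file is the only output needed downstream.

```shell
# Hypothetical wrapper: run kraken2 for its report only.
# Args: <db_dir> <reads.fastq.gz> <report.txt> [threads, default 4]
kraken2_report_only() {
  # No --classified-out: classified reads are assumed to be unused downstream.
  # The per-read output on STDOUT is discarded; STDERR is left untouched.
  kraken2 --threads "${4:-4}" --db "$1" --report "$3" "$2" > /dev/null
}

# Usage sketch (same inputs as the timing run above):
#   kraken2_report_only /kraken2-db 3000112549_S11_L001_R1_001.fastq.gz 3000112549_kraken2_report.txt 4
```

Only STDOUT is redirected, so anything kraken2 writes to STDERR still ends up in the task log.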
And thirdly, kraken2 does benefit from extra CPUs. I would throw 8 CPUs (max) at the task, and scale RAM accordingly if Terra/Cromwell doesn't do it for you.
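One way to apply the 8-CPU ceiling inside the task command is to cap whatever the node offers before passing it to --threads. This is only a sketch: `cap_threads` is a hypothetical helper, and the default of 8 is just the maximum suggested above, not a measured optimum.

```shell
# Hypothetical helper: cap the thread count at a maximum (default 8).
# Args: <available_cpus> [max_cpus]
cap_threads() {
  avail="$1"
  max="${2:-8}"
  if [ "$avail" -gt "$max" ]; then
    echo "$max"
  else
    echo "$avail"
  fi
}

# Usage sketch, assuming GNU coreutils' nproc is available on the node:
#   threads=$(cap_threads "$(nproc)")
#   kraken2 --threads "$threads" ...
```

In a Terra/Cromwell task the CPU request itself would normally be set in the task's runtime settings. The helper only keeps --threads in line with what the node actually provides.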
I think kraken2 tasks could be sped up greatly with some tweaks to the command: https://github.com/theiagen/public_health_viral_genomics/blob/5a3d1f7510f4c57b8602049469b6d0329ca5c430/tasks/task_taxonID.wdl#L21