AlWa1 closed this issue 12 months ago.
Yes, there is, though it's not very intuitive: you have to adjust either the size of the index for each job or the number of reads each job streams against the index (described here: https://canu.readthedocs.io/en/latest/parameter-reference.html#overlapper-configuration). The relevant parameters are obtOvlRefBlockLength and utgOvlRefBlockLength, which default to 5000000000 for your genome size. Halving this value doubles the number of jobs, so you could try 1000000000 (one fifth of the default, so roughly five times as many jobs) and see how many jobs you end up with and how long they take. You can also set ovlThreads=113 so you don't have to adjust the thread count manually. Lastly, you could try the -fast option, which speeds this step up but might give you a slightly less contiguous assembly.
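The scaling rule in the reply (halving the block length doubles the number of jobs) can be turned into a quick back-of-the-envelope estimate. This is only an illustration: it assumes that both the job count and the per-job runtime scale in simple proportion with obtOvlRefBlockLength/utgOvlRefBlockLength, and the helper names estimate_jobs and estimate_hours are hypothetical, not part of Canu.

```shell
#!/usr/bin/env bash
# Rough estimate only. ASSUMPTION: job count scales inversely with the
# ref block length, and per-job runtime scales proportionally with it.
# Integer arithmetic, rounded up.

# estimate_jobs CURRENT_JOBS CURRENT_BLOCK_LENGTH NEW_BLOCK_LENGTH
estimate_jobs() {
  local jobs=$1 old=$2 new=$3
  echo $(( (jobs * old + new - 1) / new ))
}

# estimate_hours CURRENT_HOURS CURRENT_BLOCK_LENGTH NEW_BLOCK_LENGTH
estimate_hours() {
  local hours=$1 old=$2 new=$3
  echo $(( (hours * new + old - 1) / old ))
}

# 64 jobs (22 + 42) at the default 5000000000, trying 1000000000:
echo "estimated jobs: $(estimate_jobs 64 5000000000 1000000000)"      # prints 320
# A hypothetical job that would have needed 30 hours at the default:
echo "estimated hours: $(estimate_hours 30 5000000000 1000000000)"    # prints 6
```

With the 64 trimming jobs from the question, presumably produced at the default value, dropping to 1000000000 gives about 320 jobs, and a job that would hypothetically have needed 30 hours falls to roughly 6. The estimate ignores the fixed cost each job pays to build its index, so real jobs will likely take somewhat longer than this.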
Amazing, that worked perfectly - many thanks!
Dear Canu team,
I am currently using Canu to assemble a highly repetitive fungal genome (>2/3 repetitive elements; total genome size roughly 127 Mbp). Since our cluster (SLURM) does not support automatic resubmission from compute nodes, I am running the assembly in useGrid=remote mode. Everything has run fine so far, but in the trimming step (overlapper) some of the individual batch jobs run longer than the currently provided wall time of 12 hours (free access). 22 jobs finished in time, while 42 hit the wall-time limit. I am providing 113 cores (a full Intel Sapphire Rapids node) per job, adjusting the thread number manually in the submission script and in overlap.sh.
To stay within the 12-hour limit on our cluster, is there a way to reduce the size of the trimming/overlapper jobs so that each job finishes in time?
Many thanks in advance, Alan
Command:
canu -p "scaffold" -d PATH/20231103_canu_onestep_test genomeSize=127m -raw -nanopore PATH/merged_fastq_pass_S2.fastq gridOptions="-A SL3-CPU -p sapphire -t 12:00:00 --mail-type=ALL" useGrid=remote
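For reference, here is one way the command above could look with the overlapper settings from the reply added. This is a sketch: the PATH/ placeholders are kept from the post as-is, 1000000000 is the starting value suggested in the reply rather than a tuned one, and the DRY_RUN switch is a hypothetical convenience that prints the command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Original command plus the suggested overlapper settings.
# PATH/... placeholders are kept from the post; replace them before running.
canu_args=(
  -p "scaffold"
  -d PATH/20231103_canu_onestep_test
  genomeSize=127m
  obtOvlRefBlockLength=1000000000   # smaller batches for the trimming overlapper
  utgOvlRefBlockLength=1000000000   # and for the unitigging overlapper
  ovlThreads=113                    # full node, no manual edits to overlap.sh
  -raw -nanopore PATH/merged_fastq_pass_S2.fastq
  gridOptions="-A SL3-CPU -p sapphire -t 12:00:00 --mail-type=ALL"
  useGrid=remote
)

# Hypothetical switch: print the command by default, run it with DRY_RUN=0.
if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' canu "${canu_args[@]}"
  printf '\n'
else
  canu "${canu_args[@]}"
fi
```

Passing the options through a bash array keeps gridOptions as a single argument, so its embedded spaces are not split by the shell.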