Closed EarlyEvol closed 6 years ago
Yes, this is a known issue because PBSPro can't request memory and threads in separate arguments but only in one (seriously what is wrong with PBS?). See issue #466 for the original workaround. Your parameters will request the right number of CPUs. I expect they are taking a long time to schedule because they are all asking for 30hrs wall time and 100g.
You can probably drop the 100g to less (maybe 48g) and set maxMemory=48 to make Canu only use this much. Check the defaults Canu tried to set which it reports at the start of the run (right after detected resources). You can also lower the 30hrs wall clock time if that's much longer than the jobs are taking.
There is already a feature request to have THREADS and MEMORY both replaced in a single variable (probably one of gridEngineMemoryOption or gridEngineThreadsOption), so you could specify gridEngineMemoryOption="-l select=1:ncpus=THREADS:mem=MEMORY". It will probably be in the next release. Until then the workaround is the only way to run it.
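To make the request concrete, here is a rough shell sketch of the substitution being asked for. It is not Canu's code; `expand_template` is a hypothetical helper, and the memory value is passed with its unit (e.g. `24g`) on the assumption that MEMORY expands to a string of that form.

```shell
# Hypothetical helper, not part of Canu: replace THREADS and MEMORY
# in a single option template.
#   $1 = template, $2 = thread count, $3 = memory with unit (e.g. 24g)
# Assumes the values contain no "/" characters, since sed uses "/" as
# the delimiter here.
expand_template() {
  printf '%s\n' "$1" | sed -e "s/THREADS/$2/g" -e "s/MEMORY/$3/g"
}

expand_template "-l select=1:ncpus=THREADS:mem=MEMORY" 4 24g
# prints: -l select=1:ncpus=4:mem=24g
```

With both placeholders in one template, the whole select statement comes out as a single argument with no space inside it.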
Thanks for the reply. I'm sure you are tired of hearing about the woes of PBS users!
Since PBS scales up the requested CPUs based on the RAM request, asking for 100G will always request 100/6 CPUs (6gb per CPU on most of our nodes). It's not the worst, but it was enough to leave jobs in the queue for a while, which added up.
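As a quick worked example of that scaling, the sketch below computes the CPU count for a given memory request. `cpus_for_mem` is a hypothetical helper, and rounding up to whole CPUs is an assumption about how the scheduler behaves, not something confirmed here.

```shell
# Hypothetical helper: CPUs implied by a memory request at a fixed
# GB-per-CPU ratio. ASSUMPTION: the scheduler rounds up to whole CPUs.
#   $1 = requested memory in GB, $2 = GB per CPU on the node
cpus_for_mem() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

cpus_for_mem 100 6   # prints 17
cpus_for_mem 48 6    # prints 8
```

So under that assumption a 100g request ties up 17 CPUs per job, while a 48g request would tie up 8.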
I'm sorry for not updating this issue with the command I have been using with lots of success (super duper fast). It just asks for all the resources explicitly for each step (based on canu.out). Each job still asks for more walltime than it needs, but they ran pretty quickly, so I didn't mess with it. If someone is really experiencing slowdown, it shouldn't be hard to add "-l walltime=30:00:00" to each gridOptionsJOB arg. They will probably have to put a dummy arg in gridEngineMemoryOption so Canu doesn't try to insert a memory request itself.
In case anyone else is suffering the PBS blues, here it is:
SAMPLE=Your_species && canu -p $SAMPLE\_new5 \
-d ~/WorkingDir/assemblies/$SAMPLE\_canu_new5 genomeSize=700m \
-pacbio-raw /PATH/WorkingDir/data/PacBio/$SAMPLE/reads/$SAMPLE\_PacBio_new?.fastq \
gridEngineMemoryOption="-l walltime=30:00:00" \
gridEngineThreadsOption="-q windfall -W group_list=Lab_group" \
stageDirectory=/tmp/\$USER\_\$PBS_JOBID gridOptionsExecutive="-l select=1:mem=12gb:ncpus=2" \
gridOptionsMERYL="-l select=1:mem=64gb:ncpus=16" gridOptionsCORMHAP="-l select=1:mem=24gb:ncpus=4" \
gridOptionsOBTOVL="-l select=1:mem=30gb:ncpus=5" gridOptionsUTGOVL="-l select=1:mem=30gb:ncpus=5" \
gridOptionsOVB="-l select=1:mem=12gb:ncpus=2" gridOptionsOVS="-l select=1:mem=18gb:ncpus=3" \
gridOptionsRED="-l select=1:mem=24gb:ncpus=4" gridOptionsOEA="-l select=1:mem=12gb:ncpus=2" \
gridOptionsBAT="-l select=1:mem=168gb:ncpus=28" gridOptionsGFA="-l select=1:mem=108gb:ncpus=18" \
gridOptionsCOR="-l select=1:mem=24gb:ncpus=4" gridOptionsCNS="-l select=1:mem=60gb:ncpus=10"
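All of the per-stage requests above except MERYL follow the 6gb per CPU ratio, so the select strings can be generated rather than typed. A minimal sketch, where `pbs_select` is a hypothetical helper and the fixed ratio of 6 is specific to our nodes:

```shell
# Hypothetical helper: build a PBSpro select string from a CPU count.
# ASSUMPTION: 6 GB of memory per CPU; adjust for your own nodes.
#   $1 = number of CPUs
pbs_select() {
  printf '%s\n' "-l select=1:mem=$(( $1 * 6 ))gb:ncpus=$1"
}

pbs_select 4    # prints: -l select=1:mem=24gb:ncpus=4
pbs_select 28   # prints: -l select=1:mem=168gb:ncpus=28
```

It could then be used as, for example, gridOptionsCOR="$(pbs_select 4)" on the canu command line.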
With 75X and N50=20kb PacBio data, this resulted in some chromosome arm length contigs for the 500mb hymenoptera genome I'm working on.
Thanks for the great assembler! Earl
You can put all those parameter requests in a file named ~/.canu in your home directory and they will be the default for any run you launch (same format as you use on the command line, except no " needed). You could also add them to a file named canu.defaults in the same folder as the canu executables; then everyone running that version of canu will get your changes.
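For illustration, a ~/.canu built from a few of the options in the command above might look roughly like this. It is only a sketch of the "same format, no quotes" rule; the values are copied from that command and should be adjusted for your cluster.

```
gridEngineMemoryOption=-l walltime=30:00:00
gridEngineThreadsOption=-q windfall -W group_list=Lab_group
gridOptionsExecutive=-l select=1:mem=12gb:ncpus=2
gridOptionsMERYL=-l select=1:mem=64gb:ncpus=16
gridOptionsCOR=-l select=1:mem=24gb:ncpus=4
gridOptionsBAT=-l select=1:mem=168gb:ncpus=28
```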
Hi All, Sorry to be another user stuck with PBSpro.
I'm not sure if this is a feature request or if I just need a little help working around the annoying "select" statement in PBSpro. Here is the command I have used so far, which works but gobbles up a ton of CPU hours and has jobs which sit in the queue for a long time because I tell them all to request extra CPUs and 30hrs walltime. Originally, this workaround was for version 1.4, but I just updated to 1.7.1 with similar results.
SAMPLE=Lh14 && canu -p $SAMPLE -d ~/WorkingDir/assemblies/$SAMPLE\_canu genomeSize=460m -pacbio-raw /home/u14/earlm1/WorkingDir/data/PacBio/$SAMPLE/reads/$SAMPLE\_PB_all.new1.2.fastq gridOptions="-W group_list=earlyevol -q standard" gridEngineMemoryOption="-l walltime=30:00:00" stageDirectory=/tmp/\$USER\_\$PBS_JOBID gridEngineThreadsOption="-l select=1:ncpus=THREADS:mem=100gb"
To get canu to submit jobs requesting the right resources, I tried:
SAMPLE=Lh14 && canu -p $SAMPLE -d ~/WorkingDir/assemblies/$SAMPLE\_canu genomeSize=460m -pacbio-raw /home/u14/earlm1/WorkingDir/data/PacBio/$SAMPLE/reads/$SAMPLE\_PB_all.new1.2.fastq gridOptions="-W group_list=earlyevol -q standard -l select=1:ncpus=THREADS:mem=MEMORY" stageDirectory=/tmp/\$USER\_\$PBS_JOBID
Canu doesn't expand THREADS and MEMORY outside of gridEngineThreadsOption and gridEngineMemoryOption, so qsub rejects it:
qsub: Illegal attribute or resource value select.ncpus
I tried a couple of other things, like tricking canu into generating the select argument with the threads and memory grid engine options:
SAMPLE=Lh14 && canu -p $SAMPLE -d ~/WorkingDir/assemblies/$SAMPLE\_canu genomeSize=460m -pacbio-raw /home/u14/earlm1/WorkingDir/data/PacBio/$SAMPLE/reads/$SAMPLE\_PB_all.new1.2.fastq gridOptions="-W group_list=earlyevol -q standard -l walltime=10:00:00" stageDirectory=/tmp/\$USER\_\$PBS_JOBID gridEngineMemoryOption="-l select=1:mem=MEMORY:" gridEngineThreadsOption="ncpus=THREADS"
Close, but PBS complains about the space in the select arg:
qsub \
  -j oe \
  -l select=1:mem=4g: ncpus=1 \
  -W group_list=earlyevol \
  -q standard \
  -l walltime=10:00:00 \
  -N 'canu_Lh14' \
  -o canu-scripts/canu.06.out canu-scripts/canu.06.sh
Because PBSpro will automatically scale up the cpus to match the mem requested (6gb/cpu) I tried to use
gridEngineMemoryOption="-l select=1:ncpus=1:mem=MEMORY"
Sadly canu prints the arguments in an order that PBS complains about (WTF!?!). If I'm missing something, or anyone using PBSpro has found a workaround to request the right resources, please share!
If adding the ability for canu to expand the THREADS and MEMORY variables in the gridOptions argument is relatively simple, it would allow users (instead of you guys) to deal with the ever-evolving PBS syntax.
Thanks Earl