dcopetti closed this issue 8 years ago
Don't change the values in Defaults.pm; those should be passed in on the command line. If you really want to change the code, make the change in Grid_SGE.pm.
We used to allow (in Celera Assembler) a dotfile to hold common settings. It's not enabled in canu at the moment, for reasons I can't remember. It read either a dotfile in your home directory, or one in the binary directory. I'll fix that tomorrow.
This is most likely because Canu can't find the appropriate ways to request memory on your SGE configuration. You can control these by adding the options:
gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe make THREADS"
to your command line. As Brian said, you shouldn't need to modify the code.
That said, I don't think either h_vmem or make are options you should use. At least on systems I've seen, h_vmem is not a consumable resource, which means two jobs requesting 60G at the same time could get scheduled on a single 60G machine and then try to use a total of 120G, bringing the machine down. You want a memory parameter that locks the memory for a process; if you don't have an option like that, you would need to add one. You can check the available memory options using:
% qconf -sc|grep MEMORY
#name shortcut type relop requestable consumable default urgency
#-----------------------------------------------------------------------------------------------------
h_vmem h_vmem MEMORY <= YES NO 0 0
mem_free mf MEMORY <= YES YES 0 0
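To automate that check, here is a small sketch that lists MEMORY-type complexes which are both requestable and consumable (the sample data in `qconf_sc_sample` is hypothetical; on a real cluster you would pipe the output of `qconf -sc` into the awk filter instead):

```shell
#!/bin/sh
# Sketch: list MEMORY-type complexes that are both requestable and consumable,
# i.e. candidates for gridEngineMemoryOption. The sample data below is
# hypothetical; on a real cluster replace qconf_sc_sample with `qconf -sc`.
qconf_sc_sample() {
  cat <<'EOF'
#name        shortcut  type    relop  requestable  consumable  default  urgency
h_vmem       h_vmem    MEMORY  <=     YES          NO          0        0
mem_free     mf        MEMORY  <=     YES          YES         0        0
virtual_free vf        MEMORY  <=     YES          NO          0        0
EOF
}

# Fields 3, 5 and 6 are type, requestable and consumable.
qconf_sc_sample | awk '$3 == "MEMORY" && $5 == "YES" && $6 == "YES" { print $1 }'
```

On the sample data this prints only mem_free, the one complex that is safe to use for locking memory.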
The make parallel environment is also usually not configured for multi-threaded jobs. On our system:
% qconf -sp make
pe_name make
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
qsort_args NONE
Round robin will assign slots from different machines to the job: "$round_robin - select one slot from each host in a round-robin fashion until all job slots are assigned. This setting can result in more than one job slot per host."
You would want pe_slots, which ensures all of a job's slots are on the same machine: "$pe_slots - place all the job slots on a single machine. Grid Engine will only schedule such a job to a machine that can host the maximum number of slots requested by the job."
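For completeness, a sketch of how a cluster admin could add such a parallel environment. This requires SGE manager privileges; the PE name, slot count, and queue name here are illustrative examples, not values from this thread:

```
# Sketch only: create a 'thread' PE with pe_slots allocation
# (name, slots, and queue are arbitrary examples).
cat > thread.pe <<'EOF'
pe_name            thread
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
qconf -Ap thread.pe                        # add the PE from the file
qconf -aattr queue pe_list thread all.q    # attach it to a queue
```

With such a PE in place, a multi-threaded job requesting `-pe thread N` is guaranteed all N slots on one host.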
Hi, I tried to add your parameters, and it gives a longer error:
With -pe make THREADS:

[smrtanalysis@pac canu_test]$ /opt/canu/Linux-amd64/bin/canu -d /../250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe make THREADS" errorRate=0.18 -pacbio-raw N22_42cells_250k_subreads.fa
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
-- Detected 12 CPUs and 31 gigabytes of memory.
-- Detected Sun Grid Engine in '/usr/share/gridengine/default'.
-- User supplied Grid Engine environment '-pe make THREADS'.
-- Found 4 hosts with 12 cores and 31 GB memory under Sun Grid
-- Allowed to run under grid control, and use up to 6 compute threads
and 15 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run under grid control, and use up to 12 compute threads
and 13 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 12 compute threads
and 13 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 12 compute threads
and 13 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run under grid control, and use up to 6 compute threads
and 12 GB memory for stage 'read error detection (overlap error
adjustment)'.
-- Allowed to run under grid control, and use up to 1 compute thread
and 2 GB memory for stage 'overlap error adjustment'.
-- Allowed to run under grid control, and use up to 8 compute threads
and 31 GB memory for stage 'utgcns (consensus'.
-- Allowed to run under grid control, and use up to 1 compute thread
and 8 GB memory for stage 'overlap store sequential building'.
-- Allowed to run under grid control, and use up to 1 compute thread
and 2 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run under grid control, and use up to 1 compute thread
and 10 GB memory for stage 'overlap store parallel sorting'.
-- Allowed to run under grid control, and use up to 1 compute thread
and 2 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 6 compute threads
and 8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 6 compute threads
and 8 GB memory for stage 'overlapper'.
-- Allowed to run under grid control, and use up to 12 compute threads
and 31 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run under grid control, and use up to 6 compute threads
-- Starting command on Wed Feb 3 09:11:12 2016 with 339.3 GB free disk space
qsub \
  -l h_vmem=12g \
  -pe make 1 \
  -cwd \
  -N "canu_N22_test250k" \
  -j y \
  -o /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.out /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.sh
Unable to run job: job rejected: the requested parallel environment "make" does not exist.
Exiting.
-- Finished on Wed Feb 3 09:11:12 2016 (lickety-split) with 339.3 GB
ERROR: Failed with signal HUP (1)
Please panic. canu failed, and it shouldn't have.
Stack trace:
 at /opt/canu/Linux-amd64/bin/lib/canu/Defaults.pm line 230.
 canu::Defaults::caFailure("Failed to submit script", undef) called at /opt/canu/Linux-amd64/bin/lib/canu/Execution.pm line 851
 canu::Execution::submitScript("/home/smrtanalysis/dario_test/canu_test/250k_assembly", "N22_test250k", undef) called at /opt/canu/Linux-amd64/bin/canu line 312
canu failed with 'Failed to submit script'.
Also with a different option for the -pe:

With -pe thread THREADS:

[smrtanalysis@pac canu_test]$ /opt/canu/Linux-amd64/bin/canu -d /../250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe thread THREADS" errorRate=0.18 -pacbio-raw N22_42cells_250k_subreads.fa
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
-- Detected 12 CPUs and 31 gigabytes of memory.
-- Detected Sun Grid Engine in '/usr/share/gridengine/default'.
-- User supplied Grid Engine environment '-pe thread THREADS'.
-- Found 4 hosts with 12 cores and 31 GB memory under Sun Grid
[... same per-stage 'Allowed to run' resource listing as in the first run above ...]
-- Starting command on Wed Feb 3 09:14:40 2016 with 339.3 GB free disk space
qsub \
  -l h_vmem=12g \
  -pe thread 1 \
  -cwd \
  -N "canu_N22_test250k" \
  -j y \
  -o /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.out /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.sh
Unable to run job: job rejected: the requested parallel environment "thread" does not exist.
Exiting.
-- Finished on Wed Feb 3 09:14:40 2016 (lickety-split) with 339.3 GB
ERROR: Failed with signal HUP (1)
Please panic. canu failed, and it shouldn't have.
Stack trace:
 at /opt/canu/Linux-amd64/bin/lib/canu/Defaults.pm line 230.
 canu::Defaults::caFailure("Failed to submit script", undef) called at /opt/canu/Linux-amd64/bin/lib/canu/Execution.pm line 851
 canu::Execution::submitScript("/home/smrtanalysis/dario_test/canu_test/250k_assembly", "N22_test250k", undef) called at /opt/canu/Linux-amd64/bin/canu line 312
canu failed with 'Failed to submit script'.
Our system does not have mpi or smp. Thanks,
Dario
Dario Copetti, PhD Research Associate | Arizona Genomics Institute University of Arizona | BIO5
1657 E. Helen St. Tucson, AZ 85721, USA www.genome.arizona.edu
The error message is listed in the canu output. Your qsub command did not accept the -pe make option; it says that parallel environment does not exist:
Unable to run job: job rejected: the requested parallel environment
"make" does not exist.
Your system must not have make defined. To see what parallel environments you have, check with:
% qconf -spl
make
make-dedicated
thread
You can check each to see if it does pe_slots scheduling with qconf -sp. If none of them has pe_slots scheduling, you would need to add one; otherwise there is no way for a multi-threaded program to run on a single node of your cluster.
Also, errorRate is an optional parameter and refers to the error rate in the corrected reads, not the raw input data, so 0.18 is too high. I'd leave it at the default, or maybe 0.035 as suggested for low coverage in the documentation.
I ran your commands, and we actually have smp:

[smrtanalysis@pac canu_test]$ qconf -spl
smp
[smrtanalysis@pac canu_test]$ qconf -sp smp
pe_name            smp
slots              200
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE

qconf -sc gives a long list:

[smrtanalysis@pac canu_test]$ qconf -sc|grep MEMORY
h_core        h_core   MEMORY  <=  YES  NO  0  0
h_data        h_data   MEMORY  <=  YES  NO  0  0
h_fsize       h_fsize  MEMORY  <=  YES  NO  0  0
h_rss         h_rss    MEMORY  <=  YES  NO  0  0
h_stack       h_stack  MEMORY  <=  YES  NO  0  0
h_vmem        h_vmem   MEMORY  <=  YES  NO  0  0
mem_free      mf       MEMORY  <=  YES  NO  0  0
mem_total     mt       MEMORY  <=  YES  NO  0  0
mem_used      mu       MEMORY  >=  YES  NO  0  0
s_core        s_core   MEMORY  <=  YES  NO  0  0
s_data        s_data   MEMORY  <=  YES  NO  0  0
s_fsize       s_fsize  MEMORY  <=  YES  NO  0  0
s_rss         s_rss    MEMORY  <=  YES  NO  0  0
s_stack       s_stack  MEMORY  <=  YES  NO  0  0
s_vmem        s_vmem   MEMORY  <=  YES  NO  0  0
swap_free     sf       MEMORY  <=  YES  NO  0  0
swap_rate     sr       MEMORY  >=  YES  NO  0  0
swap_rsvd     srsv     MEMORY  >=  YES  NO  0  0
swap_total    st       MEMORY  <=  YES  NO  0  0
swap_used     su       MEMORY  >=  YES  NO  0  0
virtual_free  vf       MEMORY  <=  YES  NO  0  0
virtual_total vt       MEMORY  <=  YES  NO  0  0
virtual_used  vu       MEMORY  >=  YES  NO  0  0
So if I run the command with smp, it says it has been submitted, but there is no activity on the cluster and it actually finishes right away (the -d folder contains canu-logs and canu-scripts subfolders):

[smrtanalysis@pac canu_test]$ /opt/canu/Linux-amd64/bin/canu -d /home/smrtanalysis/dario_test/canu_test/250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe smp THREADS" errorRate=0.18 -pacbio-raw N22_42cells_250k_subreads.fa
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
-- Detected 12 CPUs and 31 gigabytes of memory.
-- Detected Sun Grid Engine in '/usr/share/gridengine/default'.
-- User supplied Grid Engine environment '-pe smp THREADS'.
-- Found 4 hosts with 12 cores and 31 GB memory under Sun Grid
[... same per-stage 'Allowed to run' resource listing as in the first run above ...]
-- Starting command on Wed Feb 3 10:00:04 2016 with 339.3 GB free disk space
qsub \
  -l h_vmem=12g \
  -pe smp 1 \
  -cwd \
  -N "canu_N22_test250k" \
  -j y \
  -o /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.out /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.sh
Your job 65592 ("canu_N22_test250k") has been submitted
-- Finished on Wed Feb 3 10:00:04 2016 (lickety-split) with 339.3 GB free disk space
If it can help, canu-scripts/canu.01.out says:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
if: Expression Syntax.
Here they explain how to invoke smp: http://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/parallel_environments.html
And if I run it with 4, it gives this error:

[smrtanalysis@pac canu_test]$ /opt/canu/Linux-amd64/bin/canu -d /home/smrtanalysis/dario_test/canu_test/250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe smp 4" errorRate=0.18 -pacbio-raw N22_42cells_250k_subreads.fa
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
-- Detected 12 CPUs and 31 gigabytes of memory.
Please panic. canu failed, and it shouldn't have.
Stack trace:
 at /opt/canu/Linux-amd64/bin/lib/canu/Defaults.pm line 230.
 canu::Defaults::caFailure("Couldn't parse gridEngineThreadsOption='-pe smp 4'", undef) called at /opt/canu/Linux-amd64/bin/lib/canu/Grid_SGE.pm line 126
 canu::Grid_SGE::configureSGE() called at /opt/canu/Linux-amd64/bin/canu line 269
canu failed with 'Couldn't parse gridEngineThreadsOption='-pe smp 4''.
Maybe it is not the right smp.
Any suggestion is welcome. Thanks,
Dario
Your first option (with THREADS, not 4) is correct. The error you're getting:
if: Expression Syntax.
is issue #21. Until it's fixed you need to explicitly tell your SGE scheduler to run the jobs under bash. You can do this by adding:
gridOptions="-S /bin/bash"
or whatever the path to your bash is.
We made some progress. First we logged in to a specific node (qlogin -l h=n002); then, after making sure that we have the right java version, we launched the command:

[smrtanalysis@n002 ~]$ /opt/canu/Linux-amd64/bin/canu -d /home/smrtanalysis/dario_test/canu_test/250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe smp THREADS" gridOptions="-S /bin/bash" errorRate=0.18 -pacbio-raw /home/smrtanalysis/dario_test/canu_test/N22_42cells_250k_subreads.fa

which printed these lines:

-- Starting command on Wed Feb 3 13:21:28 2016 with 339.3 GB free disk space
qsub \
  -l h_vmem=12g \
  -pe smp 1 \
  -S /bin/bash \
  -cwd \
  -N "canu_N22_test250k" \
  -j y \
  -o /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.02.out /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.02.sh
Your job 65597 ("canu_N22_test250k") has been submitted
-- Finished on Wed Feb 3 13:21:28 2016 (lickety-split) with 339.3 GB free disk space

The output folder contains canu-logs, canu-scripts and correction folders and a correction.html file. The 0-mercounts folder has .mcdat and .mcidx files, and canu-logs has gatekeeper logs.
Looks like we are moving ahead a bit :-)
Dario
Is the java version different in the head node environment versus the qsubbed job? You can add -V to your gridOptions line, which should force the environment to be preserved in your submitted command.
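One way to check what the compute nodes actually resolve is a throwaway diagnostic job (a sketch, not part of canu; the output file names are arbitrary):

```
# Throwaway diagnostic: submit a one-line job that reports the java version
# the compute node sees, with and without -V, then compare to the head node.
echo 'which java; java -version 2>&1' | qsub -cwd -j y -o java_check.out -S /bin/bash
echo 'which java; java -version 2>&1' | qsub -V -cwd -j y -o java_check_V.out -S /bin/bash
# After the jobs finish:
#   cat java_check.out java_check_V.out
```

If the two files disagree, the grid jobs inherit a different PATH than your interactive shell, and -V (or fixing the java install on the nodes) is needed.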
With gridOptions="-V -S /bin/bash" in the command, it still gives me the error:

-o /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.02.out /home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.02.sh
Your job 65607 ("canu_N22_test250k") has been submitted
-- Finished on Wed Feb 3 17:59:31 2016 (lickety-split) with 339.3 GB free disk space
Dario
I didn't see an error in your last comment. It submitted the job to the grid which is OK, the output will now be in canu.02.out and the job will progress in the background on the grid.
You are right: something must be moving, because two nodes have some activity now:

[smrtanalysis@pac canu_test]$ qhost
HOSTNAME  ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
global    -           -     -     -       -       -       -
n001      lx26-amd64  12    1.00  31.3G   749.0M  7.5G    148.5M
n002      lx26-amd64  12    2.85  31.3G   1.2G    7.5G    1.1G
n003      lx26-amd64  12    1.00  31.3G   579.8M  7.5G    196.1M
pac       lx26-amd64  12    3.16  31.3G   5.0G    2.0G    2.0G
[smrtanalysis@pac canu_test]$ qstat
job-ID  prior    name        user          state  submit/start at      queue                          slots  ja-task-ID
65596   0.55500  QLOGIN      smrtanalysis  r      02/03/2016 10:58:38  all.q@n002.genome.arizona.edu  1
65605   0.55500  canu_N22_t  smrtanalysis  r      02/03/2016 15:50:08  all.q@n002.genome.arizona.edu  1
65606   0.55500  canu_N22_t  smrtanalysis  r      02/03/2016 15:50:08  all.q@n002.genome.arizona.edu  1
65602   0.60500  cormhap_N2  smrtanalysis  Eqw    02/03/2016 13:57:40                                 12     1-9:1
65600   0.50500  canu_N22_t  smrtanalysis  Eqw    02/03/2016 13:57:11                                 1
65604   0.50500  canu_N22_t  smrtanalysis  Eqw    02/03/2016 13:57:55                                 1
65603   0.00000  canu_N22_t  smrtanalysis  hqw    02/03/2016 13:57:40                                 1
For the processes in Eqw status, I see:
job_number: 65604
exec_file: job_scripts/65604
submission_time: Wed Feb 3 13:57:55 2016
owner: smrtanalysis
uid: 601
group: smrtanalysis
gid: 601
sge_o_home: /home/smrtanalysis
sge_o_log_name: smrtanalysis
sge_o_path:
/usr/share/gridengine/bin/lx26-amd64:/usr/lib64/qt-3.3/bin:/usr/NX/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/pssc/bin:/opt/openmpi/bin:/opt/torque/bin:/opt/torque/sbin:/home/smrtanalysis/bin:/opt/smrtanalysis/install/smrtanalysis-2.3.0.140936/analysis/bin/:/opt/tools/:/opt/tools/amos-3.1.0
sge_o_shell: /bin/bash
sge_o_workdir: /home/smrtanalysis/dario_test/canu_test/250k_assembly
sge_o_host: pac
account: sge
cwd: /home/smrtanalysis/dario_test/canu_test/250k_assembly
merge: y
hard resource_list: h_vmem=12g
mail_list: smrtanalysis@pac.genome.arizona.edu
notify: FALSE
job_name: canu_N22_test250k
stdout_path_list:
NONE:NONE:/home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.out
jobshare: 0
shell_list: NONE:/bin/bash
env_list:
HOSTNAME=pac.genome.arizona.edu,SHELL=/bin/bash,TERM=xterm,HISTSIZE=1000,SSH_CLIENT=128.196.149.30
55610
22,SGE_CELL=default,OLDPWD=/home/smrtanalysis/dario_test/canu_test,QTDIR=/usr/lib64/qt-3.3,QTINC=/usr/lib64/qt-3.3/include,SSH_TTY=/dev/pts/3,USER=smrtanalysis,LS_COLORS=
[...]
,CANU_DIRECTORY=/home/smrtanalysis/dario_test/canu_test/250k_assembly,PATH=/usr/share/gridengine/bin/lx26-amd64:/usr/lib64/qt-3.3/bin:/usr/NX/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/pssc/bin:/opt/openmpi/bin:/opt/torque/bin:/opt/torque/sbin:/home/smrtanalysis/bin:/opt/smrtanalysis/install/smrtanalysis-2.3.0.140936/analysis/bin/:/opt/tools/:/opt/tools/amos-3.1.0,MAIL=/var/spool/mail/smrtanalysis,NXDIR=/usr/NX,PWD=/home/smrtanalysis/dario_test/canu_test/250k_assembly,SGE_EXECD_PORT=6445,LANG=en_US.UTF-8,MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles,SGE_QMASTER_PORT=6444,LOADEDMODULES=NONE,SGE_ROOT=/usr/share/gridengine,HISTCONTROL=ignoredups,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,HOME=/home/smrtanalysis,SHLVL=2,LOGNAME=smrtanalysis,CVS_RSH=ssh,QTLIB=/usr/lib64/qt-3.3/lib,SSH_CONNECTION=128.196.149.30
55610 150.135.237.5
22,MODULESHOME=/usr/share/Modules,LESSOPEN=|/usr/bin/lesspipe.sh
%s,SGE_CLUSTER_NAME=p6444,G_BROKEN_FILENAMES=1,BASH_FUNCmodule()=() {
eval /usr/bin/modulecmd bash $*
,=/usr/share/gridengine/bin/lx26-amd64/qsub
script_file:
/home/smrtanalysis/dario_test/canu_test/250k_assembly/canu-scripts/canu.01.sh
parallel environment: smp range: 1
error reason 1: 02/03/2016 15:59:22 [601:6438]: error: can't
open output file "/home/smrtanalysis/dario_test/canu_te
scheduling info: (Collecting of scheduler job information is
turned off)
Is the last line telling us something? Or are those 3 maybe just waiting for the three in r mode above?
Now, how does Canu know how many nodes/cores to use? Does it use all the cluster's resources, or can I tell it to use some and leave some cores for other computation? Thanks,
Dario
The E state jobs might just be your previous failed runs. I would erase them from the queue.
Canu parses the qhost output to detect the machines in your cluster, and picks job sizes that let it run across the most machines given the resources available and the genome size you're assembling. Several steps are submitted as large array jobs, in which case it could potentially consume a large part of your cluster, depending on how your scheduler works. The SGE scheduler might restrict the number of cores that a user/parallel environment can request. Generally, I prefer letting the cluster scheduler manage your jobs rather than trying to manage the job scheduling yourself. However, you can customize each step if you want, for example by using the -tc parameter to restrict the number of array jobs that can run in parallel at a time. For example, -tc 10 on a 100-job array would ensure only 10 jobs are scheduled at a time, limiting the cores used by your job. You can see a list of grid options using:
canu -options |grep gridOptions
gridOptions Grid engine options applied to all jobs
gridOptionsExecutive Grid engine options applied to the canu executive script
gridOptionsJobName Grid jobs job-name suffix
gridOptionsbat Grid engine options applied to unitig construction jobs
gridOptionscns Grid engine options applied to unitig consensus jobs
gridOptionscor Grid engine options applied to read correction jobs
gridOptionscormhap Grid engine options applied to mhap overlaps for correction jobs
gridOptionscorovl Grid engine options applied to overlaps for correction jobs
gridOptionsmeryl Grid engine options applied to mer counting jobs
gridOptionsobtmhap Grid engine options applied to mhap overlaps for trimming jobs
gridOptionsobtovl Grid engine options applied to overlaps for trimming jobs
gridOptionsoea Grid engine options applied to overlap error adjustment jobs
gridOptionsovb Grid engine options applied to overlap store bucketizing jobs
gridOptionsovs Grid engine options applied to overlap store sorting jobs
gridOptionsred Grid engine options applied to read error detection jobs
gridOptionsutgmhap Grid engine options applied to mhap overlaps for unitig construction jobs
gridOptionsutgovl Grid engine options applied to overlaps for unitig construction jobs
You can add the -tc parameter to all the options except gridOptionsExecutive and gridOptions, as all the others control array jobs. Those options are also additive, in that your -tc option will be included along with whatever canu picks for the other parameters.
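For example, a sketch of capping the correction-mhap array at 10 concurrent tasks while keeping the grid settings used earlier in this thread (the cap of 10 is an arbitrary illustration):

```
# Illustrative only: limit the correction-mhap array to 10 concurrent tasks.
# The other options mirror the command line used earlier in this thread.
canu -d 250k_assembly -p N22_test250k genomeSize=380m \
  gridEngineMemoryOption="-l h_vmem=MEMORY" \
  gridEngineThreadsOption="-pe smp THREADS" \
  gridOptions="-S /bin/bash" \
  gridOptionscormhap="-tc 10" \
  -pacbio-raw N22_42cells_250k_subreads.fa
```

The -tc flag is passed through to qsub for that stage's array job, so the scheduler never runs more than 10 of its tasks at once.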
Sergey,
The process seems to have run, and it stopped on a java problem. The canu.02.out file says:

ERROR: mhap overlapper requires java version at least 1.8.0; you have 1.7.0_51

but if I check the version on the node where I ran it from, I see:

[smrtanalysis@n002 canu_test]$ java -version
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

Maybe we haven't propagated the new java to all the nodes yet? Ann, I doubt it.
If it can help, the canu.01.out says:
bash: BASH_FUNC_module(): line 0: syntax error near unexpected token `)'
bash: BASH_FUNC_module(): line 0: `BASH_FUNC_module() () {  eval /usr/bin/modulecmd bash $*'
bash: error importing function definition for `BASH_FUNC_module'
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
and at the bottom:
-- Finished on Wed Feb 3 22:17:38 2016 (15974 seconds) with 338.4 GB
runCommandSilently() gnuplot < /home/smrtanalysis/dario_test/canu_test/250k_assembly/correction/N22_test250k.gkpStore/readlengths.gp > /dev/null 2>&1
ERROR: Failed with signal HUP (1)
runCommandSilently() gnuplot < /home/smrtanalysis/dario_test/canu_test/250k_assembly/correction/N22_test250k.gkpStore/readlengths.gp > /dev/null 2>&1
ERROR: Failed with signal HUP (1)
Thanks,
Dario
I just checked the nodes; 2 of them still have the older version, sorry for that.
In a file I saw that the scripts use a fastq as input: I am using a fasta, do you think this will cause a problem? That would be such a stupid error on my part; I am probably confusing it with Falcon.
Dario
fasta is no problem, we use it almost exclusively.
After setting the latest java on all nodes, I ran it again and it went on for a while. After it ended, I found these lines in some output files that could be diagnostic:
canu.03.out:
bash: BASH_FUNC_module(): line 0: syntax error near unexpected token `)'
bash: BASH_FUNC_module(): line 0: `BASH_FUNC_module() () { eval `/usr/bin/modulecmd bash $*`'
bash: error importing function definition for `BASH_FUNC_module'
and canu.05.out: canu failed with 'failed to precompute mhap indices. Made 2 attempts, jobs still failed'.
The command line was this:
/opt/canu/Linux-amd64/bin/canu -d /../250k_assembly -p N22_test250k genomeSize=380m corMinCoverage=2 gridEngineMemoryOption="-l h_vmem=MEMORY" gridEngineThreadsOption="-pe smp THREADS" gridOptions="-V -S /bin/bash" errorRate=0.18 -pacbio-raw input_subreads.fq
Thanks,
Dario
On 02/04/2016 07:31 AM, Sergey Koren wrote:
The E state jobs might just be your previous failed runs. I would erase them from the queue.
Canu parses the qhosts output to detect the machines in your cluster and picks job sizes that would let it run across the most machines, given the resources available and the genome size you're assembling. Several steps are submitted as large array jobs, in which case canu could potentially consume a large part of your cluster, depending on how your scheduler works. The SGE scheduler might restrict the number of cores that a user/parallel environment can request. Generally, I prefer letting the cluster scheduler manage the jobs rather than trying to manage the scheduling yourself. However, you can customize each step if you want, for example by using the -tc parameter to restrict the number of array tasks that run in parallel at a time. For example, -tc 10 on a 100-task array would ensure only 10 tasks can be scheduled at a time, limiting the cores used by your job. You can see the list of grid options using:
canu -options | grep gridOptions
  gridOptions              Grid engine options applied to all jobs
  gridOptionsExecutive     Grid engine options applied to the canu executive script
  gridOptionsJobName       Grid jobs job-name suffix
  gridOptionsbat           Grid engine options applied to unitig construction jobs
  gridOptionscns           Grid engine options applied to unitig consensus jobs
  gridOptionscor           Grid engine options applied to read correction jobs
  gridOptionscormhap       Grid engine options applied to mhap overlaps for correction jobs
  gridOptionscorovl        Grid engine options applied to overlaps for correction jobs
  gridOptionsmeryl         Grid engine options applied to mer counting jobs
  gridOptionsobtmhap       Grid engine options applied to mhap overlaps for trimming jobs
  gridOptionsobtovl        Grid engine options applied to overlaps for trimming jobs
  gridOptionsoea           Grid engine options applied to overlap error adjustment jobs
  gridOptionsovb           Grid engine options applied to overlap store bucketizing jobs
  gridOptionsovs           Grid engine options applied to overlap store sorting jobs
  gridOptionsred           Grid engine options applied to read error detection jobs
  gridOptionsutgmhap       Grid engine options applied to mhap overlaps for unitig construction jobs
  gridOptionsutgovl        Grid engine options applied to overlaps for unitig construction jobs
You can add the -tc parameter to all of the options except gridOptionsExecutive and gridOptions, since everything else is submitted as an array job. The options are also additive, so your -tc option will be included along with whatever canu picks for the other parameters.
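As a concrete illustration (the assembly name and the task limit of 10 are made up; -tc is the standard SGE array-throttle option), limiting the correction mhap array to 10 concurrent tasks might look like:

```shell
# Illustration only: cap the correction-mhap array at 10 simultaneous tasks.
# canu appends the gridOptionscormhap value to the qsub command it builds.
canu -d 250k_assembly -p N22_test250k genomeSize=380m \
  gridOptionscormhap="-tc 10" \
  -pacbio-raw input_subreads.fq
```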
A little bit of googling on 'BASH_FUNC_module' hints this is outside canu. See for example: https://groups.google.com/forum/#!topic/genome-au-cluster-help/J1fKmk8XB1Q
Are there interesting messages in the precompute logs ($asm/correction/1-overlapper/)?
At some point, remove the whole assembly directory and start over. There is probably lots of crud in there from all the restarts, and it'll be easier to figure out what's breaking without the junk.
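A quick way to do that check (just a sketch; $asm stands for the directory given to canu with -d, and the glob should be adjusted to whatever the log files are actually named):

```shell
# Sketch: scan the precompute logs for common failure keywords.
# $asm is assumed to be the assembly directory passed to canu with -d.
grep -iE 'error|fail|cannot|exceed' "$asm"/correction/1-overlapper/*.out
```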
I always remove the old folder when starting a new job. We are under bash:
$ which bash
/bin/bash
$ bash -version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
In the 1-overlapper I found these:
$ less 3.out
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_module'
Dumping reads from 39001 to 58500 (inclusive).
Starting mhap precompute.
Error occurred during initialization of VM
Could not reserve enough space for 13631488KB object heap
mv: cannot stat `/home/smrtanalysis/dario_test/canu_test/250k_assembly/correction/1-overlapper/blocks/000003.dat': No such file or directory
Mhap failed.
Dumping reads from 39001 to 58500 (inclusive).
We are working on the bash issue. Thanks,
Dario
The error indicates your JVM failed to initialize while allocating 13GB of RAM (13631488KB). This most likely means either that more than one job is being scheduled on each of your machines (which shouldn't happen, since the qsub command is asking for 12 cores and 13GB of RAM, so it would mean the smp parameter is being ignored), or that other processes on your machines are taking up the available memory. This could be an issue with h_vmem, which is usually not consumable, meaning the scheduler bases its decisions on current memory usage, not peak usage. I mentioned this issue in a comment above: if a job requesting 30GB with h_vmem starts, and just after that canu submits a job requesting 13GB, the canu job would get scheduled on the same machine, since the 30GB is not reserved and the first process hasn't had time to reach its full allocation. Then, as both run, the JVM tries to lock 13GB which is no longer free because the other process is up to 30GB.
You'll have to diagnose the state of the machine just before the JVM error to see what is taking memory and why the JVM can't run. This is an issue outside of Canu's control.
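To make that diagnosis easier, one option is to log a memory snapshot on the node just before the Java step runs. This wrapper is only a sketch and not part of canu; the log path and the use of SGE's JOB_ID variable are assumptions:

```shell
# Sketch (not part of canu): record what is using memory on the node right
# before the Java step, so a JVM failure can be matched against other jobs.
# $JOB_ID is set by SGE inside a job; "local" is the fallback outside one.
LOG=/tmp/mem-snapshot.${JOB_ID:-local}.log
{
  date
  hostname
  free -m                                   # overall memory on the node
  ps -eo pid,rss,comm --sort=-rss | head    # largest memory consumers
} >> "$LOG" 2>&1
```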
Thanks for the explanation, we will work on that now.
Dario
Have you been able to resolve this issue? I'm closing for inactivity but if you need to, feel free to re-open.
Hello, not sure this is the right place, but it definitely fits this issue's topic. I'm trying to run Canu (v1.1) on SGE, which I do not know well, but it fails ('can't configure for SGE') with:
-- WARNING: Couldn't determine the SGE resource to request memory.
-- WARNING: No valid choices found! Find an appropriate complex name (qconf -sc) and set:
-- WARNING: gridEngineMemoryOption="-l <name>=MEMORY"
However, none of the options returned by
qconf -sc | grep MEMORY
are consumable... Does this mean I need to contact the admin and ask them to change the cluster config? Or is there a workaround within Canu?
Thanks
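For what it's worth, once a consumable memory complex does exist, its name is what goes into the option from the warning above. mem_free is used here purely as an illustration (check your own qconf -sc output for what is actually consumable on your cluster):

```shell
# Illustration only: request memory via the "mem_free" complex, assuming
# "qconf -sc" shows it with consumable=YES on this cluster.
canu -d assembly -p test genomeSize=380m \
  gridEngineMemoryOption="-l mem_free=MEMORY" \
  -pacbio-raw input_subreads.fa
```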
Hello, we are having issues running Canu on our cluster; the error says: canu failed with 'can't configure for SGE'.
The command is:
/opt/canu/Linux-amd64/bin/canu -d /home/../250k_assembly -p test250k genomeSize=380m corMinCoverage=2 errorRate=0.18 -pacbio-raw input_subreads.fa
We have a PSSC cluster with 4 nodes, each with 12 cores and 32 GB of RAM, for a total of 48 cores and 128 GB of RAM. The grid engine is GE 6.2u5p3, running on CentOS release 6.5 (Final).
Following the documentation, we tested all the 6x2 options at Grid Engine Configuration (here is one example):
754   $global{"gridEngineThreadsOption"} = "-pe make THREADS";
755   $global{"gridEngineMemoryOption"}  = "-l h_vmem=MEMORY";
but always got the same error.
How do we set up the assembler to run on our cluster? Will we then be able to modulate the resources (cores, CPUs, memory) so that we can have other processes running at the same time? Thanks