Closed ml3958 closed 7 years ago
This is the expected behavior. When Canu runs on the grid it submits itself to your grid and doesn't run any processes on the head node. If you have jobs in the queue (`qstat`), then Canu is running.
I would also suggest leaving off h_vmem: mem_free is probably requested on a per-core basis (Canu will set MEMORY to be total / # threads), but h_vmem is per job, not per core, so your jobs may not run properly with it.
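To see why the two options interact badly, here is a sketch of the arithmetic, using the cormhap figures from the log below (6 GB, 8 CPUs); the per-job semantics of h_vmem are as described above and may differ on other SGE configurations:

```shell
# Canu substitutes MEMORY with (task memory / task threads), because
# mem_free is a per-slot request. Figures match the cormhap task below.
total_mb=6144   # 6 GB: memory Canu configured for the cormhap task
threads=8       # slots requested via "-pe threaded 8"

per_slot_mb=$((total_mb / threads))
echo "${per_slot_mb} MB per slot"   # prints "768 MB per slot"

# mem_free is multiplied across the 8 slots -> 6 GB reserved, as intended.
# h_vmem, applied per job rather than per slot, would cap the whole job
# at 768 MB, and the 6 GB task would be killed.
```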
Thanks for the fast response! I did check my job queue (`qstat`); there was one job there briefly, but it finished within seconds without any error.
I tried again, leaving off h_vmem as you suggested. I still got no output except the two folders, and the submitted job finished very quickly.
Please see the message below:
[lium14@phoenix2 nanopore]$ /ifs/home/lium14/tools/canu-1.6/*/bin/canu \
> -p oxk_loose \
> -d /ifs/data/blaserlab/menghan/OxfGenomes/OXK/nanopore/loose_assembly_canu \
> genomeSize=2.49m \
> -nanopore-raw /ifs/data/sequence/results/blaserlab/2017-06-12-nanopore/reads/reads.2D.fastq.gz \
> corMhapSensitivity=high corMinCoverage=0 \
> gridEngineMemoryOption="-l mem_free=MEMORY"
-- Canu 1.6
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction, consensus and GFA building use:
-- Šošic M, Šikic M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Li H.
-- Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
-- Bioinformatics. 2016 Jul 15;32(14):2103-10.
-- http://doi.org/10.1093/bioinformatics/btw152
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_141' (from '/usr/lib/jvm/java-1.8.0/bin/java').
-- Detected gnuplot version '4.2 patchlevel 6 ' (from 'gnuplot') and image format 'png'.
-- Detected 32 CPUs and 126 gigabytes of memory.
-- Detected Sun Grid Engine in '/cm/shared/apps/sge/2011.11p1/default'.
-- Detected Grid Engine environment 'threaded'.
-- User supplied Grid Engine consumable '-l mem_free=MEMORY'.
--
-- WARNING:
-- WARNING: Queue 'gpu1.q' has start mode set to 'posix_behavior' and shell set to '/bin/csh'.
-- WARNING:
-- WARNING: Some queues in your configuration will fail to start jobs correctly.
-- WARNING: Jobs will be submitted with option:
-- WARNING: gridOptions=-S /bin/sh
-- WARNING:
-- WARNING: If jobs fail to start, modify the above option to use a valid shell
-- WARNING: and supply it directly to canu.
-- WARNING:
--
-- Found 1 host with 64 cores and 1009 GB memory under Sun Grid Engine control.
-- Found 5 hosts with 32 cores and 125 GB memory under Sun Grid Engine control.
-- Found 1 host with 8 cores and 62 GB memory under Sun Grid Engine control.
-- Found 2 hosts with 48 cores and 755 GB memory under Sun Grid Engine control.
-- Found 63 hosts with 32 cores and 252 GB memory under Sun Grid Engine control.
--
--                        (tag)Threads
--               (tag)Memory        |
--       (tag)         |            |  algorithm
--     -------  ------  --------  -----------------------------
-- Grid: meryl 8 GB 4 CPUs (k-mer counting)
-- Grid: cormhap 6 GB 8 CPUs (overlap detection with mhap)
-- Grid: obtovl 8 GB 8 CPUs (overlap detection)
-- Grid: utgovl 8 GB 8 CPUs (overlap detection)
-- Grid: cor 7 GB 2 CPUs (read correction)
-- Grid: ovb 3 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 2 GB 4 CPUs (read error detection)
-- Grid: oea 1 GB 1 CPU (overlap error adjustment)
-- Grid: bat 15 GB 4 CPUs (contig construction)
-- Grid: cns 15 GB 4 CPUs (consensus)
-- Grid: gfa 8 GB 4 CPUs (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in the input files.
--
-- Generating assembly 'oxk_loose' in '/ifs/data/blaserlab/menghan/OxfGenomes/OXK/nanopore/loose_assembly_canu'
--
-- Parameters:
--
-- genomeSize 2490000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1440 ( 14.40%)
-- utgOvlErrorRate 0.1440 ( 14.40%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.5000 ( 50.00%)
-- obtErrorRate 0.1440 ( 14.40%)
-- utgErrorRate 0.1440 ( 14.40%)
-- cnsErrorRate 0.1920 ( 19.20%)
----------------------------------------
-- Starting command on Fri Sep 15 11:35:18 2017 with 205218.223 GB free disk space
cd /ifs/data/blaserlab/menghan/OxfGenomes/OXK/nanopore/loose_assembly_canu
qsub \
-l mem_free=8g \
-pe threaded 1 \
-S /bin/sh \
-cwd \
-N 'canu_oxk_loose' \
-j y \
-o canu-scripts/canu.01.out canu-scripts/canu.01.sh
Your job 3550407 ("canu_oxk_loose") has been submitted
-- Finished on Fri Sep 15 11:35:18 2017 (lickety-split) with 205218.242 GB free disk space
----------------------------------------
After running the command, I checked the job queue and the job finished within seconds.
[lium14@phoenix2 nanopore]$
[lium14@phoenix2 nanopore]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
3550407 0.00000 canu_oxk_l lium14 qw 09/15/2017 11:35:18 1
[lium14@phoenix2 nanopore]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
3550407 0.00000 canu_oxk_l lium14 qw 09/15/2017 11:35:18 1
[lium14@phoenix2 nanopore]$ ls
canu_loose.sh canu_run.sh loose_assembly_canu nanopore_canu_old nanopore_loose_canu_old
[lium14@phoenix2 nanopore]$ qstat
[lium14@phoenix2 nanopore]$ qstat
[lium14@phoenix2 nanopore]$ qstat
[lium14@phoenix2 nanopore]$ cd loose_assembly_canu/
[lium14@phoenix2 loose_assembly_canu]$ ls
canu-logs canu-scripts
The job status qw indicates it is waiting to be scheduled by the system, so it hasn't run yet. You would have a canu.out file in your run folder if it had been scheduled and started running. Is there a canu.out from your previous job? If so, can you post its contents?
Also, avoid running the same command (e.g. in the same directory) while another job is still in the queue/running, as the two will likely collide and cause an error.
Thank you so much Sergey! I think I did run two jobs with the same command, which might have messed up the process.
This time I killed all jobs and started a new one. I put the following command in a `run.sh` file and then ran `qsub run.sh`:
/ifs/home/lium14/tools/canu-1.6/*/bin/canu \
-p oxk \
-d /ifs/data/blaserlab/menghan/OxfGenomes/OXK/nanopore/assembly_canu_SGE \
genomeSize=2.49m \
-nanopore-raw /ifs/data/sequence/results/blaserlab/2017-06-12-nanopore/reads/reads.2D.fastq.gz \
gridEngineMemoryOption="-l mem_free=MEMORY"
I got a bit further: Canu actually tried to do correction, but it terminated again before correction finished. I didn't have a canu.out file in my folder this time (I did have canu.out previously when I ran with `useGrid=false`).
This is what I have in the directory:
[lium14@phoenix2 assembly_canu_SGE]$ ls
canu-logs canu-scripts correction correction.html correction.html.files oxk.report
And in oxk.report I only have this:
[CORRECTION/READS]
--
-- In gatekeeper store 'correction/oxk.gkpStore':
-- Found 31151 reads.
-- Found 110683679 bases (44.45 times coverage).
--
-- Read length histogram (one '*' equals 177.48 reads):
-- 0 999 0
-- 1000 1999 6661 *************************************
-- 2000 2999 5936 *********************************
-- 3000 3999 12424 **********************************************************************
-- 4000 4999 2030 ***********
-- 5000 5999 1259 *******
-- 6000 6999 857 ****
-- 7000 7999 573 ***
-- 8000 8999 371 **
-- 9000 9999 238 *
-- 10000 10999 177
-- 11000 11999 136
-- 12000 12999 84
-- 13000 13999 68
-- 14000 14999 59
-- 15000 15999 43
-- 16000 16999 53
-- 17000 17999 26
-- 18000 18999 28
-- 19000 19999 25
-- 20000 20999 18
-- 21000 21999 21
-- 22000 22999 15
-- 23000 23999 14
-- 24000 24999 9
-- 25000 25999 4
-- 26000 26999 4
-- 27000 27999 4
-- 28000 28999 3
-- 29000 29999 3
-- 30000 30999 2
-- 31000 31999 2
-- 32000 32999 0
-- 33000 33999 0
-- 34000 34999 0
-- 35000 35999 1
-- 36000 36999 0
-- 37000 37999 0
-- 38000 38999 1
-- 39000 39999 0
-- 40000 40999 0
-- 41000 41999 0
-- 42000 42999 1
-- 43000 43999 1
Can you confirm your compute nodes are allowed to submit jobs? What's the output and contents of the canu-scripts folder (are there any `*.out` files there)? What does qstat report?
Yes, I think the compute nodes are allowed to submit jobs; I've qsub'd other PBS scripts and they worked fine.
There's a canu.01.out file in the canu-scripts folder:
/cm/local/apps/sge/var/spool/node053/job_scripts/3550522: line 9:
/cm/shared/apps/sge/2011.11p1//common/settings.sh: No such file or directory
Ah, then this is the same as issue #505, which is fixed in the tip. Unfortunately, there are a lot of other unrelated changes in the tip. You could edit canu-1.6/Linux-amd64/bin/lib/canu/Execution.pm to remove lines 649-651:
649 print F "if [ \"x\$SGE_ROOT\" != \"x\" ]; then \n" if (getGlobal("gridEngine") eq "SGE");
650 print F " . \$SGE_ROOT/\$SGE_CELL/common/settings.sh\n" if (getGlobal("gridEngine") eq "SGE");
651 print F "fi\n" if (getGlobal("gridEngine") eq "SGE");
and see if that fixes your error.
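If you'd rather script the edit than do it by hand, the same range deletion can be done with sed. Below is a demonstration on a scratch file; the `649,651` range comes from the comment above and should be verified against your own copy of Execution.pm before applying it (and keep the `.bak` backup sed makes):

```shell
# Demonstration: make a 10-line scratch file, delete lines 4-6 with the
# same sed pattern you would use on Execution.pm ('649,651d' there),
# and show the surviving lines.
tmp=$(mktemp)
seq 1 10 > "$tmp"
sed -i.bak '4,6d' "$tmp"    # deletes the range in place, keeps a .bak copy
cat "$tmp"                  # prints 1 2 3 7 8 9 10, one per line
rm -f "$tmp" "$tmp.bak"
```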
Thanks so much! It worked! Canu 1.6 proceeded!
But now I have a new error regarding the Java version. The compute node I'm using has Java 1.6 by default; I always manually load Java 1.8 with `module load java/1.8`.
However, when using SGE, Canu submits multiple scripts automatically. How can I modify the Canu script so it knows to load Java 1.8 before calling SGE?
Thanks!
It shouldn't need to load the module; just point it to the proper java binary (`java=/full/path/to/java/binary`), or add -V to your grid options (which preserves your current environment for the submitted jobs).
Hi, I am using Canu 1.6 on a remote cluster with SGE resources. I have no problem running Canu without SGE. The problem arises when I try to use the SGE resources with gridEngineMemoryOption="-l h_vmem=MEMORY -l mem_free=MEMORY". But the submitted job finished within seconds, leaving only two folders in the output directory. It didn't really perform any correction, trimming, or assembly.
canu-scripts: canu.01.out canu.01.sh
-- Canu 1.6
-- CITATIONS
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction, consensus and GFA building use:
-- Šošic M, Šikic M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Li H.
-- Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
-- Bioinformatics. 2016 Jul 15;32(14):2103-10.
-- http://doi.org/10.1093/bioinformatics/btw152
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_141' (from '/usr/lib/jvm/java-1.8.0/bin/java').
-- Detected gnuplot version '4.2 patchlevel 6 ' (from 'gnuplot') and image format 'png'.
-- Detected 32 CPUs and 126 gigabytes of memory.
-- Detected Sun Grid Engine in '/cm/shared/apps/sge/2011.11p1/default'.
-- Detected Grid Engine environment 'threaded'.
-- User supplied Grid Engine consumable '-l h_vmem=MEMORY -l mem_free=MEMORY'.
--
-- WARNING:
-- WARNING: Queue 'gpu1.q' has start mode set to 'posix_behavior' and shell set to '/bin/csh'.
-- WARNING:
-- WARNING: Some queues in your configuration will fail to start jobs correctly.
-- WARNING: Jobs will be submitted with option:
-- WARNING:   gridOptions=-S /bin/sh
-- WARNING:
-- WARNING: If jobs fail to start, modify the above option to use a valid shell
-- WARNING: and supply it directly to canu.
-- WARNING:
--
-- Found 1 host with 64 cores and 1009 GB memory under Sun Grid Engine control.
-- Found 5 hosts with 32 cores and 125 GB memory under Sun Grid Engine control.
-- Found 1 host with 8 cores and 62 GB memory under Sun Grid Engine control.
-- Found 2 hosts with 48 cores and 755 GB memory under Sun Grid Engine control.
-- Found 63 hosts with 32 cores and 252 GB memory under Sun Grid Engine control.
--
-- Grid: meryl 8 GB 4 CPUs (k-mer counting)
-- Grid: cormhap 6 GB 8 CPUs (overlap detection with mhap)
-- Grid: obtovl 8 GB 8 CPUs (overlap detection)
-- Grid: utgovl 8 GB 8 CPUs (overlap detection)
-- Grid: cor 7 GB 2 CPUs (read correction)
-- Grid: ovb 3 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 2 GB 4 CPUs (read error detection)
-- Grid: oea 1 GB 1 CPU (overlap error adjustment)
-- Grid: bat 15 GB 4 CPUs (contig construction)
-- Grid: cns 15 GB 4 CPUs (consensus)
-- Grid: gfa 8 GB 4 CPUs (GFA alignment and processing)
-- Found Nanopore uncorrected reads in the input files.
-- Generating assembly 'oxk_loose' in '/ifs/data/blaserlab/menghan/OxfGenomes/OXK/nanopore_loose_canu'
-- Parameters:
-- genomeSize 2490000
-- Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1440 ( 14.40%)
-- utgOvlErrorRate 0.1440 ( 14.40%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.5000 ( 50.00%)
-- obtErrorRate 0.1440 ( 14.40%)
-- utgErrorRate 0.1440 ( 14.40%)
-- cnsErrorRate 0.1920 ( 19.20%)
-- Starting command on Fri Sep 15 11:22:56 2017 with 205245.893 GB free disk space
Your job 3550345 ("canu_oxk_loose") has been submitted
-- Finished on Fri Sep 15 11:22:56 2017 (lickety-split) with 205245.893 GB free disk space