As I mentioned in #2111, most likely when MHAP jobs fail it's a JVM issue. Post the log from one of the failed jobs (correction/1-overlapper/mhap.*.out)
Here is the log:
Found perl:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/perl/5.30.2/bin/perl
This is perl 5, version 30, subversion 2 (v5.30.2) built for x86_64-linux-thread-multi
Found java:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/java/14.0.2/bin/java
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Found canu:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/canu/2.2/bin/canu
canu 2.2
Running job 53 based on SLURM_ARRAY_TASK_ID=53 and offset=0.
Fetch blocks/000027.dat
Fetch blocks/000028.dat
Fetch blocks/000029.dat
Fetch blocks/000030.dat
Fetch blocks/000031.dat
Fetch blocks/000032.dat
Fetch blocks/000033.dat
Fetch blocks/000034.dat
Fetch blocks/000035.dat
Fetch blocks/000036.dat
Fetch blocks/000037.dat
Running block 000015 in query 000053
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Running with these settings:
--filter-threshold = 1.0E-7
--help = false
--max-shift = 0.2
--min-olap-length = 500
--min-store-length = 0
--no-rc = false
--no-self = true
--no-tf = false
--num-hashes = 512
--num-min-matches = 3
--num-threads = 16
--ordered-kmer-size = 14
--ordered-sketch-size = 1000
--repeat-idf-scale = 10.0
--repeat-weight = 0.9
--settings = 0
--store-full-id = true
--supress-noise = 0
--threshold = 0.78
--version = false
-f =
-h = false
-k = 16
-p =
-q = queries/000053
-s = ./blocks/000015.dat
Processing files for storage in reverse index...
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences loaded and processed from file: 35000...
Current # sequences loaded and processed from file: 40000...
Current # sequences loaded and processed from file: 45000...
Current # sequences loaded and processed from file: 50000...
Current # sequences loaded and processed from file: 55000...
Current # sequences loaded and processed from file: 60000...
Current # sequences loaded and processed from file: 65000...
Current # sequences loaded and processed from file: 70000...
Current # sequences stored: 5000...
Current # sequences stored: 10000...
Current # sequences stored: 15000...
Current # sequences stored: 20000...
Current # sequences stored: 25000...
Current # sequences stored: 30000...
Current # sequences stored: 35000...
Current # sequences stored: 40000...
Current # sequences stored: 45000...
Current # sequences stored: 50000...
Current # sequences stored: 55000...
Current # sequences stored: 60000...
Current # sequences stored: 65000...
Current # sequences stored: 70000...
Stored 70200 sequences in the index.
Processed 70200 unique sequences (fwd and rev).
Time (s) to read and hash from file: 13.204459519
Opened fasta file /lustre06/project/6058390/kitani/T_cruzi_data/file_all/canu_assembly/correction/1-overlapper/blocks/000027.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
slurmstepd: error: *** JOB 4495565 ON nc11023 CANCELLED AT 2022-04-10T10:37:36 DUE TO TIME LIMIT ***
At the bottom it says "due to time limit", which is odd since I had an hour to spare on the salloc command on Compute Canada. Could it be that I initially ran it with less time, and when that ran out I re-ran it with more? I assumed the jobs would pick up from where they left off, since that is what I read about Canu.
The time you specify for the initial job doesn't get inherited by any of the jobs it submits. As the FAQ says, you need to explicitly specify time/partitions using gridOptions, something like gridOptions="--time 36:00:00".
I saw your comments from #1331
I ended up running the following:
canu -p myassembly -d canu_assembly genomeSize=55m -nanopore *.fastq useGrid=true gridOptions="--time 8:00:00"
Did the mhap step run with that option?
Mhap failed its first attempt, and is trying again:
Canu.out:
--
-- OVERLAPPER (mhap) (correction) complete, not rewriting scripts.
--
--
-- Mhap overlap jobs failed, retry.
-- job correction/1-overlapper/results/000001.ovb FAILED.
-- job correction/1-overlapper/results/000002.ovb FAILED.
-- job correction/1-overlapper/results/000003.ovb FAILED.
-- job correction/1-overlapper/results/000004.ovb FAILED.
-- job correction/1-overlapper/results/000005.ovb FAILED.
-- job correction/1-overlapper/results/000006.ovb FAILED.
-- job correction/1-overlapper/results/000007.ovb FAILED.
-- job correction/1-overlapper/results/000008.ovb FAILED.
-- job correction/1-overlapper/results/000009.ovb FAILED.
-- job correction/1-overlapper/results/000010.ovb FAILED.
-- job correction/1-overlapper/results/000011.ovb FAILED.
-- job correction/1-overlapper/results/000012.ovb FAILED.
-- job correction/1-overlapper/results/000013.ovb FAILED.
-- job correction/1-overlapper/results/000014.ovb FAILED.
-- job correction/1-overlapper/results/000015.ovb FAILED.
-- job correction/1-overlapper/results/000016.ovb FAILED.
-- job correction/1-overlapper/results/000017.ovb FAILED.
-- job correction/1-overlapper/results/000018.ovb FAILED.
-- job correction/1-overlapper/results/000019.ovb FAILED.
-- job correction/1-overlapper/results/000020.ovb FAILED.
-- job correction/1-overlapper/results/000021.ovb FAILED.
-- job correction/1-overlapper/results/000022.ovb FAILED.
-- job correction/1-overlapper/results/000023.ovb FAILED.
-- job correction/1-overlapper/results/000024.ovb FAILED.
-- job correction/1-overlapper/results/000025.ovb FAILED.
-- job correction/1-overlapper/results/000026.ovb FAILED.
-- job correction/1-overlapper/results/000027.ovb FAILED.
-- job correction/1-overlapper/results/000028.ovb FAILED.
-- job correction/1-overlapper/results/000029.ovb FAILED.
-- job correction/1-overlapper/results/000030.ovb FAILED.
-- job correction/1-overlapper/results/000031.ovb FAILED.
-- job correction/1-overlapper/results/000033.ovb FAILED.
-- job correction/1-overlapper/results/000034.ovb FAILED.
-- job correction/1-overlapper/results/000035.ovb FAILED.
-- job correction/1-overlapper/results/000037.ovb FAILED.
-- job correction/1-overlapper/results/000038.ovb FAILED.
-- job correction/1-overlapper/results/000039.ovb FAILED.
-- job correction/1-overlapper/results/000040.ovb FAILED.
-- job correction/1-overlapper/results/000041.ovb FAILED.
-- job correction/1-overlapper/results/000042.ovb FAILED.
-- job correction/1-overlapper/results/000043.ovb FAILED.
-- job correction/1-overlapper/results/000044.ovb FAILED.
-- job correction/1-overlapper/results/000045.ovb FAILED.
-- job correction/1-overlapper/results/000046.ovb FAILED.
-- job correction/1-overlapper/results/000047.ovb FAILED.
-- job correction/1-overlapper/results/000048.ovb FAILED.
-- job correction/1-overlapper/results/000049.ovb FAILED.
-- job correction/1-overlapper/results/000050.ovb FAILED.
-- job correction/1-overlapper/results/000051.ovb FAILED.
-- job correction/1-overlapper/results/000052.ovb FAILED.
-- job correction/1-overlapper/results/000053.ovb FAILED.
-- job correction/1-overlapper/results/000054.ovb FAILED.
-- job correction/1-overlapper/results/000055.ovb FAILED.
-- job correction/1-overlapper/results/000056.ovb FAILED.
-- job correction/1-overlapper/results/000057.ovb FAILED.
-- job correction/1-overlapper/results/000058.ovb FAILED.
-- job correction/1-overlapper/results/000059.ovb FAILED.
-- job correction/1-overlapper/results/000060.ovb FAILED.
-- job correction/1-overlapper/results/000061.ovb FAILED.
-- job correction/1-overlapper/results/000062.ovb FAILED.
-- job correction/1-overlapper/results/000063.ovb FAILED.
-- job correction/1-overlapper/results/000064.ovb FAILED.
-- job correction/1-overlapper/results/000065.ovb FAILED.
-- job correction/1-overlapper/results/000067.ovb FAILED.
-- job correction/1-overlapper/results/000068.ovb FAILED.
-- job correction/1-overlapper/results/000070.ovb FAILED.
-- job correction/1-overlapper/results/000071.ovb FAILED.
-- job correction/1-overlapper/results/000072.ovb FAILED.
-- job correction/1-overlapper/results/000073.ovb FAILED.
-- job correction/1-overlapper/results/000074.ovb FAILED.
-- job correction/1-overlapper/results/000075.ovb FAILED.
-- job correction/1-overlapper/results/000076.ovb FAILED.
-- job correction/1-overlapper/results/000077.ovb FAILED.
-- job correction/1-overlapper/results/000078.ovb FAILED.
-- job correction/1-overlapper/results/000079.ovb FAILED.
-- job correction/1-overlapper/results/000080.ovb FAILED.
-- job correction/1-overlapper/results/000081.ovb FAILED.
-- job correction/1-overlapper/results/000082.ovb FAILED.
-- job correction/1-overlapper/results/000083.ovb FAILED.
-- job correction/1-overlapper/results/000084.ovb FAILED.
-- job correction/1-overlapper/results/000085.ovb FAILED.
-- job correction/1-overlapper/results/000086.ovb FAILED.
-- job correction/1-overlapper/results/000087.ovb FAILED.
-- job correction/1-overlapper/results/000088.ovb FAILED.
-- job correction/1-overlapper/results/000089.ovb FAILED.
-- job correction/1-overlapper/results/000090.ovb FAILED.
-- job correction/1-overlapper/results/000092.ovb FAILED.
-- job correction/1-overlapper/results/000093.ovb FAILED.
-- job correction/1-overlapper/results/000094.ovb FAILED.
-- job correction/1-overlapper/results/000095.ovb FAILED.
-- job correction/1-overlapper/results/000096.ovb FAILED.
-- job correction/1-overlapper/results/000097.ovb FAILED.
-- job correction/1-overlapper/results/000098.ovb FAILED.
-- job correction/1-overlapper/results/000099.ovb FAILED.
-- job correction/1-overlapper/results/000100.ovb FAILED.
-- job correction/1-overlapper/results/000101.ovb FAILED.
-- job correction/1-overlapper/results/000102.ovb FAILED.
--
--
-- Running jobs. Second attempt out of 2.
--
-- 'mhap.jobSubmit-01.sh' -> job 4582946 tasks 1-31.
-- 'mhap.jobSubmit-02.sh' -> job 4582947 tasks 33-35.
-- 'mhap.jobSubmit-03.sh' -> job 4582948 tasks 37-65.
-- 'mhap.jobSubmit-04.sh' -> job 4582949 tasks 67-68.
-- 'mhap.jobSubmit-05.sh' -> job 4582950 tasks 70-90.
-- 'mhap.jobSubmit-06.sh' -> job 4582951 tasks 92-102.
--
----------------------------------------
-- Starting command on Tue Apr 12 12:33:50 2022 with 0 GB free disk space
cd /lustre06/project/6058390/kitani/T_cruzi_data/file_all/canu_assembly
sbatch \
--depend=afterany:4582946:4582947:4582948:4582949:4582950:4582951 \
--cpus-per-task=1 \
--mem-per-cpu=5g \
--time 8:00:00 \
-D `pwd` \
-J 'canu_myassembly' \
-o canu-scripts/canu.06.out canu-scripts/canu.06.sh
Submitted batch job 4582952
-- Finished on Tue Apr 12 12:33:52 2022 (2 seconds) with 0 GB free disk space !!! WARNING !!!
I noticed it says 0 GB free disk space. Could that be it?
Yes, if you are out of space, the jobs will definitely fail. You should see something like disk write errors in the logs of the failed jobs. It seems most of the jobs failed, which means you need quite a bit more space than you have now.
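If you want to confirm before resubmitting, you can check your remaining quota; a minimal sketch, assuming Compute Canada's diskusage_report helper is available (lfs quota works on any Lustre filesystem such as /lustre06):
diskusage_report                 # per-filesystem usage/quota summary on Compute Canada
lfs quota -u $USER /lustre06     # raw Lustre quota for your user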
I also noticed in your previous log:
Found java:
/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/java/14.0.2/bin/java
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Your JVM is configured to always use 2 GB (via JAVA_TOOL_OPTIONS). This is incorrect and should be disabled: canu knows how much memory each job needs (in your case 13 GB/job), and a hard 2 GB JVM cap will make the jobs very slow or fail outright when the JVM can't allocate enough memory.
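If that option is being injected through your environment (for example by a module), unsetting it before invoking canu may be enough; a minimal sketch:
unset JAVA_TOOL_OPTIONS    # stop forcing a 2 GB heap on every JVM
canu ...                   # canu then sets -Xmx per job from its own memory estimates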
Here is a log file:
Running job 46 based on SLURM_ARRAY_TASK_ID=46 and offset=0.
Fetch blocks/000014.dat
Fetch blocks/000015.dat
Fetch blocks/000016.dat
Fetch blocks/000017.dat
Fetch blocks/000018.dat
Fetch blocks/000019.dat
Fetch blocks/000020.dat
Fetch blocks/000021.dat
Fetch blocks/000022.dat
Fetch blocks/000023.dat
Fetch blocks/000024.dat
Running block 000013 in query 000046
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Running with these settings:
--filter-threshold = 1.0E-7
--help = false
--max-shift = 0.2
--min-olap-length = 500
--min-store-length = 0
--no-rc = false
--no-self = false
--no-tf = false
--num-hashes = 512
--num-min-matches = 3
--num-threads = 16
--ordered-kmer-size = 14
--ordered-sketch-size = 1000
--repeat-idf-scale = 10.0
--repeat-weight = 0.9
--settings = 0
--store-full-id = true
--supress-noise = 0
--threshold = 0.78
--version = false
-f =
-h = false
-k = 16
-p =
-q = queries/000046
-s = ./blocks/000013.dat
Processing files for storage in reverse index...
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences loaded and processed from file: 35000...
Current # sequences loaded and processed from file: 40000...
Current # sequences loaded and processed from file: 45000...
Current # sequences loaded and processed from file: 50000...
Current # sequences loaded and processed from file: 55000...
Current # sequences loaded and processed from file: 60000...
Current # sequences loaded and processed from file: 65000...
Current # sequences loaded and processed from file: 70000...
Current # sequences stored: 5000...
Current # sequences stored: 10000...
Current # sequences stored: 15000...
Current # sequences stored: 20000...
Current # sequences stored: 25000...
Current # sequences stored: 30000...
Current # sequences stored: 35000...
Current # sequences stored: 40000...
Current # sequences stored: 45000...
Current # sequences stored: 50000...
Current # sequences stored: 55000...
Current # sequences stored: 60000...
Current # sequences stored: 65000...
Current # sequences stored: 70000...
Stored 70200 sequences in the index.
Processed 70200 unique sequences (fwd and rev).
Time (s) to read and hash from file: 12.878285513000002
Time (s) to score and output to self: 5732.143230646
Opened fasta file /lustre06/project/6058390/kitani/T_cruzi_data/file_all/canu_assembly/correction/1-overlapper/blocks/000014.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
writeToFile()-- After writing 270113 out of 451701 'ovFile::writeBuffer::sb' objects (1 bytes each): Disk quota exceeded
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences loaded and processed from file: 35000...
Processed 35100 to sequences.
Time (s) to score, hash to-file, and output: 10258.222837360001
Opened fasta file /lustre06/project/6058390/kitani/T_cruzi_data/file_all/canu_assembly/correction/1-overlapper/blocks/000015.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
Current # sequences loaded and processed from file: 15000...
Current # sequences loaded and processed from file: 20000...
Current # sequences loaded and processed from file: 25000...
Current # sequences loaded and processed from file: 30000...
Current # sequences loaded and processed from file: 35000...
Processed 35100 to sequences.
Time (s) to score, hash to-file, and output: 9337.448781627001
Opened fasta file /lustre06/project/6058390/kitani/T_cruzi_data/file_all/canu_assembly/correction/1-overlapper/blocks/000016.dat.
Current # sequences loaded and processed from file: 5000...
Current # sequences loaded and processed from file: 10000...
slurmstepd: error: *** JOB 4571535 ON nc31130 CANCELLED AT 2022-04-12T15:50:32 DUE TO TIME LIMIT ***
It says due to time limit again.
Also, regarding the JVM: that setting is what gets picked up from Compute Canada's environment. I will try to change it.
There are both out-of-disk errors and a timeout:
writeToFile()-- After writing 270113 out of 451701 'ovFile::writeBuffer::sb' objects (1 bytes each): Disk quota exceeded
It's also running very slowly, I expect because of the JVM memory. You can estimate how long this job would take at the current speed by counting how many files are in 1-overlapper/queries/000046. It finished 2.25 files before being killed, so scale that up to estimate the total time based on the files there.
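As a rough worked example of that estimate (the 8 h figure is the grid time limit used for the attempt):
ls correction/1-overlapper/queries/000046 | wc -l    # N = block files this job must process
# estimated total runtime ≈ N * (8 h / 2.25) ≈ N * 3.5 h per file at the current speed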
Either way, you may as well kill this run and try to get more space. Remove this run folder and re-start from scratch when you do. If you can't disable the JVM option, you can also add mhapMemory=2, which will create more jobs and require a restart from scratch in a new folder, but would fit within the fixed JVM limit.
I killed the run and tried running it on a much smaller file. I did not get any space issues, but I still got: slurmstepd: error: *** JOB 4571535 ON nc31130 CANCELLED AT 2022-04-12T15:50:32 DUE TO TIME LIMIT ***
I am rerunning with the smaller file and an increased gridOptions time, from 8 hours to 36. I also added mhapMemory=2.
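The rerun command looks roughly like this (the subset file name is a placeholder):
canu -p myassembly -d canu_assembly genomeSize=55m -nanopore subset.fastq \
     useGrid=true gridOptions="--time 36:00:00" mhapMemory=2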
Is there a way to run canu on Computecanada without indicating a time? Like can I just run it till completion regardless of how long it takes?
I don't know what the Compute Canada grid allows. I would guess that if it is Slurm, a runtime has to be specified. You can increase it beyond 36 hours, up to whatever the max time limit for the partition is. You can also look at the FAQ for options that can speed up this step: https://canu.readthedocs.io/en/latest/faq.html#my-assembly-is-running-out-of-space-is-too-slow
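For reference, the speed-related options that FAQ entry suggests look roughly like the following (check the FAQ for the authoritative values; as discussed below, mhapMemory needs adjusting for your JVM limit):
corMhapFilterThreshold=0.0000000002 \
corMhapOptions="--threshold 0.80 --num-hashes 512 --num-min-matches 3 --ordered-sketch-size 1000 --ordered-kmer-size 14 --min-olap-length 2000 --repeat-idf-scale 50" \
mhapMemory=60g mhapBlockSize=500 ovlMerDistinct=0.975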
Alright, I will cancel my current run and add those parameters. I noticed they said to add mhapMemory=60g, which is significantly more than the 2g my JVM is running with. Should I copy all the other parameters as-is and keep mhapMemory=2g?
Yes, keep the memory at 2g because the JVM on your system is hard-coded to that.
I tried running it with the above parameters. When I checked on the run a while later, I noticed it had stopped on its own. I tried running it again and the same thing happened.
I checked the canu.out file and here is what I got:
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '14.0.2' (from '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/java/14.0.2/bin/java') without -d64 support.
-- Detected gnuplot version '5.2 patchlevel 8 ' (from 'gnuplot') and image format 'png'.
--
-- Detected 1 CPUs and 4096 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /opt/software/slurm/bin/sinfo.
-- Detected Slurm with task IDs up to 9999 allowed.
--
-- Slurm support detected. Resources available:
-- 33 hosts with 64 cores and 2008 GB memory.
-- 159 hosts with 48 cores and 497 GB memory.
-- 1109 hosts with 64 cores and 248 GB memory.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl    12.000 GB    4 CPUs  (k-mer counting)
-- Grid:  hap       8.000 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   2.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    8.000 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl    8.000 GB    8 CPUs  (overlap detection)
-- Grid:  cor       -.--- GB    4 CPUs  (read correction)
-- Grid:  ovb       4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       8.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red      15.000 GB    4 CPUs  (read error detection)
-- Grid:  oea       8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      64.000 GB    8 CPUs  (contig construction with bogart)
-- Grid:  cns       -.--- GB    8 CPUs  (consensus)
--
-- Found Nanopore reads in 'myassembly.seqStore':
-- Libraries:
-- Nanopore: 421
-- Reads:
-- Raw: 1480740698
--
--
-- Generating assembly 'myassembly' in '/lustre06/project/6058390/kitani/T_cruzi_data/file_n59-b6/canu_assembly':
-- genomeSize:
-- 55000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.3200 ( 32.00%)
-- obtOvlErrorRate 0.1200 ( 12.00%)
-- utgOvlErrorRate 0.1200 ( 12.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1200 ( 12.00%)
-- utgErrorRate 0.1200 ( 12.00%)
-- cnsErrorRate 0.2000 ( 20.00%)
--
-- Stages to run:
-- correct raw reads.
-- trim corrected reads.
-- assemble corrected and trimmed reads.
--
--
-- BEGIN CORRECTION
-- Meryl finished successfully. Kmer frequency histogram:
--
-- 16-mers Fraction
-- Occurrences NumMers Unique Total
-- 1- 1 0 0.0000 0.0000
-- 2- 2 44065704 ********************************************************************** 0.4595 0.0710
-- 3- 4 20511037 ******************************** 0.6073 0.1053
-- 5- 7 7609633 ************ 0.7103 0.1400
-- 8- 11 4933827 ******* 0.7676 0.1702
-- 12- 16 5030163 ******* 0.8152 0.2086
-- 17- 22 4907756 ******* 0.8662 0.2675
-- 23- 29 3541136 ***** 0.9146 0.3434
-- 30- 37 1864259 ** 0.9482 0.4126
-- 38- 46 879492 * 0.9656 0.4582
-- 47- 56 521535 0.9740 0.4858
-- 57- 67 366198 0.9792 0.5067
-- 68- 79 275288 0.9829 0.5245
-- 80- 92 212476 0.9856 0.5404
-- 93- 106 171813 0.9878 0.5549
-- 107- 121 137708 0.9896 0.5685
-- 122- 137 116397 0.9910 0.5810
-- 138- 154 98762 0.9922 0.5931
-- 155- 172 83303 0.9932 0.6046
-- 173- 191 70271 0.9940 0.6155
-- 192- 211 59596 0.9948 0.6257
-- 212- 232 50506 0.9954 0.6353
-- 233- 254 43728 0.9959 0.6443
-- 255- 277 37635 0.9964 0.6528
-- 278- 301 31990 0.9967 0.6608
-- 302- 326 27450 0.9971 0.6682
-- 327- 352 23901 0.9974 0.6751
-- 353- 379 20720 0.9976 0.6816
-- 380- 407 18340 0.9978 0.6877
-- 408- 436 16135 0.9980 0.6935
-- 437- 466 14365 0.9982 0.6990
-- 467- 497 13150 0.9983 0.7042
-- 498- 529 12120 0.9985 0.7093
-- 530- 562 10952 0.9986 0.7143
-- 563- 596 10143 0.9987 0.7191
-- 597- 631 8979 0.9988 0.7238
-- 632- 667 7881 0.9989 0.7282
-- 668- 704 6963 0.9990 0.7323
-- 705- 742 6373 0.9991 0.7362
-- 743- 781 5774 0.9991 0.7399
-- 782- 821 5501 0.9992 0.7434
--
--
-- 0 (max occurrences)
-- 1241074229 (total mers, non-unique)
-- 95901390 (distinct mers, non-unique)
-- 0 (unique mers)
-- Finished stage 'meryl-process', reset canuIteration.
--
-- Removing meryl database 'correction/0-mercounts/myassembly.ms16'.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=high based on read coverage of 26.92.
--
-- PARAMETERS: hashes=768, minMatches=2, threshold=0.73
--
-- Given 1.8 GB, can fit 450 reads per block.
-- For 2474 blocks, set stride to 618 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
mkdir correction/1-overlapper/queries/000795: Disk quota exceeded at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/canu/2.2/bin/../lib/site_
It says disk quota exceeded, so I think the issue is now on the Compute Canada side?
I tried checking a log file under correction by going to correction/1-overlapper/mhap.*.out, but in the 1-overlapper directory I only found: partitioning.log and queries.
I am not sure why the run is cancelling on its own, though.
It ran out of disk space, couldn't make a directory, and failed because of that.
I'm not at all familiar with Compute Canada, but the docs at https://docs.computecanada.ca/wiki/Compute_Canada_Documentation, specifically https://docs.computecanada.ca/wiki/Scratch_purging_policy, hint that there is a scratch space you could use to generate the assembly, then copy the result back to your project space when it is done.
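Something like this might work (a sketch; the scratch and project paths are placeholders for your own):
cd ~/scratch
canu -p myassembly -d canu_assembly genomeSize=55m -nanopore reads.fastq \
     useGrid=true gridOptions="--time 36:00:00" mhapMemory=2
# after it finishes, copy just the outputs you need back to project space
cp canu_assembly/myassembly.contigs.fasta canu_assembly/myassembly.report ~/projects/my_project/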
Hello,
I ran the file on scratch and canu seemed to run well, but I got the following issue:
-- Running jobs. First attempt out of 2.
--
-- Failed to submit compute jobs. Delay 10 seconds and try again.
CRASH:
CRASH: canu 2.2
CRASH: Please panic, this is abnormal.
CRASH:
CRASH: Failed to submit compute jobs.
CRASH:
CRASH: Failed at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/canu/2.2/bin/../lib/site_perl/canu/Execution.pm line 1259.
CRASH: canu::Execution::submitOrRunParallelJob("myassembly", "ovS", "correction/myassembly.ovlStore.BUILDING", "scripts/2-sort", 1, 2, 3, 4, ...) called at >
CRASH: canu::OverlapStore::overlapStoreSorterCheck("correction", "myassembly", "cor", 157, 4181) called at /cvmfs/soft.computecanada.ca/easybuild/software/2>
CRASH: canu::OverlapStore::createOverlapStore("myassembly", "cor") called at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/canu/2.2/bin/can>
CRASH: main::overlap("myassembly", "cor") called at /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/canu/2.2/bin/canu line 1079
CRASH:
CRASH: Last 50 lines of the relevant log file (correction/myassembly.ovlStore.BUILDING/scripts/2-sort.jobSubmit-01.out):
CRASH:
CRASH: sbatch: error: AssocMaxSubmitJobLimit
CRASH: sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
CRASH:
I contacted Compute Canada, who informed me there is a limit of 1000 jobs on their cluster.
I am not sure how to adjust the parameters to accommodate this.
There's discussion of this in issue #1883.
Idle; the original issue with runtime and space limits is resolved. The job-limit workaround is described in the linked issue.