The empty file and the error (safeWrite()-- Write failure on ovFile::writeBuffer::sb: Input/output error) suggest that either you're out of disk space or quota, or the file server crashed, rebooted, or failed.
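Before rerunning anything, it is worth confirming how much space is actually free where the assembly lives. A minimal bash sketch, assuming POSIX df output; note that df reports file-system free space, not a per-user or per-project quota, which needs your site's own quota tool. The function names and the 500 GB threshold are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: check free space on the file system that holds a directory.
# Uses POSIX "df -P -k": line 2, column 4 is the available space in KiB.
# Note: this is file-system free space, not your user/project quota.

# free_gb DIR -> prints available space in whole GB (GiB, rounded down)
free_gb() {
  df -P -k "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# require_free_gb DIR MIN_GB -> returns 0 if at least MIN_GB are free, else 1
require_free_gb() {
  local dir=$1 min_gb=$2 have
  have=$(free_gb "$dir")
  if [[ ! $have =~ ^[0-9]+$ ]]; then
    echo "could not read free space for $dir" >&2
    return 1
  fi
  if (( have < min_gb )); then
    echo "only $have GB free in $dir; wanted at least $min_gb GB" >&2
    return 1
  fi
}

# Example (threshold is an arbitrary assumption):
#   require_free_gb /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 500 || exit 1
```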
For testing, you can run these by hand with ./overlap.sh 32 and ./overlap.sh 103.
When I run ./overlap.sh 32, I get:
Running job 32 based on command line options.
STRING_NUM_BITS 31
OFFSET_BITS 31
STRING_NUM_MASK 2147483647
OFFSET_MASK 2147483647
MAX_STRING_NUM 2147483647
Hash_Mask_Bits 23
Max_Hash_Strings 11297
Max_Hash_Data_Len 170647834
Max_Hash_Load 0.750000
Kmer Length 22
Min Overlap Length 500
Max Error Rate 0.150000
Min Kmer Matches 0
Num_PThreads 4
HASH_TABLE_SIZE 8388608
sizeof(Hash_Bucket_t) 216
hash table size: 1728 MB
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
overlapInCore/overlapInCore.C::527 in main()
(null)::0 in (null)()
(null)::0 in (null)()
./overlap.sh: line 775: 129090 Aborted (core dumped) $bin/overlapInCore -t 4 -k 22 -k ../0-mercounts/OSR_512_canu3.ms22.frequentMers.fasta --hashbits 23 --hashload 0.75 --maxerate 0.15 --minlength 500 $opt -o ./$job.ovb.WORKING -s ./$job.stats ../OSR_512_canu3.gkpStore
When I run ./overlap.sh 103, I get:
Running job 103 based on command line options.
STRING_NUM_BITS 31
OFFSET_BITS 31
STRING_NUM_MASK 2147483647
OFFSET_MASK 2147483647
MAX_STRING_NUM 2147483647
Hash_Mask_Bits 23
Max_Hash_Strings 11711
Max_Hash_Data_Len 170644032
Max_Hash_Load 0.750000
Kmer Length 22
Min Overlap Length 500
Max Error Rate 0.150000
Min Kmer Matches 0
Num_PThreads 4
HASH_TABLE_SIZE 8388608
sizeof(Hash_Bucket_t) 216
hash table size: 1728 MB
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
overlapInCore/overlapInCore.C::527 in main()
(null)::0 in (null)()
(null)::0 in (null)()
./overlap.sh: line 775: 1342 Aborted (core dumped) $bin/overlapInCore -t 4 -k 22 -k ../0-mercounts/OSR_512_canu3.ms22.frequentMers.fasta --hashbits 23 --hashload 0.75 --maxerate 0.15 --minlength 500 $opt -o ./$job.ovb.WORKING -s ./$job.stats ../OSR_512_canu3.gkpStore
They look identical to me.
That's not the same error as during the run on the grid; these jobs are running out of memory and failing. I'd guess you're running on the head node and the system is killing your jobs. Have you tried getting an interactive session with at least 10 GB of memory reserved and running the same overlap jobs by hand?
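The numbers printed just before the crash make the memory need concrete. A back-of-the-envelope bash check, assuming (as the log's own figures imply) that the reported hash table size is 2^hashbits buckets times sizeof(Hash_Bucket_t): with --hashbits 23 that is 8,388,608 × 216 bytes = 1728 MB. This covers only the hash table; the loaded reads and per-thread work areas presumably come on top, so treat it as a lower bound rather than the job's total footprint.

```bash
#!/usr/bin/env bash
# Sketch: recompute the "hash table size" line from overlapInCore's log.
# Assumption: size = 2^hashbits buckets * sizeof(Hash_Bucket_t) bytes, which
# matches "HASH_TABLE_SIZE 8388608" and "hash table size: 1728 MB" in the log.
# This is the hash table only, not the total memory of the process.

# hash_table_mb HASHBITS BUCKET_BYTES -> prints the table size in MB (2^20 bytes)
hash_table_mb() {
  local hashbits=$1 bucket_bytes=$2
  local buckets=$(( 1 << hashbits ))   # --hashbits 23 -> 8388608 buckets
  echo $(( buckets * bucket_bytes / 1024 / 1024 ))
}

hash_table_mb 23 216   # prints 1728
```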
I tried an interactive session with 16 GB of memory: qsub -I -P OSR -l select=1:ncpus=4:mem=16GB
There was an offset issue, so I ran ./overlap.sh 31 and got:
Running job 32 based on PBS_ARRAY_INDEX=1 and offset=31.
STRING_NUM_BITS 31
OFFSET_BITS 31
STRING_NUM_MASK 2147483647
OFFSET_MASK 2147483647
MAX_STRING_NUM 2147483647
Hash_Mask_Bits 23
Max_Hash_Strings 11297
Max_Hash_Data_Len 170647834
Max_Hash_Load 0.750000
Kmer Length 22
Min Overlap Length 500
Max Error Rate 0.150000
Min Kmer Matches 0
Num_PThreads 4
HASH_TABLE_SIZE 8388608
sizeof(Hash_Bucket_t) 216
hash table size: 1728 MB
check 32 MB
info 0 MB
start 0 MB
Initializing 4 work areas.
Build_Hash_Index from 459226 to 470522
Found 8242 reads with length 125324689 to load; 3055 skipped by being too short; 0 skipped per library restriction
String_Ct: 0/ 11297 totalLen: 11420/ 170647834 Hash_Entries: 11392/ 132120576 Load: 0.01%
HASH LOADING STOPPED: strings 11297 out of 11297 max.
HASH LOADING STOPPED: length 125324689 out of 170647834 max.
HASH LOADING STOPPED: entries 78824749 out of 132120576 max (load 44.75).
String_Ct = 11297 Extra_String_Ct = 0 Extra_String_Subcount = 95325
Read 35432 kmers to mark to skip
Range: 1-448823. Store has 1088846 reads.
Chunk: 14026 reads/thread -- (G.endRefID=448823 - G.bgnRefID=1) / G.Num_PThreads=4 / 8
Starting 1-448823 with 14026 per thread
Thread 00 processes reads 1-14026
Thread 01 processes reads 14027-28052
Thread 02 processes reads 28053-42078
Thread 03 processes reads 42079-56104
It timed out after 1 hour. Is there anything else I can do?
There is no error there; it was still running. Is the default timeout on your interactive job 1 hour? If so, you can increase it and try again.
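A longer interactive session can be requested by adding an explicit walltime to the same qsub line. A small bash sketch, assuming the usual PBS HH:MM:SS walltime format and a site limit that allows multi-day interactive jobs; the walltime helper is hypothetical and the 72-hour value is only an example.

```bash
#!/usr/bin/env bash
# Sketch: format a PBS walltime (HH:MM:SS) from a whole number of hours.
# Assumption: your site accepts walltimes above 24 h in the HH:MM:SS form.

# walltime HOURS -> prints e.g. "72:00:00"; rejects non-integer input
walltime() {
  if [[ ! $1 =~ ^[0-9]+$ ]]; then
    echo "walltime: expected whole hours, got '$1'" >&2
    return 1
  fi
  printf '%02d:00:00\n' "$((10#$1))"
}

# Example: the interactive request from above, with a 72-hour limit
#   qsub -I -P OSR -l select=1:ncpus=4:mem=16GB -l walltime="$(walltime 72)"
```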
I finally finished running overlap jobs 32 and 103. They took about 70 hours. Then I tried restarting Canu, but it ended immediately. The relevant error files are below.
canu_qsub_error.txt
You will need to use the gridOptions param to specify the PBS project if using the PBS gridoption.
-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction, consensus and GFA building use:
-- Šošic M, Šikic M.
-- Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Li H.
-- Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
-- Bioinformatics. 2016 Jul 15;32(14):2103-10.
-- http://doi.org/10.1093/bioinformatics/btw152
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
--
-- Found 2 hosts with 24 cores and 250 GB memory under PBSPro control.
-- Found 3 hosts with 64 cores and 6057 GB memory under PBSPro control.
-- Found 49 hosts with 48 cores and 185 GB memory under PBSPro control.
-- Found 27 hosts with 36 cores and 187 GB memory under PBSPro control.
-- Found 1 host with 1 core and 5 GB memory under PBSPro control.
-- Found 80 hosts with 32 cores and 123 GB memory under PBSPro control.
-- Found 2 hosts with 24 cores and 502 GB memory under PBSPro control.
-- Found 59 hosts with 24 cores and 123 GB memory under PBSPro control.
--
-- (tag)Threads
-- (tag)Memory |
-- (tag) | | algorithm
-- ------- ------ -------- -----------------------------
-- Grid: meryl 8 GB 4 CPUs (k-mer counting)
-- Grid: cormhap 13 GB 4 CPUs (overlap detection with mhap)
-- Grid: obtovl 8 GB 4 CPUs (overlap detection)
-- Grid: utgovl 8 GB 4 CPUs (overlap detection)
-- Grid: ovb 3 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 8 GB 4 CPUs (read error detection)
-- Grid: oea 4 GB 1 CPU (overlap error adjustment)
-- Grid: bat 64 GB 8 CPUs (contig construction)
-- Grid: gfa 8 GB 8 CPUs (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
-- Raw: 1088846
-- Corrected: 802726
-- Trimmed: 779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
-- genomeSize 86000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.1500 ( 15.00%)
-- utgOvlErrorRate 0.1500 ( 15.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1500 ( 15.00%)
-- utgErrorRate 0.1500 ( 15.00%)
-- cnsErrorRate 0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
-- Found 117 overlapInCore output files.
--
-- overlapInCore compute 'unitigging/1-overlapper':
-- kmer hits
-- with no overlap 38902589709 0766.74359 +- 176666745.253
-- with an overlap 267009927 6.12820513 +- 1220082.706
--
-- overlaps 267009927 6.12820513 +- 1220082.706
-- contained 115197265 .008547009 +- 528881.275
-- dovetail 151812662 4.11965812 +- 691743.093
--
-- overlaps rejected
-- multiple per pair 0 0 +- 0
-- bad short window 0 0 +- 0
-- bad long window 0 0 +- 0
----------------------------------------
-- Starting command on Tue Oct 2 11:59:57 2018 with 14846.663 GB free disk space
cd unitigging
/usr/local/canu/1.7/bin/ovStoreBuild \
-O ./OSR_512_canu3.ovlStore.BUILDING \
-G ./OSR_512_canu3.gkpStore \
-M 2-8 \
-L ./1-overlapper/ovljob.files \
> ./OSR_512_canu3.ovlStore.err 2>&1
sh: line 5: 282915 Aborted (core dumped) /usr/local/canu/1.7/bin/ovStoreBuild -O ./OSR_512_canu3.ovlStore.BUILDING -G ./OSR_512_canu3.gkpStore -M 2-8 -L ./1-overlapper/ovljob.files > ./OSR_512_canu3.ovlStore.err 2>&1
-- Finished on Tue Oct 2 12:02:25 2018 (148 seconds) with 14812.39 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 134. (rc=34304)
ERROR:
ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: failed to create the overlap store.
ABORT:
ABORT: Disk space available: 14812.39 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:
unitigging/OSR_512_canu3.ovlStore.err
Found 534019854 (534.02 million) overlaps.
Configuring for 2.00 GB to 8.00 GB memory and 16368 open files.
Will sort using 10 files; 58720256 (58.72 million) overlaps per bucket; 2.00 GB memory per bucket
bucket 1 has 53402015 olaps.
bucket 2 has 53402097 olaps.
bucket 3 has 53402129 olaps.
bucket 4 has 53401999 olaps.
bucket 5 has 53403593 olaps.
bucket 6 has 53402063 olaps.
bucket 7 has 53413263 olaps.
bucket 8 has 53404677 olaps.
bucket 9 has 53402063 olaps.
bucket 10 has 53385955 olaps.
Will sort 53.402 million overlaps per bucket, using 10 buckets 1.84 GB per bucket.
-- BUCKETIZING --
- Bucketizing '1-overlapper/001/000001.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0001'
- Bucketizing '1-overlapper/001/000002.ovb'
- Bucketizing '1-overlapper/001/000003.ovb'
- Bucketizing '1-overlapper/001/000004.ovb'
- Bucketizing '1-overlapper/001/000005.ovb'
- Bucketizing '1-overlapper/001/000006.ovb'
- Bucketizing '1-overlapper/001/000007.ovb'
- Bucketizing '1-overlapper/001/000008.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0002'
- Bucketizing '1-overlapper/001/000009.ovb'
- Bucketizing '1-overlapper/001/000010.ovb'
- Bucketizing '1-overlapper/001/000011.ovb'
- Bucketizing '1-overlapper/001/000012.ovb'
- Bucketizing '1-overlapper/001/000013.ovb'
- Bucketizing '1-overlapper/001/000014.ovb'
- Bucketizing '1-overlapper/001/000015.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0003'
- Bucketizing '1-overlapper/001/000016.ovb'
- Bucketizing '1-overlapper/001/000017.ovb'
- Bucketizing '1-overlapper/001/000018.ovb'
- Bucketizing '1-overlapper/001/000019.ovb'
- Bucketizing '1-overlapper/001/000020.ovb'
- Bucketizing '1-overlapper/001/000021.ovb'
- Bucketizing '1-overlapper/001/000022.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0004'
- Bucketizing '1-overlapper/001/000023.ovb'
- Bucketizing '1-overlapper/001/000024.ovb'
- Bucketizing '1-overlapper/001/000025.ovb'
- Bucketizing '1-overlapper/001/000026.ovb'
- Bucketizing '1-overlapper/001/000027.ovb'
- Bucketizing '1-overlapper/001/000028.ovb'
- Bucketizing '1-overlapper/001/000029.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0005'
- Bucketizing '1-overlapper/001/000030.ovb'
- Bucketizing '1-overlapper/001/000031.ovb'
- Bucketizing '1-overlapper/001/000032.ovb'
- Bucketizing '1-overlapper/001/000033.ovb'
- Bucketizing '1-overlapper/001/000034.ovb'
- Bucketizing '1-overlapper/001/000035.ovb'
- Bucketizing '1-overlapper/001/000036.ovb'
- Bucketizing '1-overlapper/001/000037.ovb'
- Bucketizing '1-overlapper/001/000038.ovb'
- Bucketizing '1-overlapper/001/000039.ovb'
- Bucketizing '1-overlapper/001/000040.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0006'
- Bucketizing '1-overlapper/001/000041.ovb'
- Bucketizing '1-overlapper/001/000042.ovb'
- Bucketizing '1-overlapper/001/000043.ovb'
- Bucketizing '1-overlapper/001/000044.ovb'
- Bucketizing '1-overlapper/001/000045.ovb'
- Bucketizing '1-overlapper/001/000046.ovb'
- Bucketizing '1-overlapper/001/000047.ovb'
- Bucketizing '1-overlapper/001/000048.ovb'
- Bucketizing '1-overlapper/001/000049.ovb'
- Bucketizing '1-overlapper/001/000050.ovb'
- Bucketizing '1-overlapper/001/000051.ovb'
- Bucketizing '1-overlapper/001/000052.ovb'
- Bucketizing '1-overlapper/001/000053.ovb'
- Bucketizing '1-overlapper/001/000054.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0007'
- Bucketizing '1-overlapper/001/000055.ovb'
- Bucketizing '1-overlapper/001/000056.ovb'
- Bucketizing '1-overlapper/001/000057.ovb'
- Bucketizing '1-overlapper/001/000058.ovb'
- Bucketizing '1-overlapper/001/000059.ovb'
- Bucketizing '1-overlapper/001/000060.ovb'
- Bucketizing '1-overlapper/001/000061.ovb'
- Bucketizing '1-overlapper/001/000062.ovb'
- Bucketizing '1-overlapper/001/000063.ovb'
- Bucketizing '1-overlapper/001/000064.ovb'
- Bucketizing '1-overlapper/001/000065.ovb'
- Bucketizing '1-overlapper/001/000066.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0008'
- Bucketizing '1-overlapper/001/000067.ovb'
- Bucketizing '1-overlapper/001/000068.ovb'
- Bucketizing '1-overlapper/001/000069.ovb'
- Bucketizing '1-overlapper/001/000070.ovb'
- Bucketizing '1-overlapper/001/000071.ovb'
- Bucketizing '1-overlapper/001/000072.ovb'
- Bucketizing '1-overlapper/001/000073.ovb'
- Bucketizing '1-overlapper/001/000074.ovb'
- Bucketizing '1-overlapper/001/000075.ovb'
- Bucketizing '1-overlapper/001/000076.ovb'
- Bucketizing '1-overlapper/001/000077.ovb'
- Bucketizing '1-overlapper/001/000078.ovb'
- Bucketizing '1-overlapper/001/000079.ovb'
- Bucketizing '1-overlapper/001/000080.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0009'
- Bucketizing '1-overlapper/001/000081.ovb'
- Bucketizing '1-overlapper/001/000082.ovb'
- Bucketizing '1-overlapper/001/000083.ovb'
- Bucketizing '1-overlapper/001/000084.ovb'
- Bucketizing '1-overlapper/001/000085.ovb'
- Bucketizing '1-overlapper/001/000086.ovb'
- Bucketizing '1-overlapper/001/000087.ovb'
- Bucketizing '1-overlapper/001/000088.ovb'
- Bucketizing '1-overlapper/001/000089.ovb'
- Bucketizing '1-overlapper/001/000090.ovb'
- Bucketizing '1-overlapper/001/000091.ovb'
- Bucketizing '1-overlapper/001/000092.ovb'
- Bucketizing '1-overlapper/001/000093.ovb'
- Bucketizing '1-overlapper/001/000094.ovb'
- Bucketizing '1-overlapper/001/000095.ovb'
- Bucketizing '1-overlapper/001/000096.ovb'
- Bucketizing '1-overlapper/001/000097.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0010'
- Bucketizing '1-overlapper/001/000098.ovb'
- Bucketizing '1-overlapper/001/000099.ovb'
- Bucketizing '1-overlapper/001/000100.ovb'
- Bucketizing '1-overlapper/001/000101.ovb'
- Bucketizing '1-overlapper/001/000102.ovb'
- Bucketizing '1-overlapper/001/000103.ovb'
- Bucketizing '1-overlapper/001/000104.ovb'
- Bucketizing '1-overlapper/001/000105.ovb'
- Bucketizing '1-overlapper/001/000106.ovb'
- Bucketizing '1-overlapper/001/000107.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
stores/ovStoreFile.C::286 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::317 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBuild.C::519 in main()
(null)::0 in (null)()
(null)::0 in (null)()
This again looks like an out-of-memory issue, though it could also be caused by corrupted data on disk. How did you submit the Canu command to your grid (memory/threads)?
I submit the following script with the command qsub canu.pbs
#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m b
module load canu/1.7
canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 \
  gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" \
  genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 \
  gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 \
  gridOptionsCOR="-l walltime=36:00:00" \
  gridOptionsCORMHAP="-l walltime=36:00:00" \
  gridOptionsCOROVL="-l walltime=36:00:00" \
  gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" \
  gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" \
  -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta
module unload canu/1.7
Since you already requested 64 GB of RAM, I don't think memory is the issue; disk corruption is more likely. Because you ran out of disk space during this run, the file system may not have properly detected or reported the error, so you ended up with corrupt files.
Can you post the output of ls on the unitigging/1-overlapper/001 folder? My first guess for the corrupt file would be job 107 (the last file in the log). You could try removing all files named 000107.* from the 001 folder, re-running overlap.sh 107, and re-launching Canu. It may fail again if there are more corrupt files, in which case you'll need to check which file it failed on in the log and re-run that job.
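Finding the file that ovStoreBuild failed on can be scripted. A minimal bash sketch, assuming the .err log keeps the "- Bucketizing '<path>'" format shown above and that the crash happened while reading the last file announced (the ovFile::readBuffer frame in the backtrace points that way). last_bucketized and job_of are hypothetical helpers, not part of Canu, and the commented example assumes overlap.sh sits in unitigging/1-overlapper, as the relative paths in its command line suggest.

```bash
#!/usr/bin/env bash
# Sketch: locate the last input announced in ovStoreBuild's log.
# Assumption: each input is logged as "- Bucketizing '<path>'" before it is read.

# last_bucketized LOGFILE -> prints the path of the last "- Bucketizing" entry
last_bucketized() {
  sed -n "s/^- Bucketizing '\(.*\)'\$/\1/p" "$1" | tail -n 1
}

# job_of OVBPATH -> prints the overlap job number, e.g. .../000107.ovb -> 107
job_of() {
  local name
  name=$(basename "$1" .ovb)   # "000107"
  echo $(( 10#$name ))         # force base 10 so leading zeros are not octal
}

# Example (destructive; review before running):
#   f=$(last_bucketized unitigging/OSR_512_canu3.ovlStore.err)
#   rm -f "unitigging/1-overlapper/001/$(basename "$f" .ovb)".*
#   (cd unitigging/1-overlapper && ./overlap.sh "$(job_of "$f")")
```

If overlap.sh picks up a PBS array offset in your session, adjust the job argument as was needed earlier in this thread.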
An alternate would be to add ovlMerThreshold=500 to your command and re-run the full unitigging step (remove the unitigging folder from your assembly directory and re-launch Canu). This will both use less space and speed up the compute.
ls unitigging/1-overlapper/001/
000001.counts 000013.counts 000025.counts 000037.counts 000049.counts 000061.counts 000073.counts 000085.counts 000097.counts 000109.counts
000001.ovb 000013.ovb 000025.ovb 000037.ovb 000049.ovb 000061.ovb 000073.ovb 000085.ovb 000097.ovb 000109.ovb
000001.stats 000013.stats 000025.stats 000037.stats 000049.stats 000061.stats 000073.stats 000085.stats 000097.stats 000109.stats
000002.counts 000014.counts 000026.counts 000038.counts 000050.counts 000062.counts 000074.counts 000086.counts 000098.counts 000110.counts
000002.ovb 000014.ovb 000026.ovb 000038.ovb 000050.ovb 000062.ovb 000074.ovb 000086.ovb 000098.ovb 000110.ovb
000002.stats 000014.stats 000026.stats 000038.stats 000050.stats 000062.stats 000074.stats 000086.stats 000098.stats 000110.stats
000003.counts 000015.counts 000027.counts 000039.counts 000051.counts 000063.counts 000075.counts 000087.counts 000099.counts 000111.counts
000003.ovb 000015.ovb 000027.ovb 000039.ovb 000051.ovb 000063.ovb 000075.ovb 000087.ovb 000099.ovb 000111.ovb
000003.stats 000015.stats 000027.stats 000039.stats 000051.stats 000063.stats 000075.stats 000087.stats 000099.stats 000111.stats
000004.counts 000016.counts 000028.counts 000040.counts 000052.counts 000064.counts 000076.counts 000088.counts 000100.counts 000112.counts
000004.ovb 000016.ovb 000028.ovb 000040.ovb 000052.ovb 000064.ovb 000076.ovb 000088.ovb 000100.ovb 000112.ovb
000004.stats 000016.stats 000028.stats 000040.stats 000052.stats 000064.stats 000076.stats 000088.stats 000100.stats 000112.stats
000005.counts 000017.counts 000029.counts 000041.counts 000053.counts 000065.counts 000077.counts 000089.counts 000101.counts 000113.counts
000005.ovb 000017.ovb 000029.ovb 000041.ovb 000053.ovb 000065.ovb 000077.ovb 000089.ovb 000101.ovb 000113.ovb
000005.stats 000017.stats 000029.stats 000041.stats 000053.stats 000065.stats 000077.stats 000089.stats 000101.stats 000113.stats
000006.counts 000018.counts 000030.counts 000042.counts 000054.counts 000066.counts 000078.counts 000090.counts 000102.counts 000114.counts
000006.ovb 000018.ovb 000030.ovb 000042.ovb 000054.ovb 000066.ovb 000078.ovb 000090.ovb 000102.ovb 000114.ovb
000006.stats 000018.stats 000030.stats 000042.stats 000054.stats 000066.stats 000078.stats 000090.stats 000102.stats 000114.stats
000007.counts 000019.counts 000031.counts 000043.counts 000055.counts 000067.counts 000079.counts 000091.counts 000103.counts 000115.counts
000007.ovb 000019.ovb 000031.ovb 000043.ovb 000055.ovb 000067.ovb 000079.ovb 000091.ovb 000103.ovb 000115.ovb
000007.stats 000019.stats 000031.stats 000043.stats 000055.stats 000067.stats 000079.stats 000091.stats 000103.stats 000115.stats
000008.counts 000020.counts 000032.counts 000044.counts 000056.counts 000068.counts 000080.counts 000092.counts 000104.counts 000116.counts
000008.ovb 000020.ovb 000032.ovb 000044.ovb 000056.ovb 000068.ovb 000080.ovb 000092.ovb 000104.ovb 000116.ovb
000008.stats 000020.stats 000032.stats 000044.stats 000056.stats 000068.stats 000080.stats 000092.stats 000104.stats 000116.stats
000009.counts 000021.counts 000033.counts 000045.counts 000057.counts 000069.counts 000081.counts 000093.counts 000105.counts 000117.counts
000009.ovb 000021.ovb 000033.ovb 000045.ovb 000057.ovb 000069.ovb 000081.ovb 000093.ovb 000105.ovb 000117.ovb
000009.stats 000021.stats 000033.stats 000045.stats 000057.stats 000069.stats 000081.stats 000093.stats 000105.stats 000117.stats
000010.counts 000022.counts 000034.counts 000046.counts 000058.counts 000070.counts 000082.counts 000094.counts 000106.counts
000010.ovb 000022.ovb 000034.ovb 000046.ovb 000058.ovb 000070.ovb 000082.ovb 000094.ovb 000106.ovb
000010.stats 000022.stats 000034.stats 000046.stats 000058.stats 000070.stats 000082.stats 000094.stats 000106.stats
000011.counts 000023.counts 000035.counts 000047.counts 000059.counts 000071.counts 000083.counts 000095.counts 000107.counts
000011.ovb 000023.ovb 000035.ovb 000047.ovb 000059.ovb 000071.ovb 000083.ovb 000095.ovb 000107.ovb
000011.stats 000023.stats 000035.stats 000047.stats 000059.stats 000071.stats 000083.stats 000095.stats 000107.stats
000012.counts 000024.counts 000036.counts 000048.counts 000060.counts 000072.counts 000084.counts 000096.counts 000108.counts
000012.ovb 000024.ovb 000036.ovb 000048.ovb 000060.ovb 000072.ovb 000084.ovb 000096.ovb 000108.ovb
000012.stats 000024.stats 000036.stats 000048.stats 000060.stats 000072.stats 000084.stats 000096.stats 000108.stats
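Since an empty output file was the first symptom, a listing like the one above can be checked mechanically. A bash sketch, assuming the NNNNNN.ovb / .counts / .stats naming shown and jobs numbered 1 to 117 (the count Canu reported finding); check_jobs is a hypothetical helper. A non-empty file can still be truncated or corrupt, so this only catches the obvious failures.

```bash
#!/usr/bin/env bash
# Sketch: report overlap jobs whose outputs are missing or empty.
# Assumptions: outputs are named NNNNNN.ovb / .counts / .stats, and an empty
# file marks a failed job. Passing this check does not prove a file is intact.

# check_jobs DIR FIRST LAST -> prints one line per problem; returns 1 if any
check_jobs() {
  local dir=$1 first=$2 last=$3 status=0 i id ext f
  for (( i = first; i <= last; i++ )); do
    printf -v id '%06d' "$i"           # 107 -> "000107"
    for ext in ovb counts stats; do
      f="$dir/$id.$ext"
      if [[ ! -e $f ]]; then
        echo "job $i: missing $f"; status=1
      elif [[ ! -s $f ]]; then
        echo "job $i: empty $f"; status=1
      fi
    done
  done
  return "$status"
}

# Example:
#   check_jobs unitigging/1-overlapper/001 1 117
```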
I am currently re-running overlap.sh 107. If that doesn't work, I will delete the unitigging folder and restart Canu with the updated parameters, as you suggested. Thank you.
After running overlap.sh 107, I restarted Canu and got a different error this time. The relevant error files are below. Should I still delete the unitigging folder, set ovlMerThreshold=500, and relaunch Canu, as you suggested earlier?
canu_qsub_error.txt
You will need to use the gridOptions param to specify the PBS project if using the PBS gridoption.
-- Canu 1.7
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
--
-- Found 2 hosts with 24 cores and 250 GB memory under PBSPro control.
-- Found 3 hosts with 64 cores and 6057 GB memory under PBSPro control.
-- Found 49 hosts with 48 cores and 185 GB memory under PBSPro control.
-- Found 27 hosts with 36 cores and 187 GB memory under PBSPro control.
-- Found 1 host with 1 core and 5 GB memory under PBSPro control.
-- Found 80 hosts with 32 cores and 123 GB memory under PBSPro control.
-- Found 2 hosts with 24 cores and 502 GB memory under PBSPro control.
-- Found 59 hosts with 24 cores and 123 GB memory under PBSPro control.
--
-- (tag)Threads
-- (tag)Memory |
-- (tag) | | algorithm
-- ------- ------ -------- -----------------------------
-- Grid: meryl 8 GB 4 CPUs (k-mer counting)
-- Grid: cormhap 13 GB 4 CPUs (overlap detection with mhap)
-- Grid: obtovl 8 GB 4 CPUs (overlap detection)
-- Grid: utgovl 8 GB 4 CPUs (overlap detection)
-- Grid: ovb 3 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 8 GB 4 CPUs (read error detection)
-- Grid: oea 4 GB 1 CPU (overlap error adjustment)
-- Grid: bat 64 GB 8 CPUs (contig construction)
-- Grid: gfa 8 GB 8 CPUs (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
-- Raw: 1088846
-- Corrected: 802726
-- Trimmed: 779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
-- genomeSize 86000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.1500 ( 15.00%)
-- utgOvlErrorRate 0.1500 ( 15.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1500 ( 15.00%)
-- utgErrorRate 0.1500 ( 15.00%)
-- cnsErrorRate 0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Mon Oct 8 06:58:24 2018 with 8776.642 GB free disk space
cd unitigging
/usr/local/canu/1.7/bin/ovStoreBuild \
-O ./OSR_512_canu3.ovlStore.BUILDING \
-G ./OSR_512_canu3.gkpStore \
-M 2-8 \
-L ./1-overlapper/ovljob.files \
> ./OSR_512_canu3.ovlStore.err 2>&1
-- Finished on Mon Oct 8 06:58:31 2018 (7 seconds) with 8776.574 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 1. (rc=256)
ERROR:
ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: failed to create the overlap store.
ABORT:
ABORT: Disk space available: 8776.574 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:
unitigging/OSR_512_canu3.ovlStore.err
Found 534019854 (534.02 million) overlaps.
Configuring for 2.00 GB to 8.00 GB memory and 16368 open files.
Will sort using 10 files; 58720256 (58.72 million) overlaps per bucket; 2.00 GB memory per bucket
bucket 1 has 53402015 olaps.
bucket 2 has 53402097 olaps.
bucket 3 has 53402129 olaps.
bucket 4 has 53401999 olaps.
bucket 5 has 53403593 olaps.
bucket 6 has 53402063 olaps.
bucket 7 has 53413263 olaps.
bucket 8 has 53404677 olaps.
bucket 9 has 53402063 olaps.
bucket 10 has 53385955 olaps.
Will sort 53.402 million overlaps per bucket, using 10 buckets 1.84 GB per bucket.
-- BUCKETIZING --
ERROR: './OSR_512_canu3.ovlStore.BUILDING' is a valid ovStore; cannot create a new one.
Remove the OSR_512_canu3.ovlStore.BUILDING folder before restarting Canu so that it regenerates the store from scratch.
I fixed all the job files that had errors, deleted the ovlStore.BUILDING folder, and restarted Canu, but I got a new error. It says not enough memory, but I am giving it 64 GB. Do I have to rerun job 98? I already went through all the jobs up to 115 and fixed the errors.
canu_qsub_error.txt
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
--
-- Found 49 hosts with 48 cores and 185 GB memory under PBSPro control.
-- Found 27 hosts with 36 cores and 187 GB memory under PBSPro control.
-- Found 80 hosts with 32 cores and 123 GB memory under PBSPro control.
-- Found 2 hosts with 24 cores and 502 GB memory under PBSPro control.
-- Found 59 hosts with 24 cores and 123 GB memory under PBSPro control.
-- Found 2 hosts with 24 cores and 250 GB memory under PBSPro control.
-- Found 3 hosts with 64 cores and 6057 GB memory under PBSPro control.
--
-- (tag)Threads
-- (tag)Memory |
-- (tag) | | algorithm
-- ------- ------ -------- -----------------------------
-- Grid: meryl 8 GB 4 CPUs (k-mer counting)
-- Grid: cormhap 13 GB 4 CPUs (overlap detection with mhap)
-- Grid: obtovl 8 GB 4 CPUs (overlap detection)
-- Grid: utgovl 8 GB 4 CPUs (overlap detection)
-- Grid: ovb 3 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 8 GB 4 CPUs (read error detection)
-- Grid: oea 4 GB 1 CPU (overlap error adjustment)
-- Grid: bat 64 GB 8 CPUs (contig construction)
-- Grid: gfa 8 GB 8 CPUs (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
-- Raw: 1088846
-- Corrected: 802726
-- Trimmed: 779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
-- genomeSize 86000000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.1500 ( 15.00%)
-- utgOvlErrorRate 0.1500 ( 15.00%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.1500 ( 15.00%)
-- utgErrorRate 0.1500 ( 15.00%)
-- cnsErrorRate 0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Fri Oct 26 00:45:41 2018 with 282.761 GB free disk space
cd unitigging
/usr/local/canu/1.7/bin/ovStoreBuild \
-O ./OSR_512_canu3.ovlStore.BUILDING \
-G ./OSR_512_canu3.gkpStore \
-M 2-8 \
-L ./1-overlapper/ovljob.files \
> ./OSR_512_canu3.ovlStore.err 2>&1
sh: line 5: 294777 Aborted (core dumped) /usr/local/canu/1.7/bin/ovStoreBuild -O ./OSR_512_canu3.ovlStore.BUILDING -G ./OSR_512_canu3.gkpStore -M 2-8 -L ./1-overlapper/ovljob.files > ./OSR_512_canu3.ovlStore.err 2>&1
-- Finished on Fri Oct 26 00:47:56 2018 (135 seconds) with 263.502 GB free disk space
----------------------------------------
ERROR:
ERROR: Failed with exit code 134. (rc=34304)
ERROR:
ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
ABORT: failed to create the overlap store.
ABORT:
ABORT: Disk space available: 263.502 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:
Last 50 lines of unitigging/OSR_512_canu3.ovlStore.err
- Bucketizing '1-overlapper/001/000070.ovb'
- Bucketizing '1-overlapper/001/000071.ovb'
- Bucketizing '1-overlapper/001/000072.ovb'
- Bucketizing '1-overlapper/001/000073.ovb'
- Bucketizing '1-overlapper/001/000074.ovb'
- Bucketizing '1-overlapper/001/000075.ovb'
- Bucketizing '1-overlapper/001/000076.ovb'
- Bucketizing '1-overlapper/001/000077.ovb'
- Bucketizing '1-overlapper/001/000078.ovb'
- Bucketizing '1-overlapper/001/000079.ovb'
- Bucketizing '1-overlapper/001/000080.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0009'
- Bucketizing '1-overlapper/001/000081.ovb'
- Bucketizing '1-overlapper/001/000082.ovb'
- Bucketizing '1-overlapper/001/000083.ovb'
- Bucketizing '1-overlapper/001/000084.ovb'
- Bucketizing '1-overlapper/001/000085.ovb'
- Bucketizing '1-overlapper/001/000086.ovb'
- Bucketizing '1-overlapper/001/000087.ovb'
- Bucketizing '1-overlapper/001/000088.ovb'
- Bucketizing '1-overlapper/001/000089.ovb'
- Bucketizing '1-overlapper/001/000090.ovb'
- Bucketizing '1-overlapper/001/000091.ovb'
- Bucketizing '1-overlapper/001/000092.ovb'
- Bucketizing '1-overlapper/001/000093.ovb'
- Bucketizing '1-overlapper/001/000094.ovb'
- Bucketizing '1-overlapper/001/000095.ovb'
- Bucketizing '1-overlapper/001/000096.ovb'
- Bucketizing '1-overlapper/001/000097.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0010'
- Bucketizing '1-overlapper/001/000098.ovb'
safeWrite()-- Write failure on ovFile::writeBuffer::sb: No space left on device
safeWrite()-- Wanted to write 849100 objects (size=1), wrote 369504.
ovStoreBuild: AS_UTL/AS_UTL_fileIO.C:107: void AS_UTL_safeWrite(FILE*, const void*, const char*, size_t, size_t): Assertion `(*__errno_location ()) == 0' failed.
Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
AS_UTL/AS_UTL_fileIO.C::107 in _Z16AS_UTL_safeWriteP8_IO_FILEPKvPKcmm()
stores/ovStoreFile.C::182 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::162 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::202 in _ZN6ovFile12writeOverlapEP9ovOverlap()
stores/ovStoreBuild.C::338 in writeToDumpFile()
stores/ovStoreBuild.C::532 in main()
(null)::0 in (null)()
(null)::0 in (null)()
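The bucketizing pass above copies the overlaps from every `.ovb` file listed in `ovljob.files` into temporary `tmp.sort.NNNN` buckets under the `.BUILDING` directory. The free space (and quota) it needs is therefore likely comparable to, or possibly a multiple of, the total size of those inputs. That ratio is an assumption, not a documented figure. Below is a minimal shell sketch for summing the input sizes. It assumes GNU `stat` and a list file with one path per line, relative to `unitigging/`. The function name `ovb_input_bytes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sum the sizes of the .ovb files that ovStoreBuild will read.
# Assumptions: GNU coreutils 'stat'; the list file has one path per line,
# relative to the directory this is run from (unitigging/ here).
set -euo pipefail

ovb_input_bytes() {
  local list="$1" total=0 size="" f=""
  # '|| [ -n "$f" ]' also handles a last line without a trailing newline.
  while IFS= read -r f || [ -n "$f" ]; do
    [ -n "$f" ] || continue            # skip blank lines
    size=$(stat -c %s -- "$f")         # file size in bytes
    total=$(( total + size ))
  done < "$list"
  echo "$total"
}

# Usage (hypothetical script name): ./ovb_size.sh 1-overlapper/ovljob.files
if [ "$#" -ge 1 ]; then
  bytes=$(ovb_input_bytes "$1")
  echo "ovb input: $(( bytes / 1024 / 1024 / 1024 )) GiB"
fi
```

Compare the result against the free space and quota on the filesystem that holds the `.ovlStore.BUILDING` directory.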
canu.pbs
#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m b
module load canu/1.7
canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta
module unload canu/1.7
safeWrite()-- Write failure on ovFile::writeBuffer::sb: No space left on device
It looks like you've hit a disk-usage quota: the write failed with "No space left on device", yet the device itself reports ~250 GB free.
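Canu reported about 263 GB free when the write failed with "No space left on device" (ENOSPC). That suggests something other than raw capacity was the limit. Candidates are a quota, as suggested above, or exhausted inodes, which also produce ENOSPC on Linux. Below is a minimal shell sketch that reports free space, free inodes, and per-user quota for a directory. It assumes GNU `df`. It tries `quota` and Lustre's `lfs quota` only if they are installed. Whether either tool applies to `/scratch` on this cluster is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: report free space, free inodes and (if tools exist) quota for a directory.
# Assumptions: GNU coreutils df; 'quota' / 'lfs' may be absent on a given cluster.
set -euo pipefail

check_storage() {
  local dir="${1:-.}" avail_kb="" ifree="" user
  user="${USER:-$(id -un)}"
  # Free 1K blocks on the filesystem holding $dir (POSIX one-line output).
  avail_kb=$(df -Pk -- "$dir" | awk 'NR==2 {print $4}')
  echo "free_gb=$(( avail_kb / 1024 / 1024 ))"
  # Free inodes: writes fail with ENOSPC when these run out, even with free blocks.
  ifree=$(df -Pi -- "$dir" | awk 'NR==2 {print $4}')
  echo "free_inodes=${ifree}"
  # Per-user limits, reported only if the corresponding tool is installed.
  if command -v quota >/dev/null 2>&1; then quota -s 2>/dev/null || true; fi
  if command -v lfs >/dev/null 2>&1; then lfs quota -u "$user" "$dir" 2>/dev/null || true; fi
}

check_storage "${1:-.}"
```

Run it, for example, as `bash check_storage.sh /scratch/RDS-FAE-OSR-RW/OSR_512_canu3` (the script name is hypothetical). If the quota output shows usage at or near its limit, that matches the failure above, even though `df` shows plenty of free space.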
My Canu run completed. I used the options for smashing haplotypes together, but I got separate contig and unitig files, when I was expecting a single smashed assembly. How do these differ from the outputs produced with the default parameters?
I followed this approach:
Smash haplotypes together and then do phasing using another approach (like HapCUT2 or whatshap or others). In that case you want to do the opposite, increase the error rates used for finding overlaps: corOutCoverage=200 correctedErrorRate=0.15
My canu script - canu.pbs
#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m abe
module load canu/1.7
canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsOEA="-l walltime=16:00:00 -l nodes=1:ppn=8" oeaMemory=64 -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta
module unload canu/1.7
The contigs and unitigs represent the same genome. See the FAQ for more details on the outputs: https://canu.readthedocs.io/en/latest/tutorial.html#outputs. For most applications you want to use the contigs. Closing since your run completed and the original issue is resolved.
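To see how the two outputs differ in a given run, you can compare their sequence counts and total lengths. Below is a minimal shell sketch. It assumes the `<prefix>.contigs.fasta` and `<prefix>.unitigs.fasta` names Canu writes into the assembly directory (here with the prefix `OSR_512_canu3`) and uncompressed FASTA. The helper `fasta_stats` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print sequence count and total bases for a FASTA file.
# Assumption: plain (uncompressed) FASTA, header lines start with '>'.
set -euo pipefail

fasta_stats() {
  awk '/^>/ { n++; next }
       { sub(/\r$/, ""); len += length($0) }
       END { printf "%d sequences, %d bp\n", n, len }' "$1"
}

# Usage (hypothetical paths, run inside the assembly directory):
for f in OSR_512_canu3.contigs.fasta OSR_512_canu3.unitigs.fasta; do
  if [ -f "$f" ]; then
    printf '%s: %s\n' "$f" "$(fasta_stats "$f")"
  fi
done
```

Both files should cover roughly the same genome. Unitigs are typically broken into more, shorter pieces at repeat and variant boundaries.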
I am trying to run Canu with the options for squashing haplotypes together. When it reaches the unitigging step, 2 overlap jobs fail with errors.
Canu script for job submission in PBSPro
Canu.out
The output file for job 032 (utgovl_OSR_512_canu3_OSR512.o2472584.32) was blank. The output file for job 103 (utgovl_OSR_512_canu3_OSR512.o2472584.103) is below.