marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Overlap jobs failed, tried 2 times, giving up. #1094

Closed psur9757 closed 5 years ago

psur9757 commented 6 years ago

I am trying to run the squashed option in Canu. When I get to the unitigging step, two overlap jobs fail.

Canu script for job submission in PBSPro

#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m b

module load canu/1.7

canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta

module unload canu/1.7

Canu.out

-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 24 CPUs and 505 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
-- 
-- Found   2 hosts with  24 cores and  250 GB memory under PBSPro control.
-- Found   3 hosts with  64 cores and 6057 GB memory under PBSPro control.
-- Found  49 hosts with  48 cores and  185 GB memory under PBSPro control.
-- Found  27 hosts with  36 cores and  187 GB memory under PBSPro control.
-- Found   1 host  with   1 core  and    5 GB memory under PBSPro control.
-- Found  80 hosts with  32 cores and  123 GB memory under PBSPro control.
-- Found   2 hosts with  24 cores and  502 GB memory under PBSPro control.
-- Found  59 hosts with  24 cores and  123 GB memory under PBSPro control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap   13 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        3 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
--   Raw:        1088846
--   Corrected:  802726
--   Trimmed:    779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
--  genomeSize        86000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1500 ( 15.00%)
--    utgOvlErrorRate 0.1500 ( 15.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1500 ( 15.00%)
--    utgErrorRate    0.1500 ( 15.00%)
--    cnsErrorRate    0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Overlap jobs failed, tried 2 times, giving up.
--   job unitigging/1-overlapper/001/000032.ovb FAILED.
--   job unitigging/1-overlapper/001/000103.ovb FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

The output file for job 032 (utgovl_OSR_512_canu3_OSR512.o2472584.32) was blank. The output file for job 103 (utgovl_OSR_512_canu3_OSR512.o2472584.103) is below.

Running job 103 based on PBS_ARRAY_INDEX=103 and offset=0.

STRING_NUM_BITS       31
OFFSET_BITS           31
STRING_NUM_MASK       2147483647
OFFSET_MASK           2147483647
MAX_STRING_NUM        2147483647

Hash_Mask_Bits        23
Max_Hash_Strings      11711
Max_Hash_Data_Len     170644032
Max_Hash_Load         0.750000
Kmer Length           22
Min Overlap Length    500
Max Error Rate        0.150000
Min Kmer Matches      0

Num_PThreads          4

HASH_TABLE_SIZE         8388608
sizeof(Hash_Bucket_t)   216
hash table size:        1728 MB

check  32 MB
info   0 MB
start  0 MB

Initializing 4 work areas.
Build_Hash_Index from 1014456 to 1026166
Found 8525 reads with length 124101069 to load; 3186 skipped by being too short; 0 skipped per library restriction
String_Ct:           0/       11711  totalLen:        7092/   170644032  Hash_Entries:        7070/   132120576  Load: 0.00%
HASH LOADING STOPPED: strings         11711 out of        11711 max.
HASH LOADING STOPPED: length      124101069 out of    170644032 max.
HASH LOADING STOPPED: entries      78126596 out of    132120576 max (load 44.35).
String_Ct = 11711  Extra_String_Ct = 0  Extra_String_Subcount = 95325
Read 35432 kmers to mark to skip

Range: 1-448823.  Store has 1088846 reads.
Chunk: 14026 reads/thread -- (G.endRefID=448823 - G.bgnRefID=1) / G.Num_PThreads=4 / 8

Starting 1-448823 with 14026 per thread

Thread 00 processes reads 1-14026
Thread 01 processes reads 14027-28052
Thread 02 processes reads 28053-42078
Thread 03 processes reads 42079-56104
Thread 03 writes    reads 42079-56104 (14376 overlaps 79912/11227146/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 56105-70130
Thread 02 writes    reads 28053-42078 (16550 overlaps 82086/11422954/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 70131-84156
Thread 01 writes    reads 14027-28052 (13715 overlaps 79251/11203251/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 84157-98182
Thread 00 writes    reads 1-14026 (18844 overlaps 84380/11615429/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 98183-112208
Thread 03 writes    reads 56105-70130 (11114 overlaps 76650/10917566/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 112209-126234
Thread 02 writes    reads 70131-84156 (18454 overlaps 83990/11021208/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 126235-140260
Thread 01 writes    reads 84157-98182 (13321 overlaps 78857/11023647/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 140261-154286
Thread 00 writes    reads 98183-112208 (12134 overlaps 77670/10888455/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 154287-168312
Thread 03 writes    reads 112209-126234 (13461 overlaps 78997/11110359/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 168313-182338
Thread 01 writes    reads 140261-154286 (15030 overlaps 80566/11156633/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 182339-196364
Thread 02 writes    reads 126235-140260 (12916 overlaps 78452/11075186/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 196365-210390
Thread 00 writes    reads 154287-168312 (12313 overlaps 77849/11021815/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 210391-224416
Thread 03 writes    reads 168313-182338 (11403 overlaps 76939/11206827/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 224417-238442
Thread 01 writes    reads 182339-196364 (13500 overlaps 79036/10956420/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 238443-252468
Thread 02 writes    reads 196365-210390 (10486 overlaps 76022/10984347/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 252469-266494
Thread 00 writes    reads 210391-224416 (11962 overlaps 77498/11086231/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 266495-280520
Thread 03 writes    reads 224417-238442 (7039 overlaps 72575/11030144/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 280521-294546
Thread 01 writes    reads 238443-252468 (10779 overlaps 76315/10754486/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 294547-308572
Thread 02 writes    reads 252469-266494 (11843 overlaps 77379/10945923/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 308573-322598
Thread 00 writes    reads 266495-280520 (10477 overlaps 76013/11048975/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 322599-336624
Thread 03 writes    reads 280521-294546 (11580 overlaps 77116/11124026/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 336625-350650
Thread 02 writes    reads 308573-322598 (16973 overlaps 82509/11088258/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 350651-364676
Thread 01 writes    reads 294547-308572 (14058 overlaps 79594/11271158/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 364677-378702
Thread 00 writes    reads 322599-336624 (18604 overlaps 84140/11176259/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 378703-392728
Thread 03 writes    reads 336625-350650 (15707 overlaps 81243/10886950/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 392729-406754
Thread 01 writes    reads 364677-378702 (20079 overlaps 85615/10846713/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 406755-420780
Thread 02 writes    reads 350651-364676 (15789 overlaps 81325/10859344/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 420781-434806
Thread 00 writes    reads 378703-392728 (15057 overlaps 80593/11206047/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 434807-448823
Thread 03 writes    reads 392729-406754 (17410 overlaps 82946/11344673/0 kmer hits with/without overlap/skipped)
Thread 01 writes    reads 406755-420780 (18996 overlaps 84532/11254631/0 kmer hits with/without overlap/skipped)
Thread 02 writes    reads 420781-434806 (19693 overlaps 85229/11214500/0 kmer hits with/without overlap/skipped)
Thread 00 writes    reads 434807-448823 (16322 overlaps 81858/11580631/0 kmer hits with/without overlap/skipped)
Build_Hash_Index from 1026167 to 1030599
Found 3186 reads with length 46542963 to load; 1247 skipped by being too short; 0 skipped per library restriction
String_Ct:           0/       11711  totalLen:       27042/   170644032  Hash_Entries:       26419/   132120576  Load: 0.01%
HASH LOADING STOPPED: strings          4433 out of        11711 max.
HASH LOADING STOPPED: length       46542963 out of    170644032 max.
HASH LOADING STOPPED: entries      35330719 out of    132120576 max (load 20.06).
String_Ct = 4433  Extra_String_Ct = 0  Extra_String_Subcount = 95325
Read 35432 kmers to mark to skip

Range: 1-448823.  Store has 1088846 reads.
Chunk: 14026 reads/thread -- (G.endRefID=448823 - G.bgnRefID=1) / G.Num_PThreads=4 / 8

Starting 1-448823 with 14026 per thread

Thread 00 processes reads 1-14026
Thread 01 processes reads 14027-28052
Thread 03 processes reads 42079-56104
Thread 02 processes reads 28053-42078
Thread 03 writes    reads 42079-56104 (31980 overlaps 31980/4239576/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 56105-70130
Thread 02 writes    reads 28053-42078 (32200 overlaps 32200/4317851/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 70131-84156
Thread 01 writes    reads 14027-28052 (31338 overlaps 31338/4240967/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 84157-98182
Thread 00 writes    reads 1-14026 (509 overlaps 33277/4382765/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 98183-112208
Thread 03 writes    reads 56105-70130 (30120 overlaps 30120/4123407/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 112209-126234
Thread 02 writes    reads 70131-84156 (517 overlaps 33285/4164854/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 126235-140260
Thread 01 writes    reads 84157-98182 (30990 overlaps 30990/4163195/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 140261-154286
Thread 00 writes    reads 98183-112208 (30366 overlaps 30366/4117511/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 154287-168312
Thread 03 writes    reads 112209-126234 (31149 overlaps 31149/4198858/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 168313-182338
Thread 01 writes    reads 140261-154286 (31597 overlaps 31597/4214784/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 182339-196364
Thread 02 writes    reads 126235-140260 (31143 overlaps 31143/4184924/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 196365-210390
Thread 00 writes    reads 154287-168312 (30671 overlaps 30671/4163720/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 210391-224416
Thread 03 writes    reads 168313-182338 (30783 overlaps 30783/4236883/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 224417-238442
Thread 01 writes    reads 182339-196364 (31167 overlaps 31167/4140752/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 238443-252468
Thread 02 writes    reads 196365-210390 (30023 overlaps 30023/4147793/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 252469-266494
Thread 00 writes    reads 210391-224416 (30264 overlaps 30264/4187188/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 266495-280520
Thread 03 writes    reads 224417-238442 (28891 overlaps 28891/4170177/0 kmer hits with/without overlap/skipped)
Thread 03 processes reads 280521-294546
Thread 01 writes    reads 238443-252468 (29906 overlaps 29906/4060157/0 kmer hits with/without overlap/skipped)
Thread 01 processes reads 294547-308572
Thread 02 writes    reads 252469-266494 (30587 overlaps 30587/4139146/0 kmer hits with/without overlap/skipped)
Thread 02 processes reads 308573-322598
Thread 00 writes    reads 266495-280520 (29851 overlaps 29851/4176702/0 kmer hits with/without overlap/skipped)
Thread 00 processes reads 322599-336624
Thread 03 writes    reads 280521-294546 (30759 overlaps 30759/4208845/0 kmer hits with/without overlap/skipped)
safeWrite()-- Write failure on ovFile::writeBuffer::sb: Input/output error
safeWrite()-- Wanted to write 867812 objects (size=1), wrote 806285.
overlapInCore: AS_UTL/AS_UTL_fileIO.C:107: void AS_UTL_safeWrite(FILE*, const void*, const char*, size_t, size_t): Assertion `(*__errno_location ()) == 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
AS_UTL/AS_UTL_fileIO.C::107 in _Z16AS_UTL_safeWriteP8_IO_FILEPKvPKcmm()
stores/ovStoreFile.C::182 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::162 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::202 in _ZN6ovFile12writeOverlapEP9ovOverlap()
overlapInCore/overlapInCore-Process_Overlaps.C::148 in _Z16Process_OverlapsPv()
overlapInCore/overlapInCore.C::272 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
/var/spool/PBS/mom_priv/jobs/2472584[103].pbsserver.SC: line 775: 148634 Aborted                 (core dumped) $bin/overlapInCore -t 4 -k 22 -k ../0-mercounts/OSR_512_canu3.ms22.frequentMers.fasta --hashbits 23 --hashload 0.75 --maxerate 0.15 --minlength 500 $opt -o ./$job.ovb.WORKING -s ./$job.stats ../OSR_512_canu3.gkpStore

brianwalenz commented 6 years ago

The empty file and the error (safeWrite()-- Write failure on ovFile::writeBuffer::sb: Input/output error) suggest that either you're out of disk space or quota, or the file server crashed, rebooted, or otherwise failed.

For testing, you can run these by hand with ./overlap.sh 32 and ./overlap.sh 103.
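Before rerunning by hand, it can help to confirm that the output directory accepts a full write at all. The following is only a sketch: probe_write is a hypothetical helper, not part of Canu, and it assumes GNU dd (for conv=fsync). It writes a scratch file of the requested size, forces it to storage, and checks that every byte arrived, which is roughly the condition that the safeWrite() failure above violated.

```shell
#!/bin/bash
# Hypothetical helper (not part of Canu): write size_mb MiB of zeros into dir,
# flush them to storage, and confirm that the full size actually landed.
probe_write() {
    local dir=$1 size_mb=${2:-64}
    local probe want got
    probe=$(mktemp "$dir/write_probe.XXXXXX") || return 1
    want=$(( size_mb * 1024 * 1024 ))
    # conv=fsync (GNU dd) forces the data out of the page cache, so a failing
    # file server or an exceeded quota is reported here rather than later.
    if ! dd if=/dev/zero of="$probe" bs=1048576 count="$size_mb" conv=fsync 2>/dev/null; then
        rm -f "$probe"
        return 1
    fi
    got=$(( $(wc -c < "$probe") ))
    rm -f "$probe"
    [ "$got" -eq "$want" ]
}
```

A possible use, from the assembly directory: check free space with df -h . and then run probe_write unitigging/1-overlapper/001 256 && echo "write OK". Quota commands differ between sites, so that part is left to the local documentation.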

psur9757 commented 6 years ago

When I run ./overlap.sh 32, I get

Running job 32 based on command line options.

STRING_NUM_BITS       31
OFFSET_BITS           31
STRING_NUM_MASK       2147483647
OFFSET_MASK           2147483647
MAX_STRING_NUM        2147483647

Hash_Mask_Bits        23
Max_Hash_Strings      11297
Max_Hash_Data_Len     170647834
Max_Hash_Load         0.750000
Kmer Length           22
Min Overlap Length    500
Max Error Rate        0.150000
Min Kmer Matches      0

Num_PThreads          4

HASH_TABLE_SIZE         8388608
sizeof(Hash_Bucket_t)   216
hash table size:        1728 MB

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
overlapInCore/overlapInCore.C::527 in main()
(null)::0 in (null)()
(null)::0 in (null)()
./overlap.sh: line 775: 129090 Aborted                 (core dumped) $bin/overlapInCore -t 4 -k 22 -k ../0-mercounts/OSR_512_canu3.ms22.frequentMers.fasta --hashbits 23 --hashload 0.75 --maxerate 0.15 --minlength 500 $opt -o ./$job.ovb.WORKING -s ./$job.stats ../OSR_512_canu3.gkpStore

When I run ./overlap.sh 103, I get

Running job 103 based on command line options.

STRING_NUM_BITS       31
OFFSET_BITS           31
STRING_NUM_MASK       2147483647
OFFSET_MASK           2147483647
MAX_STRING_NUM        2147483647

Hash_Mask_Bits        23
Max_Hash_Strings      11711
Max_Hash_Data_Len     170644032
Max_Hash_Load         0.750000
Kmer Length           22
Min Overlap Length    500
Max Error Rate        0.150000
Min Kmer Matches      0

Num_PThreads          4

HASH_TABLE_SIZE         8388608
sizeof(Hash_Bucket_t)   216
hash table size:        1728 MB

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
overlapInCore/overlapInCore.C::527 in main()
(null)::0 in (null)()
(null)::0 in (null)()
./overlap.sh: line 775:  1342 Aborted                 (core dumped) $bin/overlapInCore -t 4 -k 22 -k ../0-mercounts/OSR_512_canu3.ms22.frequentMers.fasta --hashbits 23 --hashload 0.75 --maxerate 0.15 --minlength 500 $opt -o ./$job.ovb.WORKING -s ./$job.stats ../OSR_512_canu3.gkpStore

They look pretty identical to me.

skoren commented 6 years ago

That's not the same error as during the run on the grid; these jobs are running out of memory and failing. I'd guess you're running on the head node and the system is killing your jobs. Have you tried getting an interactive session with at least 10 GB of memory reserved and running the same overlap jobs by hand?
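For a sense of scale, the hash table size printed just before the std::bad_alloc can be reproduced from the other values in the same log. This is a sketch based only on the numbers shown in this thread (Hash_Mask_Bits 23, HASH_TABLE_SIZE 8388608, sizeof(Hash_Bucket_t) 216); the function names are made up for illustration, and mem_available_mb assumes a Linux host with /proc/meminfo.

```shell
#!/bin/bash
# Hash table size in MiB: in the log above HASH_TABLE_SIZE equals 2^Hash_Mask_Bits
# buckets, and each bucket is sizeof(Hash_Bucket_t) bytes.
hash_table_mb() {
    local hashbits=$1 bucket_bytes=$2
    echo $(( (1 << hashbits) * bucket_bytes / 1024 / 1024 ))
}

# Memory currently available on this host in MiB (Linux only; not called here).
mem_available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Values taken from the overlapInCore log: --hashbits 23, 216-byte buckets.
echo "hash table alone: $(hash_table_mb 23 216) MB"
```

The hash table is only one of the allocations the job makes, so the 1728 MB figure is a lower bound rather than the full requirement. On a login node, comparing it with mem_available_mb and with the output of ulimit -v may show whether a per-process limit is in the way.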

psur9757 commented 6 years ago

I tried an interactive session with 16 GB of memory: qsub -I -P OSR -l select=1:ncpus=4:mem=16GB

There was an offset issue so I ran ./overlap.sh 31 and got

Running job 32 based on PBS_ARRAY_INDEX=1 and offset=31.

STRING_NUM_BITS       31
OFFSET_BITS           31
STRING_NUM_MASK       2147483647
OFFSET_MASK           2147483647
MAX_STRING_NUM        2147483647

Hash_Mask_Bits        23
Max_Hash_Strings      11297
Max_Hash_Data_Len     170647834
Max_Hash_Load         0.750000
Kmer Length           22
Min Overlap Length    500
Max Error Rate        0.150000
Min Kmer Matches      0

Num_PThreads          4

HASH_TABLE_SIZE         8388608
sizeof(Hash_Bucket_t)   216
hash table size:        1728 MB

check  32 MB
info   0 MB
start  0 MB

Initializing 4 work areas.
Build_Hash_Index from 459226 to 470522
Found 8242 reads with length 125324689 to load; 3055 skipped by being too short; 0 skipped per library restriction
String_Ct:           0/       11297  totalLen:       11420/   170647834  Hash_Entries:       11392/   132120576  Load: 0.01%
HASH LOADING STOPPED: strings         11297 out of        11297 max.
HASH LOADING STOPPED: length      125324689 out of    170647834 max.
HASH LOADING STOPPED: entries      78824749 out of    132120576 max (load 44.75).
String_Ct = 11297  Extra_String_Ct = 0  Extra_String_Subcount = 95325
Read 35432 kmers to mark to skip

Range: 1-448823.  Store has 1088846 reads.
Chunk: 14026 reads/thread -- (G.endRefID=448823 - G.bgnRefID=1) / G.Num_PThreads=4 / 8

Starting 1-448823 with 14026 per thread

Thread 00 processes reads 1-14026
Thread 01 processes reads 14027-28052
Thread 02 processes reads 28053-42078
Thread 03 processes reads 42079-56104

It timed out after 1 hour. Is there anything else I can do?
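On the offset: judging only from the two "Running job" lines in this thread (job 103 from PBS_ARRAY_INDEX=103 and offset=0, job 32 from PBS_ARRAY_INDEX=1 and offset=31), the job number looks like the array index plus the offset. That is an inference from the log output, not something checked against overlap.sh itself. Under that assumption, the argument to pass inside a session where PBS_ARRAY_INDEX is already set can be worked out like this (offset_for_job is a hypothetical helper):

```shell
#!/bin/bash
# Assumed relation, inferred from the log lines: job = PBS_ARRAY_INDEX + offset.
# Prints the offset needed to reach a target job for a given array index.
offset_for_job() {
    local target=$1
    local array_index=${2:-${PBS_ARRAY_INDEX:-0}}
    echo $(( target - array_index ))
}

offset_for_job 32 1     # offset that selected job 32 in the session above
offset_for_job 103 1    # offset for job 103 under the same assumption
```

With no PBS_ARRAY_INDEX in the environment the helper returns the job number unchanged, which matches the earlier runs that reported "based on command line options".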

skoren commented 6 years ago

There is no error there; the job was still running. Is the default timeout on your interactive job 1 hour? If so, you can increase it and try again.
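One way to raise the limit is to add a walltime request to the same qsub line, using the -l walltime=HH:MM:SS form that already appears in the submission script at the top of this issue. The sketch below only assembles and prints the command; the helper name is invented, the 36-hour value is a placeholder, and whether the site allows interactive sessions that long is an assumption to check locally.

```shell
#!/bin/bash
# Hypothetical helper: print an interactive PBSPro request with an explicit
# walltime. It does not submit anything; copy the printed line to run it.
interactive_qsub_cmd() {
    local project=$1 ncpus=$2 mem_gb=$3 hours=$4
    printf 'qsub -I -P %s -l select=1:ncpus=%s:mem=%sGB -l walltime=%s:00:00\n' \
        "$project" "$ncpus" "$mem_gb" "$hours"
}

# Same project, CPUs, and memory as the earlier request; the hours are a guess.
interactive_qsub_cmd OSR 4 16 36
```

If the queue caps interactive sessions below what the job needs, wrapping the ./overlap.sh call in an ordinary batch script with the same resource line is another option.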

psur9757 commented 6 years ago

I finally finished running the overlap jobs 32 and 103. They took ~70 hours. Then I tried restarting Canu but that did not work. It ended immediately. Below are the relevant error files.

canu_qsub_error.txt

    You will need to use the gridOptions param to specify the PBS project if using the PBS gridoption.
-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
-- 
-- Found   2 hosts with  24 cores and  250 GB memory under PBSPro control.
-- Found   3 hosts with  64 cores and 6057 GB memory under PBSPro control.
-- Found  49 hosts with  48 cores and  185 GB memory under PBSPro control.
-- Found  27 hosts with  36 cores and  187 GB memory under PBSPro control.
-- Found   1 host  with   1 core  and    5 GB memory under PBSPro control.
-- Found  80 hosts with  32 cores and  123 GB memory under PBSPro control.
-- Found   2 hosts with  24 cores and  502 GB memory under PBSPro control.
-- Found  59 hosts with  24 cores and  123 GB memory under PBSPro control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap   13 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        3 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
--   Raw:        1088846
--   Corrected:  802726
--   Trimmed:    779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
--  genomeSize        86000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1500 ( 15.00%)
--    utgOvlErrorRate 0.1500 ( 15.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1500 ( 15.00%)
--    utgErrorRate    0.1500 ( 15.00%)
--    cnsErrorRate    0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
-- Found 117 overlapInCore output files.
--
-- overlapInCore compute 'unitigging/1-overlapper':
--   kmer hits
--     with no overlap      38902589709  0766.74359 +- 176666745.253
--     with an overlap        267009927  6.12820513 +- 1220082.706
--
--   overlaps                 267009927  6.12820513 +- 1220082.706
--     contained              115197265  .008547009 +- 528881.275
--     dovetail               151812662  4.11965812 +- 691743.093
--
--   overlaps rejected
--     multiple per pair              0           0 +- 0
--     bad short window               0           0 +- 0
--     bad long window                0           0 +- 0
----------------------------------------
-- Starting command on Tue Oct  2 11:59:57 2018 with 14846.663 GB free disk space

    cd unitigging
    /usr/local/canu/1.7/bin/ovStoreBuild \
     -O ./OSR_512_canu3.ovlStore.BUILDING \
     -G ./OSR_512_canu3.gkpStore \
     -M 2-8 \
     -L ./1-overlapper/ovljob.files \
     > ./OSR_512_canu3.ovlStore.err 2>&1
sh: line 5: 282915 Aborted                 (core dumped) /usr/local/canu/1.7/bin/ovStoreBuild -O ./OSR_512_canu3.ovlStore.BUILDING -G ./OSR_512_canu3.gkpStore -M 2-8 -L ./1-overlapper/ovljob.files > ./OSR_512_canu3.ovlStore.err 2>&1

-- Finished on Tue Oct  2 12:02:25 2018 (148 seconds) with 14812.39 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 134.  (rc=34304)
ERROR:

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to create the overlap store.
ABORT:
ABORT: Disk space available:  14812.39 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:
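
Since two of the overlap jobs were completed by hand after earlier failed attempts, a quick look for leftover or empty outputs may be worthwhile before the next restart. This is a sanity check, not a diagnosis of the ovStoreBuild abort. It assumes that a finished job leaves a non-empty NNNNNN.ovb file and that a NNNNNN.ovb.WORKING file (the -o target in the overlapInCore command lines above) means the job did not finish; the helper name is made up.

```shell
#!/bin/bash
# Hypothetical helper: list overlap outputs that look incomplete, meaning
# zero-byte *.ovb files and any leftover *.ovb.WORKING files under dir.
suspect_overlap_outputs() {
    local dir=$1
    find "$dir" -type f \( -name '*.ovb' -size 0 -o -name '*.ovb.WORKING' \) | sort
}
```

For example, suspect_overlap_outputs unitigging/1-overlapper prints nothing when every output looks complete under these assumptions.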

unitigging/OSR_512_canu3.ovlStore.err

Found 534019854 (534.02 million) overlaps.
Configuring for 2.00 GB to 8.00 GB memory and 16368 open files.
Will sort using 10 files; 58720256 (58.72 million) overlaps per bucket; 2.00 GB memory per bucket
  bucket    1 has 53402015 olaps.
  bucket    2 has 53402097 olaps.
  bucket    3 has 53402129 olaps.
  bucket    4 has 53401999 olaps.
  bucket    5 has 53403593 olaps.
  bucket    6 has 53402063 olaps.
  bucket    7 has 53413263 olaps.
  bucket    8 has 53404677 olaps.
  bucket    9 has 53402063 olaps.
  bucket   10 has 53385955 olaps.
Will sort 53.402 million overlaps per bucket, using 10 buckets 1.84 GB per bucket.

-- BUCKETIZING --

-  Bucketizing '1-overlapper/001/000001.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0001'
-  Bucketizing '1-overlapper/001/000002.ovb'
-  Bucketizing '1-overlapper/001/000003.ovb'
-  Bucketizing '1-overlapper/001/000004.ovb'
-  Bucketizing '1-overlapper/001/000005.ovb'
-  Bucketizing '1-overlapper/001/000006.ovb'
-  Bucketizing '1-overlapper/001/000007.ovb'
-  Bucketizing '1-overlapper/001/000008.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0002'
-  Bucketizing '1-overlapper/001/000009.ovb'
-  Bucketizing '1-overlapper/001/000010.ovb'
-  Bucketizing '1-overlapper/001/000011.ovb'
-  Bucketizing '1-overlapper/001/000012.ovb'
-  Bucketizing '1-overlapper/001/000013.ovb'
-  Bucketizing '1-overlapper/001/000014.ovb'
-  Bucketizing '1-overlapper/001/000015.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0003'
-  Bucketizing '1-overlapper/001/000016.ovb'
-  Bucketizing '1-overlapper/001/000017.ovb'
-  Bucketizing '1-overlapper/001/000018.ovb'
-  Bucketizing '1-overlapper/001/000019.ovb'
-  Bucketizing '1-overlapper/001/000020.ovb'
-  Bucketizing '1-overlapper/001/000021.ovb'
-  Bucketizing '1-overlapper/001/000022.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0004'
-  Bucketizing '1-overlapper/001/000023.ovb'
-  Bucketizing '1-overlapper/001/000024.ovb'
-  Bucketizing '1-overlapper/001/000025.ovb'
-  Bucketizing '1-overlapper/001/000026.ovb'
-  Bucketizing '1-overlapper/001/000027.ovb'
-  Bucketizing '1-overlapper/001/000028.ovb'
-  Bucketizing '1-overlapper/001/000029.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0005'
-  Bucketizing '1-overlapper/001/000030.ovb'
-  Bucketizing '1-overlapper/001/000031.ovb'
-  Bucketizing '1-overlapper/001/000032.ovb'
-  Bucketizing '1-overlapper/001/000033.ovb'
-  Bucketizing '1-overlapper/001/000034.ovb'
-  Bucketizing '1-overlapper/001/000035.ovb'
-  Bucketizing '1-overlapper/001/000036.ovb'
-  Bucketizing '1-overlapper/001/000037.ovb'
-  Bucketizing '1-overlapper/001/000038.ovb'
-  Bucketizing '1-overlapper/001/000039.ovb'
-  Bucketizing '1-overlapper/001/000040.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0006'
-  Bucketizing '1-overlapper/001/000041.ovb'
-  Bucketizing '1-overlapper/001/000042.ovb'
-  Bucketizing '1-overlapper/001/000043.ovb'
-  Bucketizing '1-overlapper/001/000044.ovb'
-  Bucketizing '1-overlapper/001/000045.ovb'
-  Bucketizing '1-overlapper/001/000046.ovb'
-  Bucketizing '1-overlapper/001/000047.ovb'
-  Bucketizing '1-overlapper/001/000048.ovb'
-  Bucketizing '1-overlapper/001/000049.ovb'
-  Bucketizing '1-overlapper/001/000050.ovb'
-  Bucketizing '1-overlapper/001/000051.ovb'
-  Bucketizing '1-overlapper/001/000052.ovb'
-  Bucketizing '1-overlapper/001/000053.ovb'
-  Bucketizing '1-overlapper/001/000054.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0007'
-  Bucketizing '1-overlapper/001/000055.ovb'
-  Bucketizing '1-overlapper/001/000056.ovb'
-  Bucketizing '1-overlapper/001/000057.ovb'
-  Bucketizing '1-overlapper/001/000058.ovb'
-  Bucketizing '1-overlapper/001/000059.ovb'
-  Bucketizing '1-overlapper/001/000060.ovb'
-  Bucketizing '1-overlapper/001/000061.ovb'
-  Bucketizing '1-overlapper/001/000062.ovb'
-  Bucketizing '1-overlapper/001/000063.ovb'
-  Bucketizing '1-overlapper/001/000064.ovb'
-  Bucketizing '1-overlapper/001/000065.ovb'
-  Bucketizing '1-overlapper/001/000066.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0008'
-  Bucketizing '1-overlapper/001/000067.ovb'
-  Bucketizing '1-overlapper/001/000068.ovb'
-  Bucketizing '1-overlapper/001/000069.ovb'
-  Bucketizing '1-overlapper/001/000070.ovb'
-  Bucketizing '1-overlapper/001/000071.ovb'
-  Bucketizing '1-overlapper/001/000072.ovb'
-  Bucketizing '1-overlapper/001/000073.ovb'
-  Bucketizing '1-overlapper/001/000074.ovb'
-  Bucketizing '1-overlapper/001/000075.ovb'
-  Bucketizing '1-overlapper/001/000076.ovb'
-  Bucketizing '1-overlapper/001/000077.ovb'
-  Bucketizing '1-overlapper/001/000078.ovb'
-  Bucketizing '1-overlapper/001/000079.ovb'
-  Bucketizing '1-overlapper/001/000080.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0009'
-  Bucketizing '1-overlapper/001/000081.ovb'
-  Bucketizing '1-overlapper/001/000082.ovb'
-  Bucketizing '1-overlapper/001/000083.ovb'
-  Bucketizing '1-overlapper/001/000084.ovb'
-  Bucketizing '1-overlapper/001/000085.ovb'
-  Bucketizing '1-overlapper/001/000086.ovb'
-  Bucketizing '1-overlapper/001/000087.ovb'
-  Bucketizing '1-overlapper/001/000088.ovb'
-  Bucketizing '1-overlapper/001/000089.ovb'
-  Bucketizing '1-overlapper/001/000090.ovb'
-  Bucketizing '1-overlapper/001/000091.ovb'
-  Bucketizing '1-overlapper/001/000092.ovb'
-  Bucketizing '1-overlapper/001/000093.ovb'
-  Bucketizing '1-overlapper/001/000094.ovb'
-  Bucketizing '1-overlapper/001/000095.ovb'
-  Bucketizing '1-overlapper/001/000096.ovb'
-  Bucketizing '1-overlapper/001/000097.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0010'
-  Bucketizing '1-overlapper/001/000098.ovb'
-  Bucketizing '1-overlapper/001/000099.ovb'
-  Bucketizing '1-overlapper/001/000100.ovb'
-  Bucketizing '1-overlapper/001/000101.ovb'
-  Bucketizing '1-overlapper/001/000102.ovb'
-  Bucketizing '1-overlapper/001/000103.ovb'
-  Bucketizing '1-overlapper/001/000104.ovb'
-  Bucketizing '1-overlapper/001/000105.ovb'
-  Bucketizing '1-overlapper/001/000106.ovb'
-  Bucketizing '1-overlapper/001/000107.ovb'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
../../gcc-4.8.2/libstdc++-v3/libsupc++/vterminate.cc::95 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::38 in _ZN10__cxxabiv111__terminateEPFvvE()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_terminate.cc::48 in _ZSt9terminatev()
../../gcc-4.8.2/libstdc++-v3/libsupc++/eh_throw.cc::84 in __cxa_throw()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_op.cc::56 in _Znwm()
../../gcc-4.8.2/libstdc++-v3/libsupc++/new_opv.cc::32 in _Znam()
stores/ovStoreFile.C::286 in _ZN6ovFile10readBufferEv()
stores/ovStoreFile.C::317 in _ZN6ovFile11readOverlapEP9ovOverlap()
stores/ovStoreBuild.C::519 in main()
(null)::0 in (null)()
(null)::0 in (null)()

skoren commented 6 years ago

This again looks like an out-of-memory issue, though it could also be due to on-disk corruption of the data. How did you submit the canu command to your grid (memory/threads)?

psur9757 commented 6 years ago

I submitted the following script with the command qsub canu.pbs:

#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m b

module load canu/1.7

canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta

module unload canu/1.7

skoren commented 6 years ago

Since you already requested 64 GB of RAM, I don't think memory is the issue; disk corruption is more likely. Since you ran out of disk space during this run, the file system may not have properly detected or reported the error, leaving you with corrupt files.

Can you post the output of ls on the unitigging/1-overlapper/001 folder? My first guess for the failed file would be 107 (the last file in the log). You could try removing all files named 107 from the 001 folder, re-running overlap.sh 107, and re-launching Canu. It may fail again if more files are corrupt, in which case check the log to see which file it failed on and re-run that job.
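
The steps above can be sketched roughly as follows. This assumes the log is unitigging/OSR_512_canu3.ovlStore.err, that overlap.sh sits in unitigging/1-overlapper, and that it takes the job index as its only argument. The helper names `last_bucketized_job` and `print_rerun_commands` are made up for illustration and are not part of Canu. The script only prints the commands, so they can be reviewed before anything is deleted.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of Canu.

# Print the job index of the last '.ovb' file named in a "Bucketizing" log line.
last_bucketized_job() {
  local log="$1"
  grep "Bucketizing" "$log" | tail -n 1 \
    | sed -E "s#.*/0*([0-9]+)\.ovb'.*#\1#"
}

# Print (do not run) the commands that would redo one overlap job.
# Assumption: overlap.sh is run from unitigging/1-overlapper.
print_rerun_commands() {
  local job="$1" prefix
  prefix=$(printf '%06d' "$job")
  echo "cd unitigging/1-overlapper"
  echo "rm -f 001/${prefix}.ovb 001/${prefix}.counts 001/${prefix}.stats"
  echo "./overlap.sh ${job}"
}

# Usage, from the assembly directory:
#   job=$(last_bucketized_job unitigging/OSR_512_canu3.ovlStore.err)
#   print_rerun_commands "$job"
```

The last "Bucketizing" line only names the file that was being read when the crash happened, so it is a first guess rather than proof that the file is corrupt.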

An alternative would be to add ovlMerThreshold=500 to your command and re-run the full unitigging step (remove the unitigging folder from your assembly directory and re-launch Canu). This will both use less disk space and speed up the computation.
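
As a rough sketch of that alternative, assuming the same assembly directory and options posted earlier in this thread: the only change to the command is the added ovlMerThreshold=500. The snippet prints the two commands rather than running them, and the grid and gnuplot options are left out here only for brevity.

```shell
#!/bin/bash
# Sketch only: prints the commands instead of running them.
asm_dir=/scratch/RDS-FAE-OSR-RW/OSR_512_canu3

canu_cmd=(
  canu -p OSR_512_canu3 -d "$asm_dir"
  genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15
  ovlMerThreshold=500   # the only new option
  # ... grid and gnuplot options from the original command go here, unchanged ...
  # The glob is quoted so it prints literally; leave it unquoted in a real run.
  -pacbio-raw '/scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta'
)

# Step 1: remove the old unitigging results. Step 2: relaunch Canu.
echo "rm -r ${asm_dir}/unitigging"
echo "${canu_cmd[*]}"
```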

psur9757 commented 6 years ago

ls unitigging/1-overlapper/001/

000001.counts  000013.counts  000025.counts  000037.counts  000049.counts  000061.counts  000073.counts  000085.counts  000097.counts  000109.counts
000001.ovb     000013.ovb     000025.ovb     000037.ovb     000049.ovb     000061.ovb     000073.ovb     000085.ovb     000097.ovb     000109.ovb
000001.stats   000013.stats   000025.stats   000037.stats   000049.stats   000061.stats   000073.stats   000085.stats   000097.stats   000109.stats
000002.counts  000014.counts  000026.counts  000038.counts  000050.counts  000062.counts  000074.counts  000086.counts  000098.counts  000110.counts
000002.ovb     000014.ovb     000026.ovb     000038.ovb     000050.ovb     000062.ovb     000074.ovb     000086.ovb     000098.ovb     000110.ovb
000002.stats   000014.stats   000026.stats   000038.stats   000050.stats   000062.stats   000074.stats   000086.stats   000098.stats   000110.stats
000003.counts  000015.counts  000027.counts  000039.counts  000051.counts  000063.counts  000075.counts  000087.counts  000099.counts  000111.counts
000003.ovb     000015.ovb     000027.ovb     000039.ovb     000051.ovb     000063.ovb     000075.ovb     000087.ovb     000099.ovb     000111.ovb
000003.stats   000015.stats   000027.stats   000039.stats   000051.stats   000063.stats   000075.stats   000087.stats   000099.stats   000111.stats
000004.counts  000016.counts  000028.counts  000040.counts  000052.counts  000064.counts  000076.counts  000088.counts  000100.counts  000112.counts
000004.ovb     000016.ovb     000028.ovb     000040.ovb     000052.ovb     000064.ovb     000076.ovb     000088.ovb     000100.ovb     000112.ovb
000004.stats   000016.stats   000028.stats   000040.stats   000052.stats   000064.stats   000076.stats   000088.stats   000100.stats   000112.stats
000005.counts  000017.counts  000029.counts  000041.counts  000053.counts  000065.counts  000077.counts  000089.counts  000101.counts  000113.counts
000005.ovb     000017.ovb     000029.ovb     000041.ovb     000053.ovb     000065.ovb     000077.ovb     000089.ovb     000101.ovb     000113.ovb
000005.stats   000017.stats   000029.stats   000041.stats   000053.stats   000065.stats   000077.stats   000089.stats   000101.stats   000113.stats
000006.counts  000018.counts  000030.counts  000042.counts  000054.counts  000066.counts  000078.counts  000090.counts  000102.counts  000114.counts
000006.ovb     000018.ovb     000030.ovb     000042.ovb     000054.ovb     000066.ovb     000078.ovb     000090.ovb     000102.ovb     000114.ovb
000006.stats   000018.stats   000030.stats   000042.stats   000054.stats   000066.stats   000078.stats   000090.stats   000102.stats   000114.stats
000007.counts  000019.counts  000031.counts  000043.counts  000055.counts  000067.counts  000079.counts  000091.counts  000103.counts  000115.counts
000007.ovb     000019.ovb     000031.ovb     000043.ovb     000055.ovb     000067.ovb     000079.ovb     000091.ovb     000103.ovb     000115.ovb
000007.stats   000019.stats   000031.stats   000043.stats   000055.stats   000067.stats   000079.stats   000091.stats   000103.stats   000115.stats
000008.counts  000020.counts  000032.counts  000044.counts  000056.counts  000068.counts  000080.counts  000092.counts  000104.counts  000116.counts
000008.ovb     000020.ovb     000032.ovb     000044.ovb     000056.ovb     000068.ovb     000080.ovb     000092.ovb     000104.ovb     000116.ovb
000008.stats   000020.stats   000032.stats   000044.stats   000056.stats   000068.stats   000080.stats   000092.stats   000104.stats   000116.stats
000009.counts  000021.counts  000033.counts  000045.counts  000057.counts  000069.counts  000081.counts  000093.counts  000105.counts  000117.counts
000009.ovb     000021.ovb     000033.ovb     000045.ovb     000057.ovb     000069.ovb     000081.ovb     000093.ovb     000105.ovb     000117.ovb
000009.stats   000021.stats   000033.stats   000045.stats   000057.stats   000069.stats   000081.stats   000093.stats   000105.stats   000117.stats
000010.counts  000022.counts  000034.counts  000046.counts  000058.counts  000070.counts  000082.counts  000094.counts  000106.counts
000010.ovb     000022.ovb     000034.ovb     000046.ovb     000058.ovb     000070.ovb     000082.ovb     000094.ovb     000106.ovb
000010.stats   000022.stats   000034.stats   000046.stats   000058.stats   000070.stats   000082.stats   000094.stats   000106.stats
000011.counts  000023.counts  000035.counts  000047.counts  000059.counts  000071.counts  000083.counts  000095.counts  000107.counts
000011.ovb     000023.ovb     000035.ovb     000047.ovb     000059.ovb     000071.ovb     000083.ovb     000095.ovb     000107.ovb
000011.stats   000023.stats   000035.stats   000047.stats   000059.stats   000071.stats   000083.stats   000095.stats   000107.stats
000012.counts  000024.counts  000036.counts  000048.counts  000060.counts  000072.counts  000084.counts  000096.counts  000108.counts
000012.ovb     000024.ovb     000036.ovb     000048.ovb     000060.ovb     000072.ovb     000084.ovb     000096.ovb     000108.ovb
000012.stats   000024.stats   000036.stats   000048.stats   000060.stats   000072.stats   000084.stats   000096.stats   000108.stats

I am currently re-running overlap.sh 107, if that doesn't work then I will delete the unitigging folder and restart Canu with updated parameters as you suggested. Thank you.

psur9757 commented 6 years ago

After running overlap.sh 107, I restarted the canu command and got a different error this time. Below are the relevant error files. Should I still delete the unitigging folder, set ovlMerThreshold=500 and relaunch Canu, as you suggested earlier?

canu_qsub_error.txt

    You will need to use the gridOptions param to specify the PBS project if using the PBS gridoption.
-- Canu 1.7
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
-- 
-- Found   2 hosts with  24 cores and  250 GB memory under PBSPro control.
-- Found   3 hosts with  64 cores and 6057 GB memory under PBSPro control.
-- Found  49 hosts with  48 cores and  185 GB memory under PBSPro control.
-- Found  27 hosts with  36 cores and  187 GB memory under PBSPro control.
-- Found   1 host  with   1 core  and    5 GB memory under PBSPro control.
-- Found  80 hosts with  32 cores and  123 GB memory under PBSPro control.
-- Found   2 hosts with  24 cores and  502 GB memory under PBSPro control.
-- Found  59 hosts with  24 cores and  123 GB memory under PBSPro control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap   13 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        3 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
--   Raw:        1088846
--   Corrected:  802726
--   Trimmed:    779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
--  genomeSize        86000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1500 ( 15.00%)
--    utgOvlErrorRate 0.1500 ( 15.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1500 ( 15.00%)
--    utgErrorRate    0.1500 ( 15.00%)
--    cnsErrorRate    0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Mon Oct  8 06:58:24 2018 with 8776.642 GB free disk space

    cd unitigging
    /usr/local/canu/1.7/bin/ovStoreBuild \
     -O ./OSR_512_canu3.ovlStore.BUILDING \
     -G ./OSR_512_canu3.gkpStore \
     -M 2-8 \
     -L ./1-overlapper/ovljob.files \
     > ./OSR_512_canu3.ovlStore.err 2>&1

-- Finished on Mon Oct  8 06:58:31 2018 (7 seconds) with 8776.574 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to create the overlap store.
ABORT:
ABORT: Disk space available:  8776.574 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:

unitigging/OSR_512_canu3.ovlStore.err

Found 534019854 (534.02 million) overlaps.
Configuring for 2.00 GB to 8.00 GB memory and 16368 open files.
Will sort using 10 files; 58720256 (58.72 million) overlaps per bucket; 2.00 GB memory per bucket
  bucket    1 has 53402015 olaps.
  bucket    2 has 53402097 olaps.
  bucket    3 has 53402129 olaps.
  bucket    4 has 53401999 olaps.
  bucket    5 has 53403593 olaps.
  bucket    6 has 53402063 olaps.
  bucket    7 has 53413263 olaps.
  bucket    8 has 53404677 olaps.
  bucket    9 has 53402063 olaps.
  bucket   10 has 53385955 olaps.
Will sort 53.402 million overlaps per bucket, using 10 buckets 1.84 GB per bucket.

-- BUCKETIZING --

ERROR:  './OSR_512_canu3.ovlStore.BUILDING' is a valid ovStore; cannot create a new one.

skoren commented 6 years ago

Remove the OSR_512_canu3.ovlStore.BUILDING folder before restarting canu to let it regenerate from scratch.
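
A guarded way to do this might look like the sketch below. `remove_building_store` is a hypothetical helper, not a Canu command. It refuses to delete anything whose name does not end in .ovlStore.BUILDING, so a typo cannot take out the gkpStore or the overlap jobs.

```shell
#!/bin/bash
# Sketch only: 'remove_building_store' is a hypothetical helper, not a Canu command.
# It deletes a directory only if its name ends in '.ovlStore.BUILDING'.
remove_building_store() {
  local dir="${1%/}"
  case "$dir" in
    *.ovlStore.BUILDING) ;;
    *) echo "refusing to remove '$dir'" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  rm -r -- "$dir"
}

# Usage, from the assembly directory:
#   remove_building_store unitigging/OSR_512_canu3.ovlStore.BUILDING
```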

psur9757 commented 6 years ago

I fixed all the job files that had errors, deleted the ovlStore.BUILDING folder and restarted Canu. I got a new error. It says there is not enough memory, but I am giving it 64 GB of memory. Do I have to rerun job 98? I already ran through all the jobs up to 115 and fixed the errors.

canu_qsub_error.txt

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_151' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 0' (from '/usr/local/gnuplot/5.0.0/bin/gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected PBSPro 'PBSPro_13.1.0.160576' with 'pbsnodes' binary in /usr/local/pbs/default/bin/pbsnodes.
-- Detecting PBSPro resources.
-- 
-- Found  49 hosts with  48 cores and  185 GB memory under PBSPro control.
-- Found  27 hosts with  36 cores and  187 GB memory under PBSPro control.
-- Found  80 hosts with  32 cores and  123 GB memory under PBSPro control.
-- Found   2 hosts with  24 cores and  502 GB memory under PBSPro control.
-- Found  59 hosts with  24 cores and  123 GB memory under PBSPro control.
-- Found   2 hosts with  24 cores and  250 GB memory under PBSPro control.
-- Found   3 hosts with  64 cores and 6057 GB memory under PBSPro control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap   13 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        3 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'OSR_512_canu3.gkpStore', found PacBio reads:
--   Raw:        1088846
--   Corrected:  802726
--   Trimmed:    779949
--
-- Generating assembly 'OSR_512_canu3' in '/scratch/RDS-FAE-OSR-RW/OSR_512_canu3'
--
-- Parameters:
--
--  genomeSize        86000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1500 ( 15.00%)
--    utgOvlErrorRate 0.1500 ( 15.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1500 ( 15.00%)
--    utgErrorRate    0.1500 ( 15.00%)
--    cnsErrorRate    0.1500 ( 15.00%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Fri Oct 26 00:45:41 2018 with 282.761 GB free disk space

    cd unitigging
    /usr/local/canu/1.7/bin/ovStoreBuild \
     -O ./OSR_512_canu3.ovlStore.BUILDING \
     -G ./OSR_512_canu3.gkpStore \
     -M 2-8 \
     -L ./1-overlapper/ovljob.files \
     > ./OSR_512_canu3.ovlStore.err 2>&1
sh: line 5: 294777 Aborted                 (core dumped) /usr/local/canu/1.7/bin/ovStoreBuild -O ./OSR_512_canu3.ovlStore.BUILDING -G ./OSR_512_canu3.gkpStore -M 2-8 -L ./1-overlapper/ovljob.files > ./OSR_512_canu3.ovlStore.err 2>&1

-- Finished on Fri Oct 26 00:47:56 2018 (135 seconds) with 263.502 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 134.  (rc=34304)
ERROR:

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to create the overlap store.
ABORT:
ABORT: Disk space available:  263.502 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/OSR_512_canu3.ovlStore.err):
ABORT:
ABORT:

Last 50 lines of unitigging/OSR_512_canu3.ovlStore.err

-  Bucketizing '1-overlapper/001/000070.ovb'
-  Bucketizing '1-overlapper/001/000071.ovb'
-  Bucketizing '1-overlapper/001/000072.ovb'
-  Bucketizing '1-overlapper/001/000073.ovb'
-  Bucketizing '1-overlapper/001/000074.ovb'
-  Bucketizing '1-overlapper/001/000075.ovb'
-  Bucketizing '1-overlapper/001/000076.ovb'
-  Bucketizing '1-overlapper/001/000077.ovb'
-  Bucketizing '1-overlapper/001/000078.ovb'
-  Bucketizing '1-overlapper/001/000079.ovb'
-  Bucketizing '1-overlapper/001/000080.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0009'
-  Bucketizing '1-overlapper/001/000081.ovb'
-  Bucketizing '1-overlapper/001/000082.ovb'
-  Bucketizing '1-overlapper/001/000083.ovb'
-  Bucketizing '1-overlapper/001/000084.ovb'
-  Bucketizing '1-overlapper/001/000085.ovb'
-  Bucketizing '1-overlapper/001/000086.ovb'
-  Bucketizing '1-overlapper/001/000087.ovb'
-  Bucketizing '1-overlapper/001/000088.ovb'
-  Bucketizing '1-overlapper/001/000089.ovb'
-  Bucketizing '1-overlapper/001/000090.ovb'
-  Bucketizing '1-overlapper/001/000091.ovb'
-  Bucketizing '1-overlapper/001/000092.ovb'
-  Bucketizing '1-overlapper/001/000093.ovb'
-  Bucketizing '1-overlapper/001/000094.ovb'
-  Bucketizing '1-overlapper/001/000095.ovb'
-  Bucketizing '1-overlapper/001/000096.ovb'
-  Bucketizing '1-overlapper/001/000097.ovb'
-- Create bucket './OSR_512_canu3.ovlStore.BUILDING/tmp.sort.0010'
-  Bucketizing '1-overlapper/001/000098.ovb'
safeWrite()-- Write failure on ovFile::writeBuffer::sb: No space left on device
safeWrite()-- Wanted to write 849100 objects (size=1), wrote 369504.
ovStoreBuild: AS_UTL/AS_UTL_fileIO.C:107: void AS_UTL_safeWrite(FILE*, const void*, const char*, size_t, size_t): Assertion `(*__errno_location ()) == 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
AS_UTL/AS_UTL_fileIO.C::107 in _Z16AS_UTL_safeWriteP8_IO_FILEPKvPKcmm()
stores/ovStoreFile.C::182 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::162 in _ZN6ovFile11writeBufferEb()
stores/ovStoreFile.C::202 in _ZN6ovFile12writeOverlapEP9ovOverlap()
stores/ovStoreBuild.C::338 in writeToDumpFile()
stores/ovStoreBuild.C::532 in main()
(null)::0 in (null)()
(null)::0 in (null)()

canu.pbs

#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m b

module load canu/1.7

canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta

module unload canu/1.7

brianwalenz commented 6 years ago

safeWrite()-- Write failure on ovFile::writeBuffer::sb: No space left on device

It would appear you've hit some kind of quota limit on disk usage, since the device itself still seems to have ~250 GB of free space.
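
One way to check for this, sketched under assumptions: free space reported for the device does not reflect per-user or per-project quotas, so a small write test in the assembly directory can show whether writes actually succeed. `probe_write` is a hypothetical helper. The command that reports the quota itself depends on the filesystem and the site, so it is not shown here.

```shell
#!/bin/bash
# Sketch only: 'probe_write' is a hypothetical helper.
# Try to write <mb> MiB of zeros into <dir>; return non-zero if the write fails
# (for example with "No space left on device"). The probe file is always removed.
probe_write() {
  local dir="$1" mb="${2:-64}" f status
  f=$(mktemp "$dir/probe.XXXXXX") || return 1
  if dd if=/dev/zero of="$f" bs=1048576 count="$mb" 2>/dev/null; then
    status=0
  else
    status=1
  fi
  rm -f -- "$f"
  return "$status"
}

# Usage:
#   df -h /scratch/RDS-FAE-OSR-RW/OSR_512_canu3
#   probe_write /scratch/RDS-FAE-OSR-RW/OSR_512_canu3/unitigging 1024 || echo "write failed"
```

A passing probe does not rule out a quota: the limit may only be reached once the overlap store has grown, and some network filesystems report the error late.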

psur9757 commented 6 years ago

My Canu run completed. I gave the options for a smashed-ploidy genome, but I got separate contig and unitig files. I was expecting one smashed genome. How are these different from the ones produced with the default parameters?

I followed this approach:

Smash haplotypes together and then do phasing using another approach (like HapCUT2 or whatshap or others). In that case you want to do the opposite, increase the error rates used for finding overlaps: corOutCoverage=200 correctedErrorRate=0.15

My canu script - canu.pbs

#!/bin/bash
#PBS -P RDS-FAE-OSR-RW
#PBS -N canu3
#PBS -l select=1:ncpus=8:mem=64GB
#PBS -l walltime=36:00:00
#PBS -e ./OSR_512_canu3_error.txt
#PBS -o ./OSR_512_canu3_output.txt
#PBS -M priyanka.surana@sydney.edu.au
#PBS -m abe

module load canu/1.7

canu -p OSR_512_canu3 -d /scratch/RDS-FAE-OSR-RW/OSR_512_canu3 gnuplot="/usr/local/gnuplot/5.0.0/bin/gnuplot" genomeSize=86m corOutCoverage=200 correctedErrorRate=0.15 gridOptions="-P RDS-FAE-OSR-RW" gridOptionsJobName=OSR512 corConcurrency=4 gridOptionsCOR="-l walltime=36:00:00" gridOptionsCORMHAP="-l walltime=36:00:00" gridOptionsCOROVL="-l walltime=36:00:00" gridOptionsOBTOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsUTGOVL="-l walltime=156:00:00 -l nodes=1:ppn=8" gridOptionsOEA="-l walltime=16:00:00 -l nodes=1:ppn=8" oeaMemory=64 -pacbio-raw /scratch/RDS-FAE-OSR-RW/raw_data/m54078_170626_*.subreads.fasta

module unload canu/1.7

skoren commented 5 years ago

The contigs and unitigs represent the same genome. See the FAQ for more details on the outputs: https://canu.readthedocs.io/en/latest/tutorial.html#outputs. For most applications you want to use the contigs. Closing since your run completed and the original issue is resolved.