marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

v1.7 - Segmentation fault, bogart failed #844

Closed. ovidp closed this issue 6 years ago.

ovidp commented 6 years ago

Hi, I am running a canu (1.7) assembly of a plant genome (1.3 Gb) based on 11x PacBio reads. Everything goes well until unitigging, when bogart fails with a segmentation fault. I tried restarting, and then started the entire process from scratch, but I get the same error. I would greatly appreciate some help.

I am running canu on a Linux cluster. The canu command used: '/canu/Linux-amd64/bin/canu -assemble -d /PacBio/ useGrid=false correctedErrorRate=0.105 -pacbio-corrected /PacBio/all_corr_pacbio.fasta genomeSize=1.3g -p HEcanuCorrLo'

The contents of unitigging/4-unitigger/unitigger.err:

==> PARAMETERS.

Resources:
  Memory                189 GB
  Compute Threads       16 (command line)

Lengths:
  Minimum read          0 bases
  Minimum overlap       500 bases

Overlap Error Rates:
  Graph                 0.105 (10.500%)
  Max                   0.105 (10.500%)

Deviations:
  Graph                 6.000
  Bubble                6.000
  Repeat                3.000

Edge Confusion:
  Absolute              2100
  Percent               200.0000

Unitig Construction:
  Minimum intersection  500 bases
  Maxiumum placements   2 positions

Debugging Enabled:
  (none)

==> LOADING AND FILTERING OVERLAPS.

ReadInfo()-- Using 2138563 reads, no minimum read length used.

OverlapCache()-- limited to 193536MB memory (user supplied).

OverlapCache()--      16MB for read data.
OverlapCache()--      81MB for best edges.
OverlapCache()--     212MB for tigs.
OverlapCache()--      57MB for tigs - read layouts.
OverlapCache()--      81MB for tigs - error profiles.
OverlapCache()--   48384MB for tigs - error profile overlaps.
OverlapCache()--   19355MB for other processes.
OverlapCache()-- ---------
OverlapCache()--   68229MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()--      40MB for overlap store structure.
OverlapCache()--  125266MB for overlap data.
OverlapCache()-- ---------
OverlapCache()--  193536MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 22 overlaps/read, based on 11.28x coverage.
OverlapCache()-- Initial guess at 3838 overlaps/read.
OverlapCache()--
OverlapCache()-- Adjusting for sparse overlaps.
OverlapCache()--
OverlapCache()--               reads loading olaps          olaps               memory
OverlapCache()--   olaps/read       all      some          loaded                 free
OverlapCache()--   ----------   -------   -------     ----------- -------     --------
OverlapCache()--         3838   2126932     11631       134145428  85.20%     123219 MB
OverlapCache()--       698128   2138563         0       157439780 100.00%     122863 MB
OverlapCache()--
OverlapCache()-- Loading overlaps.
OverlapCache()--
OverlapCache()--          read from store           saved in cache
OverlapCache()--   ------------ ---------   ------------ ---------
OverlapCache()--       29347933 (018.64%)       28982507 (018.41%)
OverlapCache()--       58686018 (037.28%)       57964060 (036.82%)
OverlapCache()--       86967419 (055.24%)       85908451 (054.57%)
OverlapCache()--      114979131 (073.03%)      113584331 (072.14%)
OverlapCache()--      143507013 (091.15%)      141800857 (090.07%)
OverlapCache()--   ------------ ---------   ------------ ---------
OverlapCache()--      157439780 (100.00%)      155583954 (098.82%)
OverlapCache()--
OverlapCache()-- Ignored 1191848 duplicate overlaps.
OverlapCache()--
OverlapCache()-- Symmetrizing overlaps.
OverlapCache()--   Finding missing twins.
OverlapCache()--   Found 119302 missing twins in 155583954 overlaps, 1620 are strong.
OverlapCache()--   Dropping weak non-twin overlaps; allocated 0 MB scratch space.
OverlapCache()--   Dropped 3422 overlaps; scratch space released.
OverlapCache()--   Adding 115880 missing twin overlaps.
OverlapCache()--   Finished.

BestOverlapGraph()-- allocating best edges (65MB)

BestOverlapGraph()-- finding initial best edges.

BestOverlapGraph()-- filtering suspicious reads.
BestOverlapGraph()-- marked 1626591 reads as suspicious.

BestOverlapGraph()-- filtering high error edges.

BestOverlapGraph()-- filtering reads with lopsided best edges.

BestOverlapGraph()-- filtering spur reads.
BestOverlapGraph()-- detected 276777 spur reads and 1704959 singleton reads.

Failed with 'Segmentation fault'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
../../../../../libstdc++-v3/src/c++98/tree.cc::138 in local_Rb_tree_rotate_left()
../../../../../libstdc++-v3/src/c++98/tree.cc::278 in _ZSt29_Rb_tree_insert_and_rebalancebPSt18_Rb_tree_node_baseS0_RS_()
/usr/include/c++/4.8/bits/stl_tree.h::1025 in _ZNSt8_Rb_treeIjjSt9_IdentityIjESt4lessIjESaIjEE10_M_insert_EPSt18_Rb_tree_node_baseS7_RKj()
/usr/include/c++/4.8/bits/stl_tree.h::1382 in _ZNSt8_Rb_treeIjjSt9_IdentityIjESt4lessIjESaIjEE16_M_insert_uniqueERKj()
/usr/include/c++/4.8/bits/stl_set.h::463 in _ZNSt3setIjSt4lessIjESaIjEE6insertERKj()
bogart/AS_BAT_BestOverlapGraph.C::417 in _ZN16BestOverlapGraph11findZombiesEPKc._omp_fn.3()
../../../libgomp/team.c::116 in gomp_thread_start()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
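Two figures in the unitigger.err output above can be cross-checked by hand. The "11.28x coverage" is consistent with the total corrected-read bases reported later in the log file's RED configuration table (14667223059) divided by genomeSize=1.3g; that canu derives it exactly this way is an assumption. The OverlapCache budget lines also add up to the 193536 MB limit, within rounding:

```latex
\begin{align*}
\text{coverage} &\approx \frac{14\,667\,223\,059\ \text{bases}}{1\,300\,000\,000\ \text{bases}} \approx 11.28\times \\
\underbrace{68\,229}_{\text{data structures}} + \underbrace{40}_{\text{store structure}} + \underbrace{125\,266}_{\text{overlap data}} &= 193\,535\ \text{MB} \approx 193\,536\ \text{MB allowed}
\end{align*}
```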

The contents of the 'log file':

-- Canu snapshot v1.7 +23 changes (r8715 967fcea3c70699eaccc92ff5bfe36d9d10e65a55)
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_92' (from '/net/gmi.oeaw.ac.at/software/mendel/intel-x86_64-sandybridge-avx/software/Java/1.8.0_92/bin/java').
-- Detected gnuplot version '4.6 patchlevel 0' (from 'gnuplot') and image format 'svg'.
-- Detected 16 CPUs and 189 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl    189 GB   16 CPUs x   1 job    189 GB   16 CPUs  (k-mer counting)
-- Local: cormhap   32 GB   16 CPUs x   1 job     32 GB   16 CPUs  (overlap detection with mhap)
-- Local: obtovl    16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (overlap detection)
-- Local: utgovl    16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x  16 jobs    64 GB   16 CPUs  (overlap store bucketizer)
-- Local: ovs       32 GB    1 CPU  x   5 jobs   160 GB    5 CPUs  (overlap store sorting)
-- Local: red        8 GB    4 CPUs x   4 jobs    32 GB   16 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  16 jobs    64 GB   16 CPUs  (overlap error adjustment)
-- Local: bat      189 GB   16 CPUs x   1 job    189 GB   16 CPUs  (contig construction)
-- Local: gfa       16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (GFA alignment and processing)
--
-- In 'HEcanuCorrLo.gkpStore', found PacBio reads:
--   Raw:        0
--   Corrected:  2138563
--   Trimmed:    2138563
--
-- Generating assembly 'HEcanuCorrLo' in '/lustre/scratch/users/ovidiu.paun/PacBio'
--
-- Parameters:
--
--  genomeSize        1300000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1050 ( 10.50%)
--    utgOvlErrorRate 0.1050 ( 10.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1050 ( 10.50%)
--    utgErrorRate    0.1050 ( 10.50%)
--    cnsErrorRate    0.1050 ( 10.50%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'utgovl' concurrent execution on Sun Mar 25 23:09:59 2018 with 15192.636 GB free disk space (16 processes; 1 concurrently)

    cd unitigging/1-overlapper
    ./overlap.sh 28 > ./overlap.000028.out 2>&1
    ./overlap.sh 29 > ./overlap.000029.out 2>&1
    ./overlap.sh 30 > ./overlap.000030.out 2>&1
    ./overlap.sh 31 > ./overlap.000031.out 2>&1
    ./overlap.sh 32 > ./overlap.000032.out 2>&1
    ./overlap.sh 33 > ./overlap.000033.out 2>&1
    ./overlap.sh 34 > ./overlap.000034.out 2>&1
    ./overlap.sh 35 > ./overlap.000035.out 2>&1
    ./overlap.sh 36 > ./overlap.000036.out 2>&1
    ./overlap.sh 37 > ./overlap.000037.out 2>&1
    ./overlap.sh 38 > ./overlap.000038.out 2>&1
    ./overlap.sh 39 > ./overlap.000039.out 2>&1
    ./overlap.sh 40 > ./overlap.000040.out 2>&1
    ./overlap.sh 41 > ./overlap.000041.out 2>&1
    ./overlap.sh 42 > ./overlap.000042.out 2>&1
    ./overlap.sh 43 > ./overlap.000043.out 2>&1

-- Finished on Wed Mar 28 18:12:47 2018 (241368 seconds) with 14171.596 GB free disk space
----------------------------------------
-- Found 43 overlapInCore output files.
--
-- overlapInCore compute 'unitigging/1-overlapper':
--   kmer hits
--     with no overlap      45501570067  6048.06977 +- 620223540.657
--     with an overlap         78719890  5.11627907 +- 1063541.313
--
--   overlaps                  78719890  5.11627907 +- 1063541.313
--     contained               26906585  .534883721 +- 377754.298
--     dovetail                51813305  0.58139535 +- 686550.596
--
--   overlaps rejected
--     multiple per pair              0           0 +- 0
--     bad short window               0           0 +- 0
--     bad long window                0           0 +- 0
----------------------------------------
-- Starting command on Wed Mar 28 18:12:47 2018 with 14171.596 GB free disk space

    cd unitigging
   /canu/Linux-amd64/bin/ovStoreBuild \
     -O ./HEcanuCorrLo.ovlStore.BUILDING \
     -G ./HEcanuCorrLo.gkpStore \
     -M 4-32 \
     -L ./1-overlapper/ovljob.files \
     > ./HEcanuCorrLo.ovlStore.err 2>&1

-- Finished on Wed Mar 28 18:15:10 2018 (143 seconds) with 14152.947 GB free disk space
----------------------------------------
-- Checking store.
----------------------------------------
-- Starting command on Wed Mar 28 18:15:10 2018 with 14152.947 GB free disk space

    cd unitigging
   /canu/Linux-amd64/bin/ovStoreDump \
     -G ./HEcanuCorrLo.gkpStore \
     -O ./HEcanuCorrLo.ovlStore \
     -d -counts \
     > ./HEcanuCorrLo.ovlStore/counts.dat 2> ./HEcanuCorrLo.ovlStore/counts.err

-- Finished on Wed Mar 28 18:15:11 2018 (1 second) with 14152.947 GB free disk space
----------------------------------------
--
-- Overlap store 'unitigging/HEcanuCorrLo.ovlStore' successfully constructed.
-- Found 157439780 overlaps for 550391 reads; 1588172 reads have no overlaps.
--
--
-- Purged 1.572 GB in 129 overlap output files.
----------------------------------------
-- Starting command on Wed Mar 28 18:15:14 2018 with 14152.39 GB free disk space

    cd unitigging
   /canu/Linux-amd64/bin/ovStoreStats \
     -G ./HEcanuCorrLo.gkpStore \
     -O ./HEcanuCorrLo.ovlStore \
     -C 11 \
     -o ./HEcanuCorrLo.ovlStore \
     > ./HEcanuCorrLo.ovlStore.summary.err 2>&1

-- Finished on Wed Mar 28 18:16:05 2018 (51 seconds) with 14148.996 GB free disk space
----------------------------------------
--
-- Overlap store 'unitigging/HEcanuCorrLo.ovlStore' contains:
--
--   category            reads     %          read length        feature size or coverage  analysis
--   ----------------  -------  -------  ----------------------  ------------------------  --------------------
--   middle-missing      77125    3.61    11221.40 +- 6346.65       3338.89 +- 3694.25    (bad trimming)
--   middle-hump         61626    2.88    11920.87 +- 6178.76       9550.08 +- 6039.32    (bad trimming)
--   no-5-prime         180664    8.45     7729.42 +- 4847.88       5592.58 +- 4630.36    (bad trimming)
--   no-3-prime          99962    4.67     7417.30 +- 4463.44       5276.57 +- 4241.36    (bad trimming)
--   
--   low-coverage        16160    0.76     3561.37 +- 2421.01          1.75 +- 0.76       (easy to assemble, potential for lower quality consensus)
--   unique              16974    0.79     3741.52 +- 2491.78          8.00 +- 2.99       (easy to assemble, perfect, yay)
--   repeat-cont         61443    2.87     5770.34 +- 3660.59        929.18 +- 1259.15    (potential for consensus errors, no impact on assembly)
--   repeat-dove           632    0.03    15601.24 +- 7152.88        453.37 +- 581.10     (hard to assemble, likely won't assemble correctly or even at all)
--   
--   span-repeat          5299    0.25     6810.68 +- 3631.66       2594.70 +- 2564.25    (read spans a large repeat, usually easy to assemble)
--   uniq-repeat-cont    20664    0.97     5177.34 +- 3198.95                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
--   uniq-repeat-dove     4807    0.22     8171.09 +- 4468.72                             (will end contigs, potential to misassemble)
--   uniq-anchor          5035    0.24     9108.13 +- 4910.64       1420.18 +- 1816.33    (repeat read, with unique section, probable bad read)
--
-- Loading read lengths.
-- Loading number of overlaps per read.
--
-- Configure RED for 8gb memory.
--                   Batches of at most (unlimited) reads.
--                                      500000000 bases.
--                   Expecting evidence of at most 751708149 bases per iteration.
--
--           Total                                               Reads                 Olaps Evidence
--    Job   Memory      Read Range         Reads        Bases   Memory        Olaps   Memory   Memory  (Memory in MB)
--   ---- -------- ------------------- --------- ------------ -------- ------------ -------- --------
--      1  8192.02         1-53236         53236    406751735  4656.58      4690102    53.67  1433.77
--      2  8192.03     53237-108640        55404    407058176  4660.15      4378311    50.11  1433.77
--      3  8192.06    108641-165681        57041    407060655  4660.23      4373937    50.06  1433.77
--      4  8192.20    165682-223579        57898    406926469  4658.73      4517802    51.70  1433.77
--      5  8192.02    223580-281937        58358    406768637  4656.93      4658623    53.31  1433.77
--      6  8192.02    281938-340027        58090    407147650  4661.26      4280241    48.98  1433.77
--      7  8192.03    340028-398170        58143    406903129  4658.47      4525458    51.79  1433.77
--      8  8192.02    398171-456141        57971    406961128  4659.12      4467801    51.13  1433.77
--      9  8192.06    456142-514312        58171    407007916  4659.67      4423468    50.62  1433.77
--     10  8192.02    514313-572003        57691    407213170  4662.00      4215923    48.25  1433.77
--     11  8192.03    572004-628990        56987    407220443  4662.06      4211870    48.20  1433.77
--     12  8192.01    628991-684956        55966    407046114  4660.03      4386794    50.20  1433.77
--     13  8192.11    684957-743122        58166    406875055  4658.15      4560387    52.19  1433.77
--     14  8192.07    743123-801973        58851    407168672  4661.53      4261673    48.77  1433.77
--     15  8192.12    801974-860697        58724    407244086  4662.39      4191514    47.97  1433.77
--     16  8192.01    860698-919443        58746    406938801  4658.89      4486500    51.34  1433.77
--     17  8192.08    919444-977503        58060    407058542  4660.24      4374867    50.07  1433.77
--     18  8192.04    977504-1034412       56909    407141746  4661.16      4291247    49.11  1433.77
--     19  8192.23   1034413-1090958       56546    407051103  4660.11      4399832    50.35  1433.77
--     20  8192.02   1090959-1150102       59144    407180421  4661.67      4244928    48.58  1433.77
--     21  8192.05   1150103-1210676       60574    407327615  4663.40      4096032    46.88  1433.77
--     22  8192.10   1210677-1271576       60900    407364754  4663.84      4062336    46.49  1433.77
--     23  8192.29   1271577-1332469       60893    407401180  4664.25      4042686    46.26  1433.77
--     24  8192.01   1332470-1392863       60394    407183726  4661.75      4237685    48.50  1433.77
--     25  8192.06   1392864-1453334       60471    407245779  4662.46      4179038    47.83  1433.77
--     26  8192.08   1453335-1513091       59757    407093086  4660.69      4335507    49.62  1433.77
--     27  8192.01   1513092-1572596       59505    406820379  4657.56      4603248    52.68  1433.77
--     28  8192.17   1572597-1631900       59304    406782920  4657.13      4655267    53.28  1433.77
--     29  8192.00   1631901-1691702       59802    406834482  4657.73      4587501    52.50  1433.77
--     30  8192.05   1691703-1753446       61744    406920733  4658.78      4499911    51.50  1433.77
--     31  8192.08   1753447-1817396       63950    406924947  4658.90      4492130    51.41  1433.77
--     32  8192.09   1817397-1881863       64467    407192994  4661.98      4223568    48.33  1433.77
--     33  8192.10   1881864-1946497       64634    407068661  4660.57      4348615    49.77  1433.77
--     34  8192.10   1946498-2011054       64557    407341488  4663.69      4076124    46.65  1433.77
--     35  8192.11   2011055-2074416       63362    406965302  4659.34      4456528    51.00  1433.77
--     36  8192.02   2074417-2136642       62226    406949112  4659.12      4467353    51.12  1433.77
--     37  3633.09   2136643-2138563        1921     13082253   149.77       134973     1.54  1433.77
--   ---- -------- ------------------- --------- ------------ -------- ------------ -------- --------
--                                                14667223059             157439780
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'red' concurrent execution on Wed Mar 28 18:16:32 2018 with 14147.134 GB free disk space (37 processes; 4 concurrently)

    cd unitigging/3-overlapErrorAdjustment
    ./red.sh 1 > ./red.000001.out 2>&1
    ./red.sh 2 > ./red.000002.out 2>&1
    ./red.sh 3 > ./red.000003.out 2>&1
    ./red.sh 4 > ./red.000004.out 2>&1
    ./red.sh 5 > ./red.000005.out 2>&1
    ./red.sh 6 > ./red.000006.out 2>&1
    ./red.sh 7 > ./red.000007.out 2>&1
    ./red.sh 8 > ./red.000008.out 2>&1
    ./red.sh 9 > ./red.000009.out 2>&1
    ./red.sh 10 > ./red.000010.out 2>&1
    ./red.sh 11 > ./red.000011.out 2>&1
    ./red.sh 12 > ./red.000012.out 2>&1
    ./red.sh 13 > ./red.000013.out 2>&1
    ./red.sh 14 > ./red.000014.out 2>&1
    ./red.sh 15 > ./red.000015.out 2>&1
    ./red.sh 16 > ./red.000016.out 2>&1
    ./red.sh 17 > ./red.000017.out 2>&1
    ./red.sh 18 > ./red.000018.out 2>&1
    ./red.sh 19 > ./red.000019.out 2>&1
    ./red.sh 20 > ./red.000020.out 2>&1
    ./red.sh 21 > ./red.000021.out 2>&1
    ./red.sh 22 > ./red.000022.out 2>&1
    ./red.sh 23 > ./red.000023.out 2>&1
    ./red.sh 24 > ./red.000024.out 2>&1
    ./red.sh 25 > ./red.000025.out 2>&1
    ./red.sh 26 > ./red.000026.out 2>&1
    ./red.sh 27 > ./red.000027.out 2>&1
    ./red.sh 28 > ./red.000028.out 2>&1
    ./red.sh 29 > ./red.000029.out 2>&1
    ./red.sh 30 > ./red.000030.out 2>&1
    ./red.sh 31 > ./red.000031.out 2>&1
    ./red.sh 32 > ./red.000032.out 2>&1
    ./red.sh 33 > ./red.000033.out 2>&1
    ./red.sh 34 > ./red.000034.out 2>&1
    ./red.sh 35 > ./red.000035.out 2>&1
    ./red.sh 36 > ./red.000036.out 2>&1
    ./red.sh 37 > ./red.000037.out 2>&1

-- Finished on Wed Mar 28 19:43:48 2018 (5236 seconds) with 13822.363 GB free disk space
----------------------------------------
-- Found 37 read error detection output files.
--
-- Loading read lengths.
-- Loading number of overlaps per read.
--
-- Configure OEA for 4gb memory.
--                   Batches of at most (unlimited) reads.
--                                      300000000 bases.
--
--           Total                                               Reads                 Olaps  Adjusts
--    Job   Memory      Read Range         Reads        Bases   Memory        Olaps   Memory   Memory  (Memory in MB)
--   ---- -------- ------------------- --------- ------------ -------- ------------ -------- --------
--      1  3022.12         1-39387         39387    300001809   295.31      3405530   103.93   574.88
--      2  3019.34     39388-78986         39599    300001730   295.31      3314228   101.14   574.88
--      3  3013.92     78987-120511        41525    300005698   295.37      3134633    95.66   574.88
--      4  3021.27    120512-162713        42202    300006623   295.40      3374914   102.99   574.88
--      5  3015.90    162714-205447        42734    300005738   295.41      3198499    97.61   574.88
--      6  3025.96    205448-248391        42944    300001634   295.41      3527889   107.66   574.88
--      7  3018.78    248392-291146        42755    300000693   295.41      3292801   100.49   574.88
--      8  3015.79    291147-334159        43013    300006775   295.42      3194383    97.48   574.88
--      9  3017.60    334160-376856        42697    300006630   295.41      3254191    99.31   574.88
--     10  3022.95    376857-419505        42649    300003122   295.41      3429569   104.66   574.88
--     11  3017.35    419506-462579        43074    300003180   295.42      3245520    99.05   574.88
--     12  3018.38    462580-505406        42827    300022708   295.43      3278936   100.07   574.88
--     13  3013.41    505407-547799        42393    300008097   295.40      3117190    95.13   574.88
--     14  3008.12    547800-590464        42665    300007331   295.41      2943476    89.83   574.88
--     15  3018.79    590465-632166        41702    300007048   295.38      3294131   100.53   574.88
--     16  3017.18    632167-673187        41021    300000621   295.35      3242224    98.94   574.88
--     17  3017.94    673188-715932        42745    300004316   295.41      3265342    99.65   574.88
--     18  3023.90    715933-758883        42951    300022712   295.43      3459705   105.58   574.88
--     19  3010.15    758884-802258        43375    300006622   295.43      3009352    91.84   574.88
--     20  3013.10    802259-845562        43304    300004264   295.43      3106071    94.79   574.88
--     21  3018.56    845563-888981        43419    300011068   295.44      3284750   100.24   574.88
--     22  3015.40    888982-932138        43157    300001161   295.42      3181821    97.10   574.88
--     23  3017.36    932139-974885        42747    300002388   295.41      3246240    99.07   574.88
--     24  3014.79    974886-1016930       42045    300003372   295.39      3162788    96.52   574.88
--     25  3012.00   1016931-1058658       41728    300003312   295.38      3071756    93.74   574.88
--     26  3019.50   1058659-1100376       41718    300003579   295.38      3317371   101.24   574.88
--     27  3014.53   1100377-1144201       43825    300005645   295.45      3152457    96.21   574.88
--     28  3010.42   1144202-1188614       44413    300001584   295.46      3017350    92.08   574.88
--     29  3012.34   1188615-1233389       44775    300009590   295.48      3079485    93.98   574.88
--     30  3007.96   1233390-1278361       44972    300011643   295.49      2935869    89.60   574.88
--     31  3009.54   1278362-1323247       44886    300001380   295.47      2988071    91.19   574.88
--     32  3013.02   1323248-1367846       44599    300004461   295.47      3102057    94.67   574.88
--     33  3012.70   1367847-1412070       44224    300014065   295.47      3091839    94.36   574.88
--     34  3012.25   1412071-1456755       44685    300006820   295.47      3076846    93.90   574.88
--     35  3017.97   1456756-1500798       44043    300003167   295.45      3265019    99.64   574.88
--     36  3020.15   1500799-1544662       43864    300009600   295.45      3336422   101.82   574.88
--     37  3020.66   1544663-1588578       43916    300002472   295.44      3353143   102.33   574.88
--     38  3023.31   1588579-1632227       43649    300011383   295.45      3440228   104.99   574.88
--     39  3023.18   1632228-1676299       44072    300006196   295.45      3435557   104.84   574.88
--     40  3023.69   1676300-1721098       44799    300006682   295.48      3451504   105.33   574.88
--     41  3015.36   1721099-1767180       46082    300003784   295.51      3177516    96.97   574.88
--     42  3018.60   1767181-1814386       47206    300007039   295.55      3282206   100.16   574.88
--     43  3015.36   1814387-1861924       47538    300006251   295.56      3175976    96.92   574.88
--     44  3017.75   1861925-1909502       47578    300010111   295.56      3254158    99.31   574.88
--     45  3011.11   1909503-1957203       47701    300007354   295.57      3036325    92.66   574.88
--     46  3007.58   1957204-2004771       47568    300000013   295.55      2920950    89.14   574.88
--     47  3019.55   2004772-2051632       46861    300026111   295.56      3313293   101.11   574.88
--     48  3016.49   2051633-2097750       46118    300007429   295.52      3214236    98.09   574.88
--     49  2977.79   2097751-2138563       40813    266908048   263.79      2985963    91.12   574.88
--   ---- -------- ------------------- --------- ------------ -------- ------------ -------- --------
--                                                14667223059             157439780
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'oea' concurrent execution on Wed Mar 28 19:44:17 2018 with 13821.989 GB free disk space (49 processes; 16 concurrently)

    cd unitigging/3-overlapErrorAdjustment
    ./oea.sh 1 > ./oea.000001.out 2>&1
    ./oea.sh 2 > ./oea.000002.out 2>&1
    ./oea.sh 3 > ./oea.000003.out 2>&1
    ./oea.sh 4 > ./oea.000004.out 2>&1
    ./oea.sh 5 > ./oea.000005.out 2>&1
    ./oea.sh 6 > ./oea.000006.out 2>&1
    ./oea.sh 7 > ./oea.000007.out 2>&1
    ./oea.sh 8 > ./oea.000008.out 2>&1
    ./oea.sh 9 > ./oea.000009.out 2>&1
    ./oea.sh 10 > ./oea.000010.out 2>&1
    ./oea.sh 11 > ./oea.000011.out 2>&1
    ./oea.sh 12 > ./oea.000012.out 2>&1
    ./oea.sh 13 > ./oea.000013.out 2>&1
    ./oea.sh 14 > ./oea.000014.out 2>&1
    ./oea.sh 15 > ./oea.000015.out 2>&1
    ./oea.sh 16 > ./oea.000016.out 2>&1
    ./oea.sh 17 > ./oea.000017.out 2>&1
    ./oea.sh 18 > ./oea.000018.out 2>&1
    ./oea.sh 19 > ./oea.000019.out 2>&1
    ./oea.sh 20 > ./oea.000020.out 2>&1
    ./oea.sh 21 > ./oea.000021.out 2>&1
    ./oea.sh 22 > ./oea.000022.out 2>&1
    ./oea.sh 23 > ./oea.000023.out 2>&1
    ./oea.sh 24 > ./oea.000024.out 2>&1
    ./oea.sh 25 > ./oea.000025.out 2>&1
    ./oea.sh 26 > ./oea.000026.out 2>&1
    ./oea.sh 27 > ./oea.000027.out 2>&1
    ./oea.sh 28 > ./oea.000028.out 2>&1
    ./oea.sh 29 > ./oea.000029.out 2>&1
    ./oea.sh 30 > ./oea.000030.out 2>&1
    ./oea.sh 31 > ./oea.000031.out 2>&1
    ./oea.sh 32 > ./oea.000032.out 2>&1
    ./oea.sh 33 > ./oea.000033.out 2>&1
    ./oea.sh 34 > ./oea.000034.out 2>&1
    ./oea.sh 35 > ./oea.000035.out 2>&1
    ./oea.sh 36 > ./oea.000036.out 2>&1
    ./oea.sh 37 > ./oea.000037.out 2>&1
    ./oea.sh 38 > ./oea.000038.out 2>&1
    ./oea.sh 39 > ./oea.000039.out 2>&1
    ./oea.sh 40 > ./oea.000040.out 2>&1
    ./oea.sh 41 > ./oea.000041.out 2>&1
    ./oea.sh 42 > ./oea.000042.out 2>&1
    ./oea.sh 43 > ./oea.000043.out 2>&1
    ./oea.sh 44 > ./oea.000044.out 2>&1
    ./oea.sh 45 > ./oea.000045.out 2>&1
    ./oea.sh 46 > ./oea.000046.out 2>&1
    ./oea.sh 47 > ./oea.000047.out 2>&1
    ./oea.sh 48 > ./oea.000048.out 2>&1
    ./oea.sh 49 > ./oea.000049.out 2>&1

-- Finished on Wed Mar 28 21:23:32 2018 (5955 seconds) with 14531.423 GB free disk space
----------------------------------------
-- Found 49 overlap error adjustment output files.
----------------------------------------
-- Starting command on Wed Mar 28 21:23:32 2018 with 14531.423 GB free disk space

    cd unitigging/3-overlapErrorAdjustment
    /canu/Linux-amd64/bin/ovStoreBuild \
      -G ../HEcanuCorrLo.gkpStore \
      -O ../HEcanuCorrLo.ovlStore \
      -evalues \
      -L ./oea.files \
    > ./oea.apply.err 2>&1

-- Finished on Wed Mar 28 21:23:33 2018 (1 second) with 14531.423 GB free disk space
----------------------------------------
-- No report available.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Wed Mar 28 21:23:33 2018 with 14531.423 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Wed Mar 28 21:24:37 2018 (64 seconds) with 14526.822 GB free disk space
----------------------------------------
--
-- Bogart failed, retry
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Wed Mar 28 21:24:37 2018 with 14526.822 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Wed Mar 28 21:25:36 2018 (59 seconds) with 14537.416 GB free disk space
----------------------------------------
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu snapshot v1.7 +23 changes (r8715 967fcea3c70699eaccc92ff5bfe36d9d10e65a55)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

brianwalenz commented 6 years ago

Whoops. It's an easy fix, but DO NOT update your code from git: the on-disk data structures have changed since you started this assembly.

Instead, edit src/bogart/AS_BAT_BestOverlapGraph.C and delete line 416, the middle line ("writeLog(...)") below:

    if (fi < nc) {                             //  If we're smaller, we're a
#pragma omp critical (suspInsert)              //  Zombie Master!
      writeLog("read %u is a zombie.\n", fi);
      _zombie.insert(fi);
    }

Recompile and then restart canu.

ovidp commented 6 years ago

Thank you for the quick answer. However, canu fails again.

Running cat unitigging/4-unitigger/unitigger.err now outputs:

==> PARAMETERS.

Resources:
  Memory                189 GB
  Compute Threads       16 (command line)

Lengths:
  Minimum read          0 bases
  Minimum overlap       500 bases

Overlap Error Rates:
  Graph                 0.105 (10.500%)
  Max                   0.105 (10.500%)

Deviations:
  Graph                 6.000
  Bubble                6.000
  Repeat                3.000

Edge Confusion:
  Absolute              2100
  Percent               200.0000

Unitig Construction:
  Minimum intersection  500 bases
  Maxiumum placements   2 positions

Debugging Enabled:
  (none)

==> LOADING AND FILTERING OVERLAPS.

ReadInfo()-- Using 2138563 reads, no minimum read length used.

OverlapCache()-- limited to 193536MB memory (user supplied).

OverlapCache()--      16MB for read data.
OverlapCache()--      81MB for best edges.
OverlapCache()--     212MB for tigs.
OverlapCache()--      57MB for tigs - read layouts.
OverlapCache()--      81MB for tigs - error profiles.
OverlapCache()--   48384MB for tigs - error profile overlaps.
OverlapCache()--       0MB for other processes.
OverlapCache()-- ---------
OverlapCache()--   48873MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()--      40MB for overlap store structure.
OverlapCache()--  144621MB for overlap data.
OverlapCache()-- ---------
OverlapCache()--  193536MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 22 overlaps/read, based on 11.28x coverage.
OverlapCache()-- Initial guess at 4431 overlaps/read.
OverlapCache()--
OverlapCache()-- Adjusting for sparse overlaps.
OverlapCache()--
OverlapCache()--               reads loading olaps          olaps               memory
OverlapCache()--   olaps/read       all      some          loaded                 free
OverlapCache()--   ----------   -------   -------     ----------- -------     --------
OverlapCache()--         4431   2129462      9101       140283711  89.10%     142481 MB
OverlapCache()--      1030433   2138563         0       157439780 100.00%     142219 MB
OverlapCache()--
OverlapCache()-- Loading overlaps.
OverlapCache()--
OverlapCache()--          read from store           saved in cache
OverlapCache()--   ------------ ---------   ------------ ---------
OverlapCache()--       29347933 (018.64%)       28982507 (018.41%)
OverlapCache()--       58686018 (037.28%)       57964060 (036.82%)
OverlapCache()--       86967419 (055.24%)       85908451 (054.57%)
OverlapCache()--      114979131 (073.03%)      113584331 (072.14%)
OverlapCache()--      143507013 (091.15%)      141800857 (090.07%)
OverlapCache()--   ------------ ---------   ------------ ---------
OverlapCache()--      157439780 (100.00%)      155583954 (098.82%)
OverlapCache()--
OverlapCache()-- Ignored 1191848 duplicate overlaps.
OverlapCache()--
OverlapCache()-- Symmetrizing overlaps.
OverlapCache()--   Finding missing twins.
OverlapCache()--   Found 119302 missing twins in 155583954 overlaps, 1620 are strong.
OverlapCache()--   Dropping weak non-twin overlaps; allocated 0 MB scratch space.
OverlapCache()--   Dropped 3422 overlaps; scratch space released.
OverlapCache()--   Adding 115880 missing twin overlaps.
OverlapCache()--   Finished.

BestOverlapGraph()-- allocating best edges (65MB)

BestOverlapGraph()-- finding initial best edges.

BestOverlapGraph()-- filtering suspicious reads.
BestOverlapGraph()-- marked 1626591 reads as suspicious.

BestOverlapGraph()-- filtering high error edges.

BestOverlapGraph()-- filtering reads with lopsided best edges.

BestOverlapGraph()-- filtering spur reads.
BestOverlapGraph()-- detected 276777 spur reads and 1704959 singleton reads.
BestOverlapGraph()-- detected 68625 zombie reads.

BestOverlapGraph()-- removing best edges for contained reads.

==> BUILDING GREEDY TIGS.

breakSingletonTigs()-- Removed 298601 singleton tigs; reads are now unplaced.
optimizePositions()-- Optimizing read positions for 2138564 reads in 370120 tigs, with 16 threads.
optimizePositions()--   Allocating scratch space for 2138564 reads (133660 KB).
optimizePositions()--   Initializing positions with 16 threads.
optimizePositions()--   Recomputing positions, iteration 1, with 16 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 2135017 reads
optimizePositions()--     changed:     3547 reads
optimizePositions()--   Recomputing positions, iteration 2, with 16 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 2135354 reads
optimizePositions()--     changed:     3210 reads
optimizePositions()--   Recomputing positions, iteration 3, with 16 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 2135601 reads
optimizePositions()--     changed:     2963 reads
optimizePositions()--   Recomputing positions, iteration 4, with 16 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 2135695 reads
optimizePositions()--     changed:     2869 reads
optimizePositions()--   Recomputing positions, iteration 5, with 16 threads.
optimizePositions()--     Reset zero.
optimizePositions()--     Checking convergence.
optimizePositions()--     converged: 2135731 reads
optimizePositions()--     changed:     2833 reads
optimizePositions()--   Expanding short reads with 16 threads.
optimizePositions()--   Updating positions.
optimizePositions()--   Finished.

==> PLACE CONTAINED READS.

computeErrorProfiles()-- Computing error profiles for 370120 tigs, with 16 threads.
computeErrorProfiles()-- Finished.

placeContains()-- placing 119195 contained and 1943926 unplaced reads, with 16 threads.
placeContains()-- Placed 78262 contained reads and 98 unplaced reads.
placeContains()-- Failed to place 40933 contained reads (too high error suspected) and 1943828 unplaced reads (lack of overlaps suspected).
optimizePositions()-- Optimizing read positions for 2138564 reads in 370120 tigs, with 16 threads.
optimizePositions()--   Allocating scratch space for 2138564 reads (133660 KB).
optimizePositions()--   Initializing positions with 16 threads.
bogart: bogart/AS_BAT_OptimizePositions.C:142: void Unitig::optimize_initPlace(uint32, optPos*, optPos*, bool, std::set<unsigned int>&, bool): Assertion `cnt > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
bogart/AS_BAT_OptimizePositions.C::142 in _ZN6Unitig18optimize_initPlaceEjP6optPosS1_bRSt3setIjSt4lessIjESaIjEEb()
bogart/AS_BAT_OptimizePositions.C::393 in _ZN9TigVector17optimizePositionsEPKcS1_._omp_fn.0()
../../../libgomp/team.c::116 in gomp_thread_start()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()

And the log file:

-- Canu snapshot v1.7 +23 changes (r8715 967fcea3c70699eaccc92ff5bfe36d9d10e65a55)
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_92' (from '/net/gmi.oeaw.ac.at/software/mendel/intel-x86_64-sandybridge-avx/software/Java/1.8.0_92/bin/java').
-- Detected gnuplot version '4.6 patchlevel 0' (from 'gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 220 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl    220 GB   32 CPUs x   1 job    220 GB   32 CPUs  (k-mer counting)
-- Local: cormhap   32 GB   16 CPUs x   3 jobs    96 GB   48 CPUs  (overlap detection with mhap)
-- Local: obtovl    16 GB   16 CPUs x   3 jobs    48 GB   48 CPUs  (overlap detection)
-- Local: utgovl    16 GB   16 CPUs x   3 jobs    48 GB   48 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x  48 jobs   192 GB   48 CPUs  (overlap store bucketizer)
-- Local: ovs       32 GB    1 CPU  x   6 jobs   192 GB    6 CPUs  (overlap store sorting)
-- Local: red        8 GB    4 CPUs x  12 jobs    96 GB   48 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  48 jobs   192 GB   48 CPUs  (overlap error adjustment)
-- Local: bat      220 GB   16 CPUs x   1 job    220 GB   16 CPUs  (contig construction)
-- Local: gfa       16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (GFA alignment and processing)
--
-- In 'HEcanuCorrLo.gkpStore', found PacBio reads:
--   Raw:        0
--   Corrected:  2138563
--   Trimmed:    2138563
--
-- Generating assembly 'HEcanuCorrLo' in '/lustre/scratch/users/ovidiu.paun/PacBio'
--
-- Parameters:
--
--  genomeSize        1300000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1050 ( 10.50%)
--    utgOvlErrorRate 0.1050 ( 10.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1050 ( 10.50%)
--    utgErrorRate    0.1050 ( 10.50%)
--    cnsErrorRate    0.1050 ( 10.50%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Thu Mar 29 15:24:31 2018 with 12708.892 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Thu Mar 29 15:25:27 2018 (56 seconds) with 12703.78 GB free disk space
----------------------------------------
--
-- Bogart failed, retry
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'bat' concurrent execution on Thu Mar 29 15:25:27 2018 with 12703.78 GB free disk space (1 processes; 1 concurrently)

    cd unitigging/4-unitigger
    ./unitigger.sh 1 > ./unitigger.000001.out 2>&1

-- Finished on Thu Mar 29 15:26:18 2018 (51 seconds) with 12696.698 GB free disk space
----------------------------------------
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu snapshot v1.7 +23 changes (r8715 967fcea3c70699eaccc92ff5bfe36d9d10e65a55)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

Thank you again for your help!

brianwalenz commented 6 years ago

That one is a bit harder, and I'm distressed it's still failing. I thought I fixed it (see #718 and #546 for other crashes).

Any chance you can upload unitigging/.gkpStore and unitigging/.ovlStore (unitigging/4-unitigger would be helpful, but not strictly necessary) so I can debug?

I also just noticed you've only got 11x of reads - is that 11x of raw uncorrected reads, or 11x of corrected reads? How were these corrected? Some hints are under 'low coverage' in http://canu.readthedocs.io/en/latest/faq.html.

Looking at the overlap report (search for "Overlap store 'unitigging/HEcanuCorrLo.ovlStore' contains") there isn't much of anything to assemble here. Worse, it's also reporting:

-- Overlap store 'unitigging/HEcanuCorrLo.ovlStore' successfully constructed.
-- Found 157439780 overlaps for 550391 reads; 1588172 reads have no overlaps.

so most of your reads aren't getting used at all. I think all you've got here are the repeats. :-(
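The two read counts in that report add up exactly to the 2138563 reads listed by ReadInfo() in unitigger.err, so the share of unused reads can be checked directly. A small sketch of that arithmetic; the struct is hypothetical and the counts are copied from the logs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary type; the counts come from the canu logs above.
struct OverlapReport {
  uint64_t overlaps;              // 157439780 in the ovlStore report
  uint64_t readsWithOverlaps;     // 550391
  uint64_t readsWithoutOverlaps;  // 1588172

  // Should equal the 2138563 reads listed by ReadInfo() in unitigger.err.
  uint64_t totalReads() const { return readsWithOverlaps + readsWithoutOverlaps; }

  // Share of reads with no overlaps at all; they have nothing to assemble with.
  double fractionWithoutOverlaps() const {
    return static_cast<double>(readsWithoutOverlaps) / static_cast<double>(totalReads());
  }

  // Average overlaps per read, counting only reads that have any overlaps.
  double overlapsPerOverlappingRead() const {
    return static_cast<double>(overlaps) / static_cast<double>(readsWithOverlaps);
  }
};
```

This gives about 74% of reads with no overlaps and roughly 286 overlaps per read among the rest. Many overlaps concentrated on few reads fits the reading that mostly repeats are overlapping, though the arithmetic alone does not prove it.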

I'd still be interested in debugging the crash, if you're able to upload the data.

ovidp commented 6 years ago

Thank you again for responding so fast. While I upload the files, I want to make sure: you mean the .ovlStore and .gkpStore folders, right? I am currently uploading them to ftp://ftp.cbcb.umd.edu/incoming/sergek, but they are quite large even as tar.gz archives. Or did you mean gkpStore.err instead?

To answer your other questions: yes, I have only 11x PacBio coverage of the genome, but also 120x of Illumina reads. In the assembly I am trying here, I use lordec to correct the PacBio reads with the Illumina reads and then assemble them with canu. Separately, I am also trying to assemble the lordec-corrected PacBio reads while declaring them as uncorrected. I know the assembly will not be great, but it will be used to apply for funds to get more data.

brianwalenz commented 6 years ago

If I'm understanding correctly, you have 11x of lordec corrected reads, and are running two assemblies with those reads, one using -pacbio-corrected (which crashed) and one pretending they're raw reads using -pacbio-raw. Great!

You can also try an assembly without trimming the lordec reads - "-assemble -pacbio-corrected reads.fasta". It could result in a better assembly as Canu will trim (or more likely, completely ignore) reads that have only overlaps on the ends.

Yes, the gkpStore (read info) and ovlStore (overlaps) are all I need to run the unitigger (bogart) over here. With that, I can poke around in the gory details and find the problem. I probably won't be able to do anything until Wednesday.

ovidp commented 6 years ago

Hi. The crashed assembly was actually started the way you now suggest, with -assemble -pacbio-corrected lordec_corrected_reads.fasta. I uploaded the files. Thanks again.

ovidp commented 6 years ago

Dear Brian

Can you please let me know what I should do next about my segmentation fault problem? Should I install a newer version of canu and rerun the entire analysis, or are you still going to debug? Were you able to find the data I uploaded?

Thanks a lot

brianwalenz commented 6 years ago

I'm finishing up the fix right now.

The fix will be a pair of files in src/bogart/. Unfortunately, you can't easily upgrade to the latest version of Canu, since on-disk data changed. Once I give it a couple more tests I'll post the files here.

brianwalenz commented 6 years ago

It seems possible to upgrade your on-disk data to the current version.

Using your current binaries:

    ovStoreDump -G HEcanuCorrLo.gkpStore -O HEcanuCorrLo.ovlStore -d -binary dump1
    overlapConvert -G HEcanuCorrLo.gkpStore -raw dump1.ovb > dump1.raw

And then with the latest binaries:

    overlapImport -G HEcanuCorrLo.gkpStore -raw -O new.ovlStore dump1.raw

This rewrites the overlaps from the old format into the new format. The output is a new ovlStore (creatively called 'new.ovlStore'). The gkpStore data format didn't change.

I was debugging a similar crash to yours, and thought I had the problem fixed. But your example still fails. There's no point in moving to the tip code yet; 'bogart' is still the same.

It might be possible to get around the problem by decreasing the allowed overlap error rate; decrease both the -tg and -eM values (by 0.1?) in unitigger.sh and run that script (./unitigger.sh 1).

Your data seems very noisy; the *.001.filterOverlaps.thr000.num000.log reports

ERROR RATES (658571 samples)
-----------
mean   0.08657178 stddev 0.01503193 -> 0.17676335 fraction error =  17.676335% error
median 0.08950000 mad    0.01020000 -> 0.18023512 fraction error =  18.023512% error

where the mean is usually around 0.01 or 0.02 and the final error is around 3% to 8%.
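The two cutoffs printed there appear to be the center plus 6 deviations, matching "Deviations: Graph 6.000" in unitigger.err: mean + 6 * stddev, and median + 6 * 1.4826 * MAD (1.4826 is the standard factor that converts a MAD into a normal-equivalent standard deviation). This is a reconstruction from the printed values, not taken from bogart's source; a sketch that reproduces them:

```cpp
#include <cassert>
#include <cmath>

// Reconstructed from the printed values, not copied from bogart.
// 'deviations' plays the role of "Deviations: Graph 6.000".
double meanCutoff(double mean, double stddev, double deviations) {
  return mean + deviations * stddev;
}

// 1.4826 scales a median absolute deviation to the standard deviation of a
// normal distribution, so MAD-based and stddev-based cutoffs are comparable.
double medianCutoff(double median, double mad, double deviations) {
  return median + deviations * 1.4826 * mad;
}
```

Both cutoffs land near 18% error, because the center of the distribution (about 0.087 to 0.090) is already several times the usual 0.01 to 0.02.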

brianwalenz commented 6 years ago

Well, I got it to run, but it didn't assemble. Only 968 contigs with a total size of 15 Mbp were output, plus about 15 Gbp of 'unassembled' pieces, most of which are singleton reads.
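Putting those figures next to the run parameters (genomeSize=1.3g and the 11.28x coverage reported by OverlapCache()) shows how little was assembled. A rough sketch using the approximate values from the comment, assuming the coverage figure means total read bases divided by genomeSize; the struct name is hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Inputs are approximate: "total size 15 Mbp", genomeSize=1.3g, and the
// 11.28x coverage printed by OverlapCache() in unitigger.err.
struct AssemblyYield {
  double contigBases;  // bases in output contigs
  double genomeSize;   // expected genome size
  double coverage;     // assumed: total read bases / genomeSize

  double inputBases() const { return coverage * genomeSize; }         // ~14.7 Gbp
  double genomeFraction() const { return contigBases / genomeSize; }  // ~1.2%
  double inputFraction() const { return contigBases / inputBases(); } // ~0.1%
};
```

The input comes to about 14.7 Gbp, close to the ~15 Gbp of unassembled pieces, and the contigs cover only about 1.2% of the expected genome size.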

No patch to the code yet; I'm working out other issues still.

Here's a histogram of the error rates in overlaps. It looks like it may be truncated at the high end. It's also much higher than I'm comfortable assembling - any genome duplication(s) cannot be distinguished, repeats will get smashed together, etc, etc.

[image: histogram of overlap error rates]

ovidp commented 6 years ago

Dear Brian,

Thank you very much for proceeding with this. I guess correcting the PacBio reads with the Illumina reads using lordec and then assembling the long reads directly is a bad idea. I have two other canu assemblies running: one declaring the lordec-corrected reads as raw PacBio reads, and one starting directly from the raw reads (not taking the Illumina reads into account at all). I hope those assemblies will turn out better. I am not sure whether the segmentation fault was introduced in any way by lordec. Anyway, thank you very much again.

Best wishes,
Ovidiu

brianwalenz commented 6 years ago

Thanks for sharing the data!

The algorithm that fails seems to be getting confused by the repetitiveness of this data. This could be caused by lordec homogenizing repeats, or the high divergence in overlaps, or it could just be a property of your genome. I thought I had a fix, but am now back to rethinking the whole algorithm.

brianwalenz commented 6 years ago

That was ugly, but I think I (finally) got it fixed.

Your data has been removed. It was on a disk that isn't backed up.