marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Pipeline failed at the generateCorrectionLayout stage #539

Closed FadyMohareb closed 7 years ago

FadyMohareb commented 7 years ago

Hello there, I managed to get canu working up until the generateCorrectionLayout stage, where the generateCorrectionLayout command fails. I have tried re-running several times, with no success. I am using the precompiled version 1.5 on a single-node server, with the following command:

canu -p chilense_RSII_Sequel_Merged_fastq -d /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq/ -pacbio-raw ~/projects/chilbix/chilense/longReads/merge/PacBio_Sequel_RSII_Merged.fastq genomeSize=830m maxThreads=24 maxMemory=768


-- Detected Java(TM) Runtime Environment '1.8.0_131' (from '/usr/lib/jvm/java-8-oracle/bin/java').
-- Detected 48 CPUs and 1008 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and  256 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run   3 jobs concurrently, and use up to   8 compute threads and    8 GB memory for stage 'read error detection (overlap error adjustment)'.
-- Allowed to run  24 jobs concurrently, and use up to   1 compute thread  and    2 GB memory for stage 'overlap error adjustment'.
-- Allowed to run   3 jobs concurrently, and use up to   8 compute threads and   48 GB memory for stage 'utgcns (consensus'.
-- Allowed to run  24 jobs concurrently, and use up to   1 compute thread  and    4 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run  24 jobs concurrently, and use up to   1 compute thread  and   16 GB memory for stage 'overlap store parallel sorting'.
-- Allowed to run  24 jobs concurrently, and use up to   1 compute thread  and    8 GB memory for stage 'overlapper'.
-- Allowed to run   3 jobs concurrently, and use up to   8 compute threads and   12 GB memory for stage 'overlapper'.
-- Allowed to run   3 jobs concurrently, and use up to   8 compute threads and   12 GB memory for stage 'overlapper'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   64 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run   6 jobs concurrently, and use up to   4 compute threads and   20 GB memory for stage 'falcon_sense (read correction)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run   2 jobs concurrently, and use up to  12 compute threads and   32 GB memory for stage 'minimap (overlapper)'.
--
-- This is canu parallel iteration #1, out of a maximum of 2 attempts.
--
-- Final error rates before starting pipeline:
--   
--   genomeSize          -- 830000000
--   errorRate           -- 0.025
--   
--   corOvlErrorRate     -- 0.075
--   obtOvlErrorRate     -- 0.075
--   utgOvlErrorRate     -- 0.075
--   
--   obtErrorRate        -- 0.075
--   
--   cnsErrorRate        -- 0.075
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Thu Jun 29 21:38:56 2017 with 770.3 GB free disk space

    /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts \
      -G /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq//correction/chilense_RSII_Sequel_Merged_fastq.gkpStore \
      -O /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq//correction/chilense_RSII_Sequel_Merged_fastq.ovlStore \
      -S /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq//correction/2-correction/chilense_RSII_Sequel_Merged_fastq.globalScores \
      -C 80 \
      -p /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq//correction/2-correction/chilense_RSII_Sequel_Merged_fastq.estimate
ERROR: bogus overlap '         9    4122443  I       0     393    881  4294966741   2909  0.145200'
generateCorrectionLayouts: correction/generateCorrectionLayouts.C:105: tgTig* generateLayout(gkStore*, uint64*, bool, uint32, double, double, ovOverlap*, uint32, FILE*): Assertion `ovlLength < (((uint32)1 << 21) - 1)' failed.

Failed with 'Aborted'

Backtrace (mangled):

/opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts[0x42bc2d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fc78e46a330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fc78e0c8c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fc78e0cc028]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fbf6)[0x7fc78e0c1bf6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fca2)[0x7fc78e0c1ca2]
/opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts[0x403e3e]
/opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts[0x402b01]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc78e0b3f45]
/opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts[0x403889]

Backtrace (demangled):

[0] /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts() [0x42bc2d]
[1] /lib/x86_64-linux-gnu/libpthread.so.0::(null) + 0x10330  [0x7fc78e46a330]
[2] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x37  [0x7fc78e0c8c37]
[3] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x148  [0x7fc78e0cc028]
[4] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fbf6  [0x7fc78e0c1bf6]
[5] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0x2fca2  [0x7fc78e0c1ca2]
[6] /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts() [0x403e3e]
[7] /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts() [0x402b01]
[8] /lib/x86_64-linux-gnu/libc.so.6::(null) + 0xf5  [0x7fc78e0b3f45]
[9] /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts() [0x403889]

GDB:

Aborted (core dumped)

-- Finished on Thu Jun 29 21:39:12 2017 (16 seconds) with 770.3 GB free disk space
----------------------------------------
ERROR:
ERROR:  Failed with exit code 134.  (rc=34304)
ERROR:
================================================================================
Don't panic, but a mostly harmless error occurred and canu failed.

Disk space available:  770.3 GB

canu failed with 'failed to generate estimated lengths of corrected reads'."

Looking into the dodgy overlap: grep "4122443" chilense_RSII_Sequel_Merged_fastq.globalScores.log returns "4122443 - 1825 overlaps - 2 scored - 0 filtered - 2 saved (no filtering)".

Dumping the read from this "bogus overlap" gives the output below (the full output is too long to paste here, so these are the first 20 lines): ovStoreDump -G ./correction/chilense_RSII_Sequel_Merged_fastq.gkpStore -O ./correction/chilense_RSII_Sequel_Merged_fastq.ovlStore -d 4122443 | head -n 20


   4122443          9  I       0    2909 4294966741     881    393  0.145200
   4122443       2202  N       0    2369    788   20697  21063  0.138000
   4122443       2309  N       0    2918    395    8367  10987  0.101400
   4122443       2826  N       0    2894    613    1704   5001  0.147400
   4122443       8082  N       0    2856    643   18442  22291  0.136300
   4122443      12643  N       0    2887    642    3900   7322  0.158200
   4122443      14666  I       0    2825    306    4482   3995  0.145200
   4122443      16155  N       0    2965    348    1587   3251  0.077800
   4122443      20177  N       0    2883 4294966898    2789   3120  0.030100
   4122443      20188  N       0    2941    412    5332   8450  0.066200
   4122443      20226  N       0    2869    640    5388   8800  0.140300
   4122443      20555  N       0    2880    475   18195  22707  0.104300
   4122443      20732  N       0    2824    489    8663  12072  0.142200
   4122443      20884  N       0    2956    321    1716   3821  0.052100
   4122443      22508  I       0    2906    416   13529  11942  0.040700
   4122443      23894  I       0    5361   1731    1449    966  0.128600
   4122443      25557  N       0    2860    453   11962  13640  0.100700
   4122443      28566  I       0    2856 4294967060    3624   3331  0.060300
   4122443      28718  N       0    2837    533    5267   8832  0.110400
   4122443      29644  N       0    2894 4294967047    8083   8773  0.131900

Any thoughts?!
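
A note on the values near 4294967000 in the dump above: they sit just below 2^32 = 4294967296, which is how a small negative number appears when it is stored in an unsigned 32-bit field (4294966741 read as signed is -555). The Python sketch below reinterprets such fields and flags lines where a coordinate reaches the bound quoted in the failed assertion, (1 << 21) - 1. The column layout is an assumption taken only from the dump in this thread (two read IDs, an orientation flag, five integers, an error rate), whether these are exactly the quantities the assertion checks is not confirmed here, and as_signed32 and flag_bogus_overlaps are hypothetical helper names, not part of Canu.

```python
UINT32_MODULUS = 1 << 32
MAX_OVL_LENGTH = (1 << 21) - 1  # bound quoted in the failed assertion


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    return value - UINT32_MODULUS if value >= (1 << 31) else value


def flag_bogus_overlaps(dump_text):
    """Return (a_id, b_id, [signed values]) for lines with out-of-range fields.

    Assumed layout (from the dump in this thread, not from Canu documentation):
    a_id b_id orientation int int int int int error_rate
    """
    flagged = []
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or unexpected lines
        try:
            a_id, b_id = int(fields[0]), int(fields[1])
            coords = [int(f) for f in fields[3:8]]
        except ValueError:
            continue  # not a data line
        bad = [as_signed32(c) for c in coords if c >= MAX_OVL_LENGTH]
        if bad:
            flagged.append((a_id, b_id, bad))
    return flagged


# Three lines copied from the dump above.
sample = """
   4122443          9  I       0    2909 4294966741     881    393  0.145200
   4122443       2202  N       0    2369    788   20697  21063  0.138000
   4122443      20177  N       0    2883 4294966898    2789   3120  0.030100
"""
for a_id, b_id, values in flag_bogus_overlaps(sample):
    print(a_id, b_id, values)
```

On the three sample lines this flags the overlaps with reads 9 and 20177, whose wrapped fields read as -555 and -398.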

skoren commented 7 years ago

From the logs, you're using 1.3, not 1.5; perhaps there is another installation of Canu that your command is picking up instead of 1.5. There was a bug that caused Canu to miss errors in a previous overlapping step, leading to erroneous overlaps and the error you are seeing. I would suggest restarting from scratch and making sure you run Canu 1.5, not 1.3.

FadyMohareb commented 7 years ago

Thanks Skoren - You are 100% right! I did indeed have 1.3 installed as well. It seems some of the previous steps were done using 1.5, but during one of the re-runs my modules manager must have reverted to 1.3 halfway through. Re-running now from scratch.
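
A mix-up like this can be spotted from the log itself, because the version appears in the path of every binary Canu launches. The Python sketch below counts the versions found in such paths. It assumes the install layout seen in this thread (".../canu/<version>/<platform>/bin/"), which other installations may not follow, and canu_versions_in_log is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: binaries live under ".../canu/<version>/<platform>/bin/",
# as in the paths pasted in this thread.
BINARY_PATH = re.compile(r"/canu/(\d+(?:\.\d+)+)/[^/\s]+/bin/")


def canu_versions_in_log(log_text):
    """Count how often each Canu version appears in binary paths in a log."""
    return Counter(BINARY_PATH.findall(log_text))


# Two paths copied from the logs in this thread.
log = """
    /opt/bix/canu/1.3/Linux-amd64/bin/generateCorrectionLayouts
    /opt/bix/canu/1.5/Linux-amd64/bin/gatekeeperCreate
"""
versions = canu_versions_in_log(log)
if len(versions) > 1:
    print("mixed Canu versions in log:", dict(versions))
```

More than one key in the result suggests that different runs, or different stages of one run, used different installations.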

FadyMohareb commented 7 years ago

Okay, the 2nd attempt is currently stuck at the mhap stage. After meryl and precompute, it started 147 mhap processes, of which 108 failed in the first round. After re-running several rounds, I got down to 47 failed processes, and it seems to be stuck there. I deleted the whole output and started fresh several times, but it always fails at mhap for a certain number of processes (the final number is not always 47, though!). Any clues as to why that is?
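
To see which jobs keep failing, the job indices can be read from the Canu log: each "Running jobs." section lists the scripts it launches, and the jobs listed under a second attempt are presumably the ones that did not finish in the first. The Python sketch below groups job indices by script and attempt. The regular expressions are written against the log lines pasted in this thread only, and jobs_per_attempt is a hypothetical helper name.

```python
import re

ATTEMPT = re.compile(r"^-- Running jobs\.\s+(First|Second) attempt out of \d+\.")
JOB = re.compile(r"^\s*\./(\w+)\.sh (\d+) >")


def jobs_per_attempt(log_text):
    """Map (script, attempt) -> list of job indices launched in that attempt."""
    result = {}
    attempt = None
    for line in log_text.splitlines():
        m = ATTEMPT.match(line)
        if m:
            attempt = m.group(1)
            continue
        m = JOB.match(line)
        if m and attempt:
            key = (m.group(1), attempt)
            result.setdefault(key, []).append(int(m.group(2)))
    return result


# Shortened example in the format of the Canu log.
log = """
-- Running jobs.  First attempt out of 2.
    ./mhap.sh 1 > ./mhap.000001.out 2>&1
    ./mhap.sh 2 > ./mhap.000002.out 2>&1
    ./mhap.sh 3 > ./mhap.000003.out 2>&1
-- Running jobs.  Second attempt out of 2.
    ./mhap.sh 1 > ./mhap.000001.out 2>&1
    ./mhap.sh 3 > ./mhap.000003.out 2>&1
"""
attempts = jobs_per_attempt(log)
print("retried mhap jobs:", attempts[("mhap", "Second")])
```

Comparing the retried indices across fresh runs would show whether the same jobs fail each time or a different set does.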

Command and first round of output below:

> /opt/bix/canu/1.5/Linux-amd64/bin/canu -p chilense_RSII_Sequel_Merged_fastq -d /home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq/ -pacbio-raw ~/projects/chilbix/chilense/longReads/merge/PacBio_Sequel_RSII_Merged.f1000.fasta  genomeSize=830m maxThreads=42
-- Canu release v1.5
-- Detected Java(TM) Runtime Environment '1.8.0_131' (from '/usr/lib/jvm/java-8-oracle/bin/java').
-- Detected gnuplot version '4.6 patchlevel 4' (from 'gnuplot') and image format 'png'.
-- Detected 48 CPUs and 1008 gigabytes of memory.
-- Limited to 42 CPUs from maxThreads option.
-- No grid engine detected, grid disabled.
--
-- Run   3 jobs concurrently using   64 GB and  14 CPUs for stage 'meryl'.
-- Run   3 jobs concurrently using   32 GB and  14 CPUs for stage 'mhap (cor)'.
-- Run   6 jobs concurrently using   12 GB and   7 CPUs for stage 'overlapper (obt)'.
-- Run   6 jobs concurrently using   12 GB and   7 CPUs for stage 'overlapper (utg)'.
-- Run  14 jobs concurrently using   20 GB and   3 CPUs for stage 'falcon_sense'.
-- Run  42 jobs concurrently using    4 GB and   1 CPU  for stage 'ovStore bucketizer'.
-- Run  42 jobs concurrently using   16 GB and   1 CPU  for stage 'ovStore sorting'.
-- Run   6 jobs concurrently using    8 GB and   7 CPUs for stage 'read error detection'.
-- Run  42 jobs concurrently using    2 GB and   1 CPU  for stage 'overlap error adjustment'.
-- Run   3 jobs concurrently using  256 GB and  14 CPUs for stage 'bogart'.
-- Run   6 jobs concurrently using    8 GB and   7 CPUs for stage 'GFA alignment and processing'.
-- Run   6 jobs concurrently using   48 GB and   7 CPUs for stage 'consensus'.
--
-- Generating assembly 'chilense_RSII_Sequel_Merged_fastq' in '/home/fady/projects/chilbix/chilense/assembly/canu/canu_mergeRSIIandSequel_using_fastq'
--
-- Parameters:
--
--  genomeSize        830000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0450 (  4.50%)
--

skoren commented 7 years ago

Your log doesn't show the final error. Is it failing in the precompute stage or the mhap stage? One of the failed jobs should have the specific error in its output file (precompute.*.out or mhap.*.out, depending on the stage), which should give more information on why it is failing.
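
A quick way to inspect those output files is to print the last few lines of each one, since an error message usually ends up near the end. The Python sketch below does that for files matching mhap.*.out. The directory correction/1-overlapper is taken from the logs in this thread, and tail_of_outputs is a hypothetical helper name.

```python
from pathlib import Path


def tail_of_outputs(directory, pattern="mhap.*.out", lines=5):
    """Return {file name: last `lines` lines} for each matching output file."""
    tails = {}
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(errors="replace")
        tails[path.name] = text.splitlines()[-lines:]
    return tails


# Run from the assembly directory; prints nothing if no files match.
for name, tail in tail_of_outputs("correction/1-overlapper").items():
    print(f"== {name}")
    print("\n".join(tail))
```

Passing pattern="precompute.*.out" covers the precompute stage in the same way.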

FadyMohareb commented 7 years ago

At the mhap stage (Remaining output below)

--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Sun Jul  2 19:44:20 2017 with 1199.129 GB free disk space

    cd correction
    /opt/bix/canu/1.5/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./chilense_RSII_Sequel_Merged_fastq.gkpStore.BUILDING \
      ./chilense_RSII_Sequel_Merged_fastq.gkpStore.gkp \
    > ./chilense_RSII_Sequel_Merged_fastq.gkpStore.BUILDING.err 2>&1

-- Finished on Sun Jul  2 19:59:52 2017 (932 seconds) with 1186.454 GB free disk space
----------------------------------------
--
-- In gatekeeper store 'correction/chilense_RSII_Sequel_Merged_fastq.gkpStore':
--   Found 5666122 reads.
--   Found 49987956975 bases (60.22 times coverage).
--
--   Read length histogram (one '*' equals 28774.6 reads):
--        0   4999 2014222 **********************************************************************
--     5000   9999 1829657 ***************************************************************
--    10000  14999 942733 ********************************
--    15000  19999 419478 **************
--    20000  24999 254123 ********
--    25000  29999 123406 ****
--    30000  34999  49769 *
--    35000  39999  19694 
--    40000  44999   7724 
--    45000  49999   3003 
--    50000  54999   1202 
--    55000  59999    543 
--    60000  64999    242 
--    65000  69999    121 
--    70000  74999     71 
--    75000  79999     42 
--    80000  84999     22 
--    85000  89999     20 
--    90000  94999     11 
--    95000  99999     12 
--   100000 104999      9 
--   105000 109999      9 
--   110000 114999      3 
--   115000 119999      2 
--   120000 124999      0 
--   125000 129999      0 
--   130000 134999      0 
--   135000 139999      0 
--   140000 144999      1 
--   145000 149999      0 
--   150000 154999      1 
--   155000 159999      1 
--   160000 164999      1 
-- Finished stage 'cor-gatekeeper', reset canuIteration.
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting concurrent execution on Sun Jul  2 20:00:47 2017 with 1186.276 GB free disk space (1 processes; 3 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Jul  2 20:54:42 2017 (3235 seconds) with 1176.52 GB free disk space
----------------------------------------
-- Meryl finished successfully.
-- Finished stage 'merylCheck', reset canuIteration.
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  95925368 ******************                                                     0.0455 0.0019
--       2-     2 137210259 **************************                                             0.1106 0.0074
--       3-     4 303253324 *********************************************************              0.1831 0.0166
--       5-     7 368889724 ********************************************************************** 0.3198 0.0425
--       8-    11 316571340 ************************************************************           0.4742 0.0876
--      12-    16 238759677 *********************************************                          0.6071 0.1457
--      17-    22 171982051 ********************************                                       0.7094 0.2097
--      23-    29 122026160 ***********************                                                0.7844 0.2739
--      30-    37  87455852 ****************                                                       0.8384 0.3349
--      38-    46  63693734 ************                                                           0.8776 0.3916
--      47-    56  46784402 ********                                                               0.9064 0.4436
--      57-    67  34561150 ******                                                                 0.9278 0.4906
--      68-    79  25681756 ****                                                                   0.9436 0.5325
--      80-    92  19355582 ***                                                                    0.9554 0.5695
--      93-   106  14793633 **                                                                     0.9643 0.6022
--     107-   121  11406997 **                                                                     0.9712 0.6311
--     122-   137   8895933 *                                                                      0.9765 0.6568
--     138-   154   6984505 *                                                                      0.9806 0.6795
--     155-   172   5534157 *                                                                      0.9839 0.6997
--     173-   191   4425287                                                                        0.9864 0.7176
--     192-   211   3577541                                                                        0.9885 0.7336
--     212-   232   2929064                                                                        0.9902 0.7479
--     233-   254   2422507                                                                        0.9916 0.7608
--     255-   277   2030433                                                                        0.9927 0.7725
--     278-   301   1705730                                                                        0.9936 0.7833
--     302-   326   1461808                                                                        0.9944 0.7931
--     327-   352   1264356                                                                        0.9951 0.8022
--     353-   379   1064674                                                                        0.9957 0.8108
--     380-   407    897176                                                                        0.9962 0.8185
--     408-   436    777521                                                                        0.9967 0.8256
--     437-   466    660901                                                                        0.9970 0.8321
--     467-   497    551577                                                                        0.9973 0.8381
--     498-   529    465122                                                                        0.9976 0.8434
--     530-   562    401227                                                                        0.9978 0.8481
--     563-   596    354054                                                                        0.9980 0.8525
--     597-   631    316444                                                                        0.9982 0.8566
--     632-   667    289958                                                                        0.9983 0.8605
--     668-   704    263795                                                                        0.9985 0.8642
--     705-   742    231135                                                                        0.9986 0.8679
--     743-   781    202395                                                                        0.9987 0.8712
--     782-   821    181856                                                                        0.9988 0.8743
--
--    19326675 (max occurrences)
-- 49807039777 (total mers, non-unique)
--  2012705596 (distinct mers, non-unique)
--    95925368 (unique mers)
-- For mhap overlapping, set repeat k-mer threshold to 499029.
--
-- Found 49902965145 16-mers; 2108630964 distinct and 95925368 unique.  Largest count 19326675.
-- Finished stage 'cor-meryl', reset canuIteration.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=low based on read coverage of 60.
--
-- PARAMETERS: hashes=256, minMatches=3, threshold=0.8
--
-- Given 32 GB, can fit 96000 reads per block.
-- For 61 blocks, set stride to 15 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 60 mhap precompute jobs.
-- Configured 147 mhap overlap jobs.
-- Finished stage 'cor-mhapConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting concurrent execution on Sun Jul  2 20:56:53 2017 with 1186.27 GB free disk space (60 processes; 3 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1
    ./precompute.sh 3 > ./precompute.000003.out 2>&1
    ./precompute.sh 4 > ./precompute.000004.out 2>&1
    ./precompute.sh 5 > ./precompute.000005.out 2>&1
    ./precompute.sh 6 > ./precompute.000006.out 2>&1
    ./precompute.sh 7 > ./precompute.000007.out 2>&1
    ./precompute.sh 8 > ./precompute.000008.out 2>&1
    ./precompute.sh 9 > ./precompute.000009.out 2>&1
    ./precompute.sh 10 > ./precompute.000010.out 2>&1
    ./precompute.sh 11 > ./precompute.000011.out 2>&1
    ./precompute.sh 12 > ./precompute.000012.out 2>&1
    ./precompute.sh 13 > ./precompute.000013.out 2>&1
    ./precompute.sh 14 > ./precompute.000014.out 2>&1
    ./precompute.sh 15 > ./precompute.000015.out 2>&1
    ./precompute.sh 16 > ./precompute.000016.out 2>&1
    ./precompute.sh 17 > ./precompute.000017.out 2>&1
    ./precompute.sh 18 > ./precompute.000018.out 2>&1
    ./precompute.sh 19 > ./precompute.000019.out 2>&1
    ./precompute.sh 20 > ./precompute.000020.out 2>&1
    ./precompute.sh 21 > ./precompute.000021.out 2>&1
    ./precompute.sh 22 > ./precompute.000022.out 2>&1
    ./precompute.sh 23 > ./precompute.000023.out 2>&1
    ./precompute.sh 24 > ./precompute.000024.out 2>&1
    ./precompute.sh 25 > ./precompute.000025.out 2>&1
    ./precompute.sh 26 > ./precompute.000026.out 2>&1
    ./precompute.sh 27 > ./precompute.000027.out 2>&1
    ./precompute.sh 28 > ./precompute.000028.out 2>&1
    ./precompute.sh 29 > ./precompute.000029.out 2>&1
    ./precompute.sh 30 > ./precompute.000030.out 2>&1
    ./precompute.sh 31 > ./precompute.000031.out 2>&1
    ./precompute.sh 32 > ./precompute.000032.out 2>&1
    ./precompute.sh 33 > ./precompute.000033.out 2>&1
    ./precompute.sh 34 > ./precompute.000034.out 2>&1
    ./precompute.sh 35 > ./precompute.000035.out 2>&1
    ./precompute.sh 36 > ./precompute.000036.out 2>&1
    ./precompute.sh 37 > ./precompute.000037.out 2>&1
    ./precompute.sh 38 > ./precompute.000038.out 2>&1
    ./precompute.sh 39 > ./precompute.000039.out 2>&1
    ./precompute.sh 40 > ./precompute.000040.out 2>&1
    ./precompute.sh 41 > ./precompute.000041.out 2>&1
    ./precompute.sh 42 > ./precompute.000042.out 2>&1
    ./precompute.sh 43 > ./precompute.000043.out 2>&1
    ./precompute.sh 44 > ./precompute.000044.out 2>&1
    ./precompute.sh 45 > ./precompute.000045.out 2>&1
    ./precompute.sh 46 > ./precompute.000046.out 2>&1
    ./precompute.sh 47 > ./precompute.000047.out 2>&1
    ./precompute.sh 48 > ./precompute.000048.out 2>&1
    ./precompute.sh 49 > ./precompute.000049.out 2>&1
    ./precompute.sh 50 > ./precompute.000050.out 2>&1
    ./precompute.sh 51 > ./precompute.000051.out 2>&1
    ./precompute.sh 52 > ./precompute.000052.out 2>&1
    ./precompute.sh 53 > ./precompute.000053.out 2>&1
    ./precompute.sh 54 > ./precompute.000054.out 2>&1
    ./precompute.sh 55 > ./precompute.000055.out 2>&1
    ./precompute.sh 56 > ./precompute.000056.out 2>&1
    ./precompute.sh 57 > ./precompute.000057.out 2>&1
    ./precompute.sh 58 > ./precompute.000058.out 2>&1
    ./precompute.sh 59 > ./precompute.000059.out 2>&1
    ./precompute.sh 60 > ./precompute.000060.out 2>&1

-- Finished on Mon Jul  3 00:12:40 2017 (11747 seconds) with 1117.547 GB free disk space
----------------------------------------
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting concurrent execution on Mon Jul  3 00:12:40 2017 with 1117.547 GB free disk space (24 processes; 3 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 6 > ./precompute.000006.out 2>&1
    ./precompute.sh 11 > ./precompute.000011.out 2>&1
    ./precompute.sh 18 > ./precompute.000018.out 2>&1
    ./precompute.sh 19 > ./precompute.000019.out 2>&1
    ./precompute.sh 20 > ./precompute.000020.out 2>&1
    ./precompute.sh 21 > ./precompute.000021.out 2>&1
    ./precompute.sh 22 > ./precompute.000022.out 2>&1
    ./precompute.sh 29 > ./precompute.000029.out 2>&1
    ./precompute.sh 30 > ./precompute.000030.out 2>&1
    ./precompute.sh 31 > ./precompute.000031.out 2>&1
    ./precompute.sh 32 > ./precompute.000032.out 2>&1
    ./precompute.sh 33 > ./precompute.000033.out 2>&1
    ./precompute.sh 34 > ./precompute.000034.out 2>&1
    ./precompute.sh 35 > ./precompute.000035.out 2>&1
    ./precompute.sh 36 > ./precompute.000036.out 2>&1
    ./precompute.sh 37 > ./precompute.000037.out 2>&1
    ./precompute.sh 51 > ./precompute.000051.out 2>&1
    ./precompute.sh 52 > ./precompute.000052.out 2>&1
    ./precompute.sh 53 > ./precompute.000053.out 2>&1
    ./precompute.sh 54 > ./precompute.000054.out 2>&1
    ./precompute.sh 55 > ./precompute.000055.out 2>&1
    ./precompute.sh 56 > ./precompute.000056.out 2>&1
    ./precompute.sh 57 > ./precompute.000057.out 2>&1
    ./precompute.sh 58 > ./precompute.000058.out 2>&1

-- Finished on Mon Jul  3 02:20:14 2017 (7654 seconds) with 1095.531 GB free disk space
----------------------------------------
-- All 60 mhap precompute jobs finished successfully.
-- Finished stage 'cor-mhapPrecomputeCheck', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting concurrent execution on Mon Jul  3 02:20:14 2017 with 1095.531 GB free disk space (147 processes; 3 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 1 > ./mhap.000001.out 2>&1
    ./mhap.sh 2 > ./mhap.000002.out 2>&1
    ./mhap.sh 3 > ./mhap.000003.out 2>&1
    ./mhap.sh 4 > ./mhap.000004.out 2>&1
    ./mhap.sh 5 > ./mhap.000005.out 2>&1
    ./mhap.sh 6 > ./mhap.000006.out 2>&1
    ./mhap.sh 7 > ./mhap.000007.out 2>&1
    ./mhap.sh 8 > ./mhap.000008.out 2>&1
    ./mhap.sh 9 > ./mhap.000009.out 2>&1
    ./mhap.sh 10 > ./mhap.000010.out 2>&1
    ./mhap.sh 11 > ./mhap.000011.out 2>&1
    ./mhap.sh 12 > ./mhap.000012.out 2>&1
    ./mhap.sh 13 > ./mhap.000013.out 2>&1
    ./mhap.sh 14 > ./mhap.000014.out 2>&1
    ./mhap.sh 15 > ./mhap.000015.out 2>&1
    ./mhap.sh 16 > ./mhap.000016.out 2>&1
    ./mhap.sh 17 > ./mhap.000017.out 2>&1
    ./mhap.sh 18 > ./mhap.000018.out 2>&1
    ./mhap.sh 19 > ./mhap.000019.out 2>&1
    ./mhap.sh 20 > ./mhap.000020.out 2>&1
    ./mhap.sh 21 > ./mhap.000021.out 2>&1
    ./mhap.sh 22 > ./mhap.000022.out 2>&1
    ./mhap.sh 23 > ./mhap.000023.out 2>&1
    ./mhap.sh 24 > ./mhap.000024.out 2>&1
    ./mhap.sh 25 > ./mhap.000025.out 2>&1
    ./mhap.sh 26 > ./mhap.000026.out 2>&1
    ./mhap.sh 27 > ./mhap.000027.out 2>&1
    ./mhap.sh 28 > ./mhap.000028.out 2>&1
    ./mhap.sh 29 > ./mhap.000029.out 2>&1
    ./mhap.sh 30 > ./mhap.000030.out 2>&1
    ./mhap.sh 31 > ./mhap.000031.out 2>&1
    ./mhap.sh 32 > ./mhap.000032.out 2>&1
    ./mhap.sh 33 > ./mhap.000033.out 2>&1
    ./mhap.sh 34 > ./mhap.000034.out 2>&1
    ./mhap.sh 35 > ./mhap.000035.out 2>&1
    ./mhap.sh 36 > ./mhap.000036.out 2>&1
    ./mhap.sh 37 > ./mhap.000037.out 2>&1
    ./mhap.sh 38 > ./mhap.000038.out 2>&1
    ./mhap.sh 39 > ./mhap.000039.out 2>&1
    ./mhap.sh 40 > ./mhap.000040.out 2>&1
    ./mhap.sh 41 > ./mhap.000041.out 2>&1
    ./mhap.sh 42 > ./mhap.000042.out 2>&1
    ./mhap.sh 43 > ./mhap.000043.out 2>&1
    ./mhap.sh 44 > ./mhap.000044.out 2>&1
    ./mhap.sh 45 > ./mhap.000045.out 2>&1
    ./mhap.sh 46 > ./mhap.000046.out 2>&1
    ./mhap.sh 47 > ./mhap.000047.out 2>&1
    ./mhap.sh 48 > ./mhap.000048.out 2>&1
    ./mhap.sh 49 > ./mhap.000049.out 2>&1
    ./mhap.sh 50 > ./mhap.000050.out 2>&1
    ./mhap.sh 51 > ./mhap.000051.out 2>&1
    ./mhap.sh 52 > ./mhap.000052.out 2>&1
    ./mhap.sh 53 > ./mhap.000053.out 2>&1
    ./mhap.sh 54 > ./mhap.000054.out 2>&1
    ./mhap.sh 55 > ./mhap.000055.out 2>&1
    ./mhap.sh 56 > ./mhap.000056.out 2>&1
    ./mhap.sh 57 > ./mhap.000057.out 2>&1
    ./mhap.sh 58 > ./mhap.000058.out 2>&1
    ./mhap.sh 59 > ./mhap.000059.out 2>&1
    ./mhap.sh 60 > ./mhap.000060.out 2>&1
    ./mhap.sh 61 > ./mhap.000061.out 2>&1
    ./mhap.sh 62 > ./mhap.000062.out 2>&1
    ./mhap.sh 63 > ./mhap.000063.out 2>&1
    ./mhap.sh 64 > ./mhap.000064.out 2>&1
    ./mhap.sh 65 > ./mhap.000065.out 2>&1
    ./mhap.sh 66 > ./mhap.000066.out 2>&1
    ./mhap.sh 67 > ./mhap.000067.out 2>&1
    ./mhap.sh 68 > ./mhap.000068.out 2>&1
    ./mhap.sh 69 > ./mhap.000069.out 2>&1
    ./mhap.sh 70 > ./mhap.000070.out 2>&1
    ./mhap.sh 71 > ./mhap.000071.out 2>&1
    ./mhap.sh 72 > ./mhap.000072.out 2>&1
    ./mhap.sh 73 > ./mhap.000073.out 2>&1
    ./mhap.sh 74 > ./mhap.000074.out 2>&1
    ./mhap.sh 75 > ./mhap.000075.out 2>&1
    ./mhap.sh 76 > ./mhap.000076.out 2>&1
    ./mhap.sh 77 > ./mhap.000077.out 2>&1
    ./mhap.sh 78 > ./mhap.000078.out 2>&1
    ./mhap.sh 79 > ./mhap.000079.out 2>&1
    ./mhap.sh 80 > ./mhap.000080.out 2>&1
    ./mhap.sh 81 > ./mhap.000081.out 2>&1
    ./mhap.sh 82 > ./mhap.000082.out 2>&1
    ./mhap.sh 83 > ./mhap.000083.out 2>&1
    ./mhap.sh 84 > ./mhap.000084.out 2>&1
    ./mhap.sh 85 > ./mhap.000085.out 2>&1
    ./mhap.sh 86 > ./mhap.000086.out 2>&1
    ./mhap.sh 87 > ./mhap.000087.out 2>&1
    ./mhap.sh 88 > ./mhap.000088.out 2>&1
    ./mhap.sh 89 > ./mhap.000089.out 2>&1
    ./mhap.sh 90 > ./mhap.000090.out 2>&1
    ./mhap.sh 91 > ./mhap.000091.out 2>&1
    ./mhap.sh 92 > ./mhap.000092.out 2>&1
    ./mhap.sh 93 > ./mhap.000093.out 2>&1
    ./mhap.sh 94 > ./mhap.000094.out 2>&1
    ./mhap.sh 95 > ./mhap.000095.out 2>&1
    ./mhap.sh 96 > ./mhap.000096.out 2>&1
    ./mhap.sh 97 > ./mhap.000097.out 2>&1
    ./mhap.sh 98 > ./mhap.000098.out 2>&1
    ./mhap.sh 99 > ./mhap.000099.out 2>&1
    ./mhap.sh 100 > ./mhap.000100.out 2>&1
    ./mhap.sh 101 > ./mhap.000101.out 2>&1
    ./mhap.sh 102 > ./mhap.000102.out 2>&1
    ./mhap.sh 103 > ./mhap.000103.out 2>&1
    ./mhap.sh 104 > ./mhap.000104.out 2>&1
    ./mhap.sh 105 > ./mhap.000105.out 2>&1
    ./mhap.sh 106 > ./mhap.000106.out 2>&1
    ./mhap.sh 107 > ./mhap.000107.out 2>&1
    ./mhap.sh 108 > ./mhap.000108.out 2>&1
    ./mhap.sh 109 > ./mhap.000109.out 2>&1
    ./mhap.sh 110 > ./mhap.000110.out 2>&1
    ./mhap.sh 111 > ./mhap.000111.out 2>&1
    ./mhap.sh 112 > ./mhap.000112.out 2>&1
    ./mhap.sh 113 > ./mhap.000113.out 2>&1
    ./mhap.sh 114 > ./mhap.000114.out 2>&1
    ./mhap.sh 115 > ./mhap.000115.out 2>&1
    ./mhap.sh 116 > ./mhap.000116.out 2>&1
    ./mhap.sh 117 > ./mhap.000117.out 2>&1
    ./mhap.sh 118 > ./mhap.000118.out 2>&1
    ./mhap.sh 119 > ./mhap.000119.out 2>&1
    ./mhap.sh 120 > ./mhap.000120.out 2>&1
    ./mhap.sh 121 > ./mhap.000121.out 2>&1
    ./mhap.sh 122 > ./mhap.000122.out 2>&1
    ./mhap.sh 123 > ./mhap.000123.out 2>&1
    ./mhap.sh 124 > ./mhap.000124.out 2>&1
    ./mhap.sh 125 > ./mhap.000125.out 2>&1
    ./mhap.sh 126 > ./mhap.000126.out 2>&1
    ./mhap.sh 127 > ./mhap.000127.out 2>&1
    ./mhap.sh 128 > ./mhap.000128.out 2>&1
    ./mhap.sh 129 > ./mhap.000129.out 2>&1
    ./mhap.sh 130 > ./mhap.000130.out 2>&1
    ./mhap.sh 131 > ./mhap.000131.out 2>&1
    ./mhap.sh 132 > ./mhap.000132.out 2>&1
    ./mhap.sh 133 > ./mhap.000133.out 2>&1
    ./mhap.sh 134 > ./mhap.000134.out 2>&1
    ./mhap.sh 135 > ./mhap.000135.out 2>&1
    ./mhap.sh 136 > ./mhap.000136.out 2>&1
    ./mhap.sh 137 > ./mhap.000137.out 2>&1
    ./mhap.sh 138 > ./mhap.000138.out 2>&1
    ./mhap.sh 139 > ./mhap.000139.out 2>&1
    ./mhap.sh 140 > ./mhap.000140.out 2>&1
    ./mhap.sh 141 > ./mhap.000141.out 2>&1
    ./mhap.sh 142 > ./mhap.000142.out 2>&1
    ./mhap.sh 143 > ./mhap.000143.out 2>&1
    ./mhap.sh 144 > ./mhap.000144.out 2>&1
    ./mhap.sh 145 > ./mhap.000145.out 2>&1
    ./mhap.sh 146 > ./mhap.000146.out 2>&1
    ./mhap.sh 147 > ./mhap.000147.out 2>&1

-- Finished on Mon Jul  3 14:20:41 2017 (16372 seconds) with 667.115 GB free disk space
----------------------------------------
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting concurrent execution on Mon Jul  3 14:20:42 2017 with 667.115 GB free disk space (67 processes; 3 concurrently)

    cd correction/1-overlapper
    ./mhap.sh 1 > ./mhap.000001.out 2>&1
    ./mhap.sh 3 > ./mhap.000003.out 2>&1
    ./mhap.sh 5 > ./mhap.000005.out 2>&1
    ./mhap.sh 6 > ./mhap.000006.out 2>&1
    ./mhap.sh 7 > ./mhap.000007.out 2>&1
    ./mhap.sh 9 > ./mhap.000009.out 2>&1
    ./mhap.sh 10 > ./mhap.000010.out 2>&1
    ./mhap.sh 11 > ./mhap.000011.out 2>&1
    ./mhap.sh 13 > ./mhap.000013.out 2>&1
    ./mhap.sh 14 > ./mhap.000014.out 2>&1
    ./mhap.sh 15 > ./mhap.000015.out 2>&1
    ./mhap.sh 16 > ./mhap.000016.out 2>&1
    ./mhap.sh 17 > ./mhap.000017.out 2>&1
    ./mhap.sh 18 > ./mhap.000018.out 2>&1
    ./mhap.sh 21 > ./mhap.000021.out 2>&1
    ./mhap.sh 22 > ./mhap.000022.out 2>&1
    ./mhap.sh 25 > ./mhap.000025.out 2>&1
    ./mhap.sh 26 > ./mhap.000026.out 2>&1
    ./mhap.sh 29 > ./mhap.000029.out 2>&1
    ./mhap.sh 30 > ./mhap.000030.out 2>&1
    ./mhap.sh 33 > ./mhap.000033.out 2>&1
    ./mhap.sh 34 > ./mhap.000034.out 2>&1
    ./mhap.sh 42 > ./mhap.000042.out 2>&1
    ./mhap.sh 46 > ./mhap.000046.out 2>&1
    ./mhap.sh 50 > ./mhap.000050.out 2>&1
    ./mhap.sh 54 > ./mhap.000054.out 2>&1
    ./mhap.sh 58 > ./mhap.000058.out 2>&1
    ./mhap.sh 61 > ./mhap.000061.out 2>&1
    ./mhap.sh 63 > ./mhap.000063.out 2>&1
    ./mhap.sh 64 > ./mhap.000064.out 2>&1
    ./mhap.sh 66 > ./mhap.000066.out 2>&1
    ./mhap.sh 67 > ./mhap.000067.out 2>&1
    ./mhap.sh 69 > ./mhap.000069.out 2>&1
    ./mhap.sh 70 > ./mhap.000070.out 2>&1
    ./mhap.sh 72 > ./mhap.000072.out 2>&1
    ./mhap.sh 75 > ./mhap.000075.out 2>&1
    ./mhap.sh 78 > ./mhap.000078.out 2>&1
    ./mhap.sh 81 > ./mhap.000081.out 2>&1
    ./mhap.sh 84 > ./mhap.000084.out 2>&1
    ./mhap.sh 87 > ./mhap.000087.out 2>&1
    ./mhap.sh 90 > ./mhap.000090.out 2>&1
    ./mhap.sh 91 > ./mhap.000091.out 2>&1
    ./mhap.sh 93 > ./mhap.000093.out 2>&1
    ./mhap.sh 96 > ./mhap.000096.out 2>&1
    ./mhap.sh 99 > ./mhap.000099.out 2>&1
    ./mhap.sh 101 > ./mhap.000101.out 2>&1
    ./mhap.sh 102 > ./mhap.000102.out 2>&1
    ./mhap.sh 106 > ./mhap.000106.out 2>&1
    ./mhap.sh 108 > ./mhap.000108.out 2>&1
    ./mhap.sh 109 > ./mhap.000109.out 2>&1
    ./mhap.sh 110 > ./mhap.000110.out 2>&1
    ./mhap.sh 111 > ./mhap.000111.out 2>&1
    ./mhap.sh 113 > ./mhap.000113.out 2>&1
    ./mhap.sh 114 > ./mhap.000114.out 2>&1
    ./mhap.sh 124 > ./mhap.000124.out 2>&1
    ./mhap.sh 125 > ./mhap.000125.out 2>&1
    ./mhap.sh 133 > ./mhap.000133.out 2>&1
    ./mhap.sh 134 > ./mhap.000134.out 2>&1
    ./mhap.sh 138 > ./mhap.000138.out 2>&1
    ./mhap.sh 140 > ./mhap.000140.out 2>&1
    ./mhap.sh 141 > ./mhap.000141.out 2>&1
    ./mhap.sh 142 > ./mhap.000142.out 2>&1
    ./mhap.sh 143 > ./mhap.000143.out 2>&1
    ./mhap.sh 144 > ./mhap.000144.out 2>&1
    ./mhap.sh 145 > ./mhap.000145.out 2>&1
    ./mhap.sh 146 > ./mhap.000146.out 2>&1
    ./mhap.sh 147 > ./mhap.000147.out 2>&1

-- Finished on Mon Jul  3 16:08:50 2017 (6488 seconds) with 570.73 GB free disk space
----------------------------------------
--
-- 53 mhap jobs failed:
--   job 1-overlapper/results/000001.ovb FAILED.
--   job 1-overlapper/results/000003.ovb FAILED.
--   job 1-overlapper/results/000005.ovb FAILED.
--   job 1-overlapper/results/000006.ovb FAILED.
--   job 1-overlapper/results/000007.ovb FAILED.
--   job 1-overlapper/results/000009.ovb FAILED.
--   job 1-overlapper/results/000010.ovb FAILED.
--   job 1-overlapper/results/000011.ovb FAILED.
--   job 1-overlapper/results/000013.ovb FAILED.
--   job 1-overlapper/results/000014.ovb FAILED.
--   job 1-overlapper/results/000015.ovb FAILED.
--   job 1-overlapper/results/000017.ovb FAILED.
--   job 1-overlapper/results/000018.ovb FAILED.
--   job 1-overlapper/results/000021.ovb FAILED.
--   job 1-overlapper/results/000022.ovb FAILED.
--   job 1-overlapper/results/000025.ovb FAILED.
--   job 1-overlapper/results/000026.ovb FAILED.
--   job 1-overlapper/results/000029.ovb FAILED.
--   job 1-overlapper/results/000030.ovb FAILED.
--   job 1-overlapper/results/000033.ovb FAILED.
--   job 1-overlapper/results/000034.ovb FAILED.
--   job 1-overlapper/results/000042.ovb FAILED.
--   job 1-overlapper/results/000046.ovb FAILED.
--   job 1-overlapper/results/000050.ovb FAILED.
--   job 1-overlapper/results/000054.ovb FAILED.
--   job 1-overlapper/results/000058.ovb FAILED.
--   job 1-overlapper/results/000061.ovb FAILED.
--   job 1-overlapper/results/000063.ovb FAILED.
--   job 1-overlapper/results/000064.ovb FAILED.
--   job 1-overlapper/results/000066.ovb FAILED.
--   job 1-overlapper/results/000067.ovb FAILED.
--   job 1-overlapper/results/000069.ovb FAILED.
--   job 1-overlapper/results/000070.ovb FAILED.
--   job 1-overlapper/results/000072.ovb FAILED.
--   job 1-overlapper/results/000075.ovb FAILED.
--   job 1-overlapper/results/000078.ovb FAILED.
--   job 1-overlapper/results/000081.ovb FAILED.
--   job 1-overlapper/results/000084.ovb FAILED.
--   job 1-overlapper/results/000087.ovb FAILED.
--   job 1-overlapper/results/000090.ovb FAILED.
--   job 1-overlapper/results/000093.ovb FAILED.
--   job 1-overlapper/results/000096.ovb FAILED.
--   job 1-overlapper/results/000099.ovb FAILED.
--   job 1-overlapper/results/000102.ovb FAILED.
--   job 1-overlapper/results/000106.ovb FAILED.
--   job 1-overlapper/results/000108.ovb FAILED.
--   job 1-overlapper/results/000109.ovb FAILED.
--   job 1-overlapper/results/000110.ovb FAILED.
--   job 1-overlapper/results/000125.ovb FAILED.
--   job 1-overlapper/results/000142.ovb FAILED.
--   job 1-overlapper/results/000145.ovb FAILED.
--   job 1-overlapper/results/000146.ovb FAILED.
--   job 1-overlapper/results/000147.ovb FAILED.
--
================================================================================
Don't panic, but a mostly harmless error occurred and Canu stopped.

Canu release v1.5 failed with:
  canu iteration count too high, stopping pipeline (most likely a problem in the grid-based computes)
FadyMohareb commented 7 years ago

Here is the output from mhap.000110.out (one of the failed jobs):

Running job 110 based on command line options.
Fetch blocks/000035.dat
Fetch blocks/000036.dat
Fetch blocks/000037.dat
Fetch blocks/000038.dat
Fetch blocks/000039.dat
Fetch blocks/000040.dat
Fetch blocks/000041.dat
Fetch blocks/000042.dat
Fetch blocks/000043.dat
Fetch blocks/000044.dat
Fetch blocks/000045.dat
Fetch blocks/000046.dat
Fetch blocks/000047.dat
Fetch blocks/000048.dat
Fetch blocks/000049.dat

Running block 000034 in query 000110

INVALID OVERLAP  3264075 (len  15608)  3245326 (len   7377) hangs  16244     99 -      1   6433 flip 1
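
To check whether the other failed jobs stopped on the same message, the mhap logs could be scanned along these lines. This is only a sketch: `count_invalid_overlaps` is a hypothetical helper name, the default directory comes from the `cd correction/1-overlapper` line in the log above, and it assumes the affected logs contain the literal text `INVALID OVERLAP`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Canu): report how many "INVALID OVERLAP"
# lines each mhap log contains, skipping logs that contain none.
count_invalid_overlaps() {
    # Default path is taken from the log above; pass another directory if needed.
    local dir="${1:-correction/1-overlapper}"
    # -H forces the file name to be printed, -c prints "file:count" per log.
    # The second grep drops every log whose count is zero.
    grep -H -c 'INVALID OVERLAP' "$dir"/mhap.*.out 2>/dev/null | grep -v ':0$'
}
```

Run from the assembly directory, `count_invalid_overlaps` with no argument prints one `file:count` line per affected log, which can be compared against the list of 53 failed jobs.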
skoren commented 7 years ago

This looks like either a corrupted file on disk or a failure in the previous step that Canu did not detect. This partition depends on several of the precompute.sh jobs that were re-run. Are there any errors in the precompute.*.out files? You could upload them here as a tar.gz.
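
For reference, something like the following could list the precompute logs that mention a problem and bundle all of them for upload. It is a rough sketch under stated assumptions: `scan_precompute_logs` and the archive name are hypothetical, the default directory is the one shown in the log, and the search patterns are guesses rather than messages Canu is known to print.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Canu): print the names of precompute logs
# that match a few generic failure words, then pack every precompute log
# into a single tar.gz archive.
scan_precompute_logs() {
    local dir="${1:-correction/1-overlapper}"     # path taken from the log above
    local archive="${2:-precompute-logs.tar.gz}"  # archive name is an assumption
    (
        cd "$dir" || exit 1
        # Assumed patterns; adjust them to whatever the logs actually contain.
        grep -l -i -E 'error|exception|failed|killed' precompute.*.out 2>/dev/null
        # A relative archive path is created inside "$dir".
        tar -czf "$archive" precompute.*.out
    )
}
```

Called as `scan_precompute_logs correction/1-overlapper` from the assembly directory, it prints the matching log names and leaves `precompute-logs.tar.gz` in that directory, ready to attach.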

brianwalenz commented 7 years ago

No response.