marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Meryl failed on the E coli test set #710

Closed biophage closed 6 years ago

biophage commented 6 years ago

Hi,

I'm a new canu user trying to assemble PacBio reads from bacterial genomes. I followed the Installation and Quick Start instructions step by step on the E. coli set, but canu stops at some point in the correction phase. I read about similar issues in the forum, but none of the suggestions worked in my case. Here is what I did, including the information that seemed useful for diagnosing the problem:

  1. I installed canu on a Linux server that I access over ssh. I installed it from both the binary distribution (canu-1.6.Linux-amd64.tar.xz) and the source code, and got the same result either way. Canu appears to have been installed successfully:

    Success!
    canu installed in /pub37/acazeres/canu-1.6/Linux-amd64/bin/canu
  2. I ran the Quick Start test: canu-1.6/Linux-amd64/bin/canu -p ecoli -d ecoli-pacbio genomeSize=4.8m -pacbio-raw pacbio.fastq

  3. This is what I got instantly:

    
    -- CONFIGURE CANU
    --
    -- Detected Java(TM) Runtime Environment '1.8.0_121' (from 'java').
    -- Detected gnuplot version '4.4 patchlevel 0' (from 'gnuplot') and image format 'png'.
    -- Detected 8 CPUs and 47 gigabytes of memory.
    -- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
    -- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
    -- 
    -- Found  31 hosts with   8 cores and   39 GB memory under Slurm control.
    --
    --                     (tag)Threads
    --            (tag)Memory         |
    --        (tag)         |         |  algorithm
    --        -------  ------  --------  -----------------------------
    -- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
    -- Grid:  cormhap    6 GB    8 CPUs  (overlap detection with mhap)
    -- Grid:  obtovl     8 GB    8 CPUs  (overlap detection)
    -- Grid:  utgovl     8 GB    8 CPUs  (overlap detection)
    -- Grid:  cor        9 GB    2 CPUs  (read correction)
    -- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
    -- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
    -- Grid:  red        2 GB    4 CPUs  (read error detection)
    -- Grid:  oea        1 GB    1 CPU   (overlap error adjustment)
    -- Grid:  bat       16 GB    4 CPUs  (contig construction)
    -- Grid:  cns       19 GB    4 CPUs  (consensus)
    -- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
    --
    -- Found PacBio uncorrected reads in 'correction/ecoli.gkpStore'.
    --
    -- Generating assembly 'ecoli' in '/pub37/acazeres/canu-1.6/ecoli-pacbio'
    --
    -- Parameters:
    --
    --  genomeSize        4800000
    --
    --  Overlap Generation Limits:
    --    corOvlErrorRate 0.2400 ( 24.00%)
    --    obtOvlErrorRate 0.0450 (  4.50%)
    --    utgOvlErrorRate 0.0450 (  4.50%)
    --
    --  Overlap Processing Limits:
    --    corErrorRate    0.3000 ( 30.00%)
    --    obtErrorRate    0.0450 (  4.50%)
    --    utgErrorRate    0.0450 (  4.50%)
    --    cnsErrorRate    0.0750 (  7.50%)
    --
    --
    -- BEGIN CORRECTION
    --
    --
    -- Meryl failed, tried 2 times, giving up.
    --

    ABORT:
    ABORT: Canu 1.6
    ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
    ABORT: Try restarting.  If that doesn't work, ask for help.
    ABORT:


4. This is the content of the `ecoli.report` file within the `ecoli-pacbio` directory:

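    -- In gatekeeper store 'correction/ecoli.gkpStore':
    --   Found 12528 reads.
    --   Found 115899341 bases (24.14 times coverage).
    --
    --   Read length histogram (one '*' equals 20.62 reads):
    --        0    999      0
    --     1000   1999   1444 **********************************************************************
    --     2000   2999   1328 ****************************************************************
    --     3000   3999   1065 ***************************************************
    --     4000   4999    774 *************************************
    --     5000   5999    668 ********************************
    --     6000   6999    619 ******************************
    --     7000   7999    618 *****************************
    --     8000   8999    607 *****************************
    --     9000   9999    560 ***************************
    --    10000  10999    523 *************************
    --    11000  11999    478 ***********************
    --    12000  12999    429 ********************
    --    13000  13999    379 ******************
    --    14000  14999    366 *****************
    --    15000  15999    353 *****************
    --    16000  16999    329 ***************
    --    17000  17999    297 **************
    --    18000  18999    294 **************
    --    19000  19999    283 *************
    --    20000  20999    251 ************
    --    21000  21999    195 *********
    --    22000  22999    152 *******
    --    23000  23999    132 ******
    --    24000  24999     75 ***
    --    25000  25999     66 ***
    --    26000  26999     56 **
    --    27000  27999     44 **
    --    28000  28999     35 *
    --    29000  29999     16
    --    30000  30999     21 *
    --    31000  31999     18
    --    32000  32999     11
    --    33000  33999      8
    --    34000  34999      6
    --    35000  35999      6
    --    36000  36999     10
    --    37000  37999      2
    --    38000  38999      3
    --    39000  39999      2
    --    40000  40999      2
    --    41000  41999      2
    --    42000  42999      1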


5. And this is the content of the `meryl.*.out` file within the `0-mercounts` directory. Two files are actually generated, but they have the same content. The same happens when I run the `meryl.sh` script.
`cat ecoli-pacbio/correction/0-mercounts/meryl.483_1.out`

    Computing 8 segments using 4 threads and 176MB memory (151MB if in one batch).
    numMersActual = 115711422
    mersPerBatch = 28927855
    basesPerBatch = 14487418
    numBuckets = 1048576 (20 bits)
    bucketPointerWidth = 24
    merDataWidth = 12
    Computing segment 7 of 8.
    Allocating 3MB for bucket pointer table (24 bits wide).
    Allocating 4MB for counting the size of each bucket.
    Computing segment 1 of 8.
    Allocating 3MB for bucket pointer table (24 bits wide).
    Allocating 4MB for counting the size of each bucket.
    Computing segment 5 of 8.
    Allocating 3MB for bucket pointer table (24 bits wide).
    Computing segment 3 of 8.
    Allocating 3MB for bucket pointer table (24 bits wide).
    Allocating 4MB for counting the size of each bucket.
    Allocating 4MB for counting the size of each bucket.
    /var/spool/slurmd.spool/job00483/slurm_script: line 34: 13898 Segmentation fault $bin/meryl -B -C -L 2 -v -m 16 -threads 4 -memory 6553 -s ../ecoli.gkpStore -o ./ecoli.ms16.WORKING
    /var/spool/slurmd.spool/job00483/slurm_script: line 63: 13908 Segmentation fault $bin/estimate-mer-threshold -h ./ecoli.ms16.histogram -c 24 > ./ecoli.ms16.estMerThresh.out.WORKING 2> ./ecoli.ms16.estMerThresh.err



6. I suppose the clues to the problem are in the last lines of that file, but I'm too new to this field to figure out what is happening. Hopefully the issue is easy to solve; I'd really appreciate your help.
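Since the useful lines sit at the very end of the log, a small helper can pull them out of several log files at once. This is a minimal sketch, not part of canu: the function name `find_segfaults` is made up for illustration, and it only assumes that the shell reports the crash with the literal text "Segmentation fault", as in the output above.

```shell
#!/usr/bin/env bash
# find_segfaults FILE...
# Hypothetical helper (not shipped with canu): prints every line that
# contains "Segmentation fault", prefixed with the file name and line number.
find_segfaults() {
    # -H prints the file name, -n the line number.
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -Hn 'Segmentation fault' "$@" || true
}

# Example call, using the directory named in the post:
#   find_segfaults ecoli-pacbio/correction/0-mercounts/meryl.*.out
```

Each reported line names the command that crashed, which here would be `$bin/meryl` first and `$bin/estimate-mer-threshold` second.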

Very best,

P.S. I got the same result by using one of my datasets.
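For anyone comparing against their own dataset, the coverage figure in the report is the total number of bases divided by `genomeSize`. The sketch below reproduces the 24.14 from the report by truncating to two decimals; plain rounding would give 24.15, so truncation is an assumption about how the number is formatted, not something confirmed by the canu source. The function name `coverage` is hypothetical.

```shell
#!/usr/bin/env bash
# coverage TOTAL_BASES GENOME_SIZE
# Hypothetical helper: prints bases/genomeSize truncated to two decimals.
# Truncation (rather than rounding) is an assumption chosen to match the report.
coverage() {
    awk -v bases="$1" -v size="$2" \
        'BEGIN { printf "%.2f\n", int(bases * 100 / size) / 100 }'
}

# Values from the report: 115899341 bases, genomeSize=4.8m
coverage 115899341 4800000   # prints 24.14
```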

skoren commented 6 years ago

My guess would be a difference between the machine you're building Canu on and the machine it is running on the grid. Try running with useGrid=0 and see what happens.
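One difference between hosts that is easy to check is the set of CPU instruction-set flags. The sketch below is only an illustration of that idea and may not be the cause here: `missing_flags` and `cpu_flags` are hypothetical helper names, `/proc/cpuinfo` is assumed to exist (Linux), and the `srun` call in the usage comment assumes a Slurm setup where interactive jobs are allowed.

```shell
#!/usr/bin/env bash
# missing_flags "BUILD_FLAGS" "RUN_FLAGS"
# Hypothetical helper: prints each flag that the build host has
# but the run host lacks, one per line.
missing_flags() {
    local flag
    for flag in $1; do                 # unquoted on purpose: split on whitespace
        case " $2 " in
            *" $flag "*) ;;            # flag present on the run host
            *) printf '%s\n' "$flag" ;;
        esac
    done
}

# cpu_flags: prints the flag list of the first CPU on a Linux host.
cpu_flags() {
    grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
}

# Possible usage (assumes Slurm; run on the build/login host):
#   build=$(cpu_flags)
#   run=$(srun -N1 bash -c "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2")
#   missing_flags "$build" "$run"
```

An empty result means every flag of the build host is also available on the compute node, in which case the difference, if any, lies elsewhere (libraries, kernel, limits).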

biophage commented 6 years ago

Hey, thank you very much for the reply. I'm still struggling with this issue. I ran with useGrid=0: canu-1.6/Linux-amd64/bin/canu -p ecoli -d ecoli-pacbio genomeSize=4.8m -pacbio-raw pacbio.fastq useGrid=0

I'm still getting the same result with subtle differences:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_121' (from 'java').
-- Detected gnuplot version '4.4 patchlevel 0' (from 'gnuplot') and image format 'png'.
-- Detected 8 CPUs and 39 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |  algorithm
--        -------  ------  --------   --------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   2 jobs  (k-mer counting)
-- Local: cormhap    6 GB    8 CPUs x   1 job   (overlap detection with mhap)
-- Local: obtovl     8 GB    8 CPUs x   1 job   (overlap detection)
-- Local: utgovl     8 GB    8 CPUs x   1 job   (overlap detection)
-- Local: cor        9 GB    2 CPUs x   4 jobs  (read correction)
-- Local: ovb        4 GB    1 CPU  x   8 jobs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   8 jobs  (overlap store sorting)
-- Local: red        2 GB    4 CPUs x   2 jobs  (read error detection)
-- Local: oea        1 GB    1 CPU  x   8 jobs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   2 jobs  (contig construction)
-- Local: cns       19 GB    4 CPUs x   2 jobs  (consensus)
-- Local: gfa        8 GB    4 CPUs x   2 jobs  (GFA alignment and processing)
--
-- Found PacBio uncorrected reads in the input files.
--
-- Generating assembly 'ecoli' in '/pub37/acazeres/Assembly/Bacteria/PacBio/Ecoli_test/ecoli-pacbio'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Sun Nov 26 12:23:43 2017 with 59581.474 GB free disk space

    cd correction
    /pub37/acazeres/canu-1.6/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./ecoli.gkpStore.BUILDING \
      ./ecoli.gkpStore.gkp \
    > ./ecoli.gkpStore.BUILDING.err 2>&1

-- Finished on Sun Nov 26 12:23:46 2017 (3 seconds) with 59581.474 GB free disk space
----------------------------------------
--
-- In gatekeeper store 'correction/ecoli.gkpStore':
--   Found 12528 reads.
--   Found 115899341 bases (24.14 times coverage).
--
--   Read length histogram (one '*' equals 20.62 reads):
--        0    999      0 
--     1000   1999   1444 **********************************************************************
--     2000   2999   1328 ****************************************************************
--     3000   3999   1065 ***************************************************
--     4000   4999    774 *************************************
--     5000   5999    668 ********************************
--     6000   6999    619 ******************************
--     7000   7999    618 *****************************
--     8000   8999    607 *****************************
--     9000   9999    560 ***************************
--    10000  10999    523 *************************
--    11000  11999    478 ***********************
--    12000  12999    429 ********************
--    13000  13999    379 ******************
--    14000  14999    366 *****************
--    15000  15999    353 *****************
--    16000  16999    329 ***************
--    17000  17999    297 **************
--    18000  18999    294 **************
--    19000  19999    283 *************
--    20000  20999    251 ************
--    21000  21999    195 *********
--    22000  22999    152 *******
--    23000  23999    132 ******
--    24000  24999     75 ***
--    25000  25999     66 ***
--    26000  26999     56 **
--    27000  27999     44 **
--    28000  28999     35 *
--    29000  29999     16 
--    30000  30999     21 *
--    31000  31999     18 
--    32000  32999     11 
--    33000  33999      8 
--    34000  34999      6 
--    35000  35999      6 
--    36000  36999     10 
--    37000  37999      2 
--    38000  38999      3 
--    39000  39999      2 
--    40000  40999      2 
--    41000  41999      2 
--    42000  42999      1 
-- Finished stage 'cor-gatekeeper', reset canuIteration.
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Nov 26 12:23:46 2017 with 59581.445 GB free disk space (1 processes; 2 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Nov 26 12:23:47 2017 (1 second) with 59581.445 GB free disk space
----------------------------------------
--
-- Meryl failed, retry.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Nov 26 12:23:47 2017 with 59581.445 GB free disk space (1 processes; 2 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Nov 26 12:23:47 2017 (lickety-split) with 59581.445 GB free disk space
----------------------------------------
--
-- Meryl failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.6
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

This time the canu.out file wasn't produced. The meryl.*.out file has the same content as the one I got without the useGrid=0 option, although only one meryl.*.out file (instead of two) was generated this time. These are the only differences I could find.

skoren commented 6 years ago

This is strange and seems to be a bug in meryl but it is not occurring on our systems. Try building the release from source by running make clean && make BUILDDEBUG and see if that runs.

biophage commented 6 years ago

I'm sorry for the late reply; I had an extra busy week. I agree it is weird. I installed canu on my laptop and it works there. I then got access to a different server, and canu runs as expected there too, so it is "something" about meryl on this server. I'll try your suggestion and see what happens!

brianwalenz commented 6 years ago

Any update? Reopen if so.

biophage commented 6 years ago

Hi,

Unfortunately, after building the release from source, I'm still getting the same error in meryl. I'm giving up on running canu on this server; I'll use another cluster where the installation was successful, or my laptop.

Best,