Meryl failed on the E coli test set

marbl / canu

A single molecule sequence assembler for genomes large and small.

Hi,

I'm a new canu user trying to assemble PacBio reads from bacterial genomes. I followed the Installation and Quick Start instructions step by step on the E. coli set, but canu stops during the correction phase. I read about similar issues in the forum, but none of the suggestions worked in my case. Based on what I read there, the following information should help diagnose the problem:

1. I installed canu on a Linux server that I access over ssh. I tried both the binary distribution (canu-1.6.Linux-amd64.tar.xz) and the source code, and got the same result either way. Still, canu seems to have been installed successfully:
```
Success!
canu installed in /pub37/acazeres/canu-1.6/Linux-amd64/bin/canu
```
2. I ran the Quick Start test: `canu-1.6/Linux-amd64/bin/canu -p ecoli -d ecoli-pacbio genomeSize=4.8m -pacbio-raw pacbio.fastq`

3. This is what I got instantly:


-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_121' (from 'java').
-- Detected gnuplot version '4.4 patchlevel 0' (from 'gnuplot') and image format 'png'.
-- Detected 8 CPUs and 47 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
-- 
-- Found  31 hosts with   8 cores and   39 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap    6 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     8 GB    8 CPUs  (overlap detection)
-- Grid:  cor        9 GB    2 CPUs  (read correction)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        2 GB    4 CPUs  (read error detection)
-- Grid:  oea        1 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction)
-- Grid:  cns       19 GB    4 CPUs  (consensus)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
--
-- Found PacBio uncorrected reads in 'correction/ecoli.gkpStore'.
--
-- Generating assembly 'ecoli' in '/pub37/acazeres/canu-1.6/ecoli-pacbio'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN CORRECTION
--
--
-- Meryl failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.6
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:


4. This is the content of the `ecoli.report` file within the `ecoli-pacbio` directory:

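-- In gatekeeper store 'correction/ecoli.gkpStore':
--   Found 12528 reads.
--   Found 115899341 bases (24.14 times coverage).
--
--   Read length histogram (one '*' equals 20.62 reads):
--        0    999      0
--     1000   1999   1444 **********************************************************************
--     2000   2999   1328 ****************************************************************
--     3000   3999   1065 ***************************************************
--     4000   4999    774 *************************************
--     5000   5999    668 ********************************
--     6000   6999    619 ******************************
--     7000   7999    618 *****************************
--     8000   8999    607 *****************************
--     9000   9999    560 ***************************
--    10000  10999    523 *************************
--    11000  11999    478 ***********************
--    12000  12999    429 ********************
--    13000  13999    379 ******************
--    14000  14999    366 *****************
--    15000  15999    353 *****************
--    16000  16999    329 ***************
--    17000  17999    297 **************
--    18000  18999    294 **************
--    19000  19999    283 *************
--    20000  20999    251 ************
--    21000  21999    195 *********
--    22000  22999    152 *******
--    23000  23999    132 ******
--    24000  24999     75 ***
--    25000  25999     66 ***
--    26000  26999     56 **
--    27000  27999     44 **
--    28000  28999     35 *
--    29000  29999     16
--    30000  30999     21 *
--    31000  31999     18
--    32000  32999     11
--    33000  33999      8
--    34000  34999      6
--    35000  35999      6
--    36000  36999     10
--    37000  37999      2
--    38000  38999      3
--    39000  39999      2
--    40000  40999      2
--    41000  41999      2
--    42000  42999      1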
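As a sanity check on the report, the coverage figure can be reproduced from the numbers it prints and the `genomeSize=4.8m` given on the command line. The sketch below is only illustrative: the `truncate` helper is hypothetical, and the idea that canu truncates rather than rounds is an assumption based on the printed value (plain rounding would give 24.15, not 24.14).

```python
import math


def truncate(value: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals without rounding (hypothetical helper)."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


genome_size = 4_800_000    # genomeSize=4.8m from the canu command line
total_bases = 115_899_341  # "Found 115899341 bases" in the report
read_count = 12_528        # "Found 12528 reads" in the report

# Coverage is the number of sequenced bases divided by the genome size.
coverage = total_bases / genome_size
mean_read_length = total_bases / read_count

print(f"coverage         : {truncate(coverage):.2f}x")
print(f"mean read length : {mean_read_length:.0f} bases")
```

So the input data looks as expected for the Quick Start set, which suggests the failure is not caused by the reads themselves.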

5. And this is the content of the `meryl.*.out` file within the `0-mercounts` directory. Two files are actually generated, but they have the same content. The same happens when I run the `meryl.sh` script. Output of `cat ecoli-pacbio/correction/0-mercounts/meryl.483_1.out`:

Computing 8 segments using 4 threads and 176MB memory (151MB if in one batch).
  numMersActual      = 115711422
  mersPerBatch       = 28927855
  basesPerBatch      = 14487418
  numBuckets         = 1048576 (20 bits)
  bucketPointerWidth = 24
  merDataWidth       = 12
Computing segment 7 of 8.
 Allocating 3MB for bucket pointer table (24 bits wide).
 Allocating 4MB for counting the size of each bucket.
Computing segment 1 of 8.
 Allocating 3MB for bucket pointer table (24 bits wide).
 Allocating 4MB for counting the size of each bucket.
Computing segment 5 of 8.
 Allocating 3MB for bucket pointer table (24 bits wide).
Computing segment 3 of 8.
 Allocating 3MB for bucket pointer table (24 bits wide).
 Allocating 4MB for counting the size of each bucket.
 Allocating 4MB for counting the size of each bucket.
/var/spool/slurmd.spool/job00483/slurm_script: line 34: 13898 Segmentation fault $bin/meryl -B -C -L 2 -v -m 16 -threads 4 -memory 6553 -s ../ecoli.gkpStore -o ./ecoli.ms16.WORKING
/var/spool/slurmd.spool/job00483/slurm_script: line 63: 13908 Segmentation fault $bin/estimate-mer-threshold -h ./ecoli.ms16.histogram -c 24 > ./ecoli.ms16.estMerThresh.out.WORKING 2> ./ecoli.ms16.estMerThresh.err

6. I suppose the clues to the problem are in the last lines of that file, but I'm too new to this field to work out what is happening. Hopefully the issue is easy to solve; I'd really appreciate your help.

Very best,

P.S. I got the same result using one of my own datasets.
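The last two lines of the meryl log are reports from the shell that a command was killed by a segmentation fault: first `meryl` itself, then `estimate-mer-threshold`, which presumably runs on the histogram meryl was supposed to write. A minimal sketch for pulling such lines out of a job log follows. The `Crash` type, the `find_segfaults` function, and the regular expression are all illustrative assumptions modeled on the two lines shown above, not part of canu.

```python
import re
from typing import List, NamedTuple


class Crash(NamedTuple):
    script_line: int  # line of the job script that launched the command
    pid: int          # process id reported by the shell
    command: str      # command line that was killed


# Shape of the shell's report as it appears in the meryl log:
#   <script>: line 34: 13898 Segmentation fault $bin/meryl ...
# The optional "(core dumped)" part is an assumption about other systems.
CRASH_RE = re.compile(
    r"line (\d+):\s+(\d+)\s+Segmentation fault\s+(?:\(core dumped\)\s+)?(.+)"
)


def find_segfaults(log_text: str) -> List[Crash]:
    """Return one Crash per log line that reports a segmentation fault."""
    crashes = []
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            script_line, pid, command = match.groups()
            crashes.append(Crash(int(script_line), int(pid), command.strip()))
    return crashes


# The two failing lines from the log above, used as sample input.
sample_log = (
    "Computing segment 3 of 8.\n"
    "/var/spool/slurmd.spool/job00483/slurm_script: line 34: 13898 "
    "Segmentation fault $bin/meryl -B -C -L 2 -v -m 16 -threads 4 "
    "-memory 6553 -s ../ecoli.gkpStore -o ./ecoli.ms16.WORKING\n"
    "/var/spool/slurmd.spool/job00483/slurm_script: line 63: 13908 "
    "Segmentation fault $bin/estimate-mer-threshold -h ./ecoli.ms16.histogram "
    "-c 24 > ./ecoli.ms16.estMerThresh.out.WORKING 2> ./ecoli.ms16.estMerThresh.err\n"
)

for crash in find_segfaults(sample_log):
    print(crash.pid, crash.command.split()[0])
```

The first crash is the one to investigate: the second command probably fails only because its input was never produced.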

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_121' (from 'java').
-- Detected gnuplot version '4.4 patchlevel 0' (from 'gnuplot') and image format 'png'.
-- Detected 8 CPUs and 39 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |  algorithm
--        -------  ------  --------   --------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   2 jobs  (k-mer counting)
-- Local: cormhap    6 GB    8 CPUs x   1 job   (overlap detection with mhap)
-- Local: obtovl     8 GB    8 CPUs x   1 job   (overlap detection)
-- Local: utgovl     8 GB    8 CPUs x   1 job   (overlap detection)
-- Local: cor        9 GB    2 CPUs x   4 jobs  (read correction)
-- Local: ovb        4 GB    1 CPU  x   8 jobs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   8 jobs  (overlap store sorting)
-- Local: red        2 GB    4 CPUs x   2 jobs  (read error detection)
-- Local: oea        1 GB    1 CPU  x   8 jobs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   2 jobs  (contig construction)
-- Local: cns       19 GB    4 CPUs x   2 jobs  (consensus)
-- Local: gfa        8 GB    4 CPUs x   2 jobs  (GFA alignment and processing)
--
-- Found PacBio uncorrected reads in the input files.
--
-- Generating assembly 'ecoli' in '/pub37/acazeres/Assembly/Bacteria/PacBio/Ecoli_test/ecoli-pacbio'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Sun Nov 26 12:23:43 2017 with 59581.474 GB free disk space

    cd correction
    /pub37/acazeres/canu-1.6/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./ecoli.gkpStore.BUILDING \
      ./ecoli.gkpStore.gkp \
    > ./ecoli.gkpStore.BUILDING.err 2>&1

-- Finished on Sun Nov 26 12:23:46 2017 (3 seconds) with 59581.474 GB free disk space
----------------------------------------
--
-- In gatekeeper store 'correction/ecoli.gkpStore':
--   Found 12528 reads.
--   Found 115899341 bases (24.14 times coverage).
--
--   Read length histogram (one '*' equals 20.62 reads):
--        0    999      0
--     1000   1999   1444 **********************************************************************
--     2000   2999   1328 ****************************************************************
--     3000   3999   1065 ***************************************************
--     4000   4999    774 *************************************
--     5000   5999    668 ********************************
--     6000   6999    619 ******************************
--     7000   7999    618 *****************************
--     8000   8999    607 *****************************
--     9000   9999    560 ***************************
--    10000  10999    523 *************************
--    11000  11999    478 ***********************
--    12000  12999    429 ********************
--    13000  13999    379 ******************
--    14000  14999    366 *****************
--    15000  15999    353 *****************
--    16000  16999    329 ***************
--    17000  17999    297 **************
--    18000  18999    294 **************
--    19000  19999    283 *************
--    20000  20999    251 ************
--    21000  21999    195 *********
--    22000  22999    152 *******
--    23000  23999    132 ******
--    24000  24999     75 ***
--    25000  25999     66 ***
--    26000  26999     56 **
--    27000  27999     44 **
--    28000  28999     35 *
--    29000  29999     16
--    30000  30999     21 *
--    31000  31999     18
--    32000  32999     11
--    33000  33999      8
--    34000  34999      6
--    35000  35999      6
--    36000  36999     10
--    37000  37999      2
--    38000  38999      3
--    39000  39999      2
--    40000  40999      2
--    41000  41999      2
--    42000  42999      1
-- Finished stage 'cor-gatekeeper', reset canuIteration.
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Nov 26 12:23:46 2017 with 59581.445 GB free disk space (1 processes; 2 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Nov 26 12:23:47 2017 (1 second) with 59581.445 GB free disk space
----------------------------------------
--
-- Meryl failed, retry.
--
--
-- Running jobs. Second attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Sun Nov 26 12:23:47 2017 with 59581.445 GB free disk space (1 processes; 2 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Sun Nov 26 12:23:47 2017 (lickety-split) with 59581.445 GB free disk space
----------------------------------------
--
-- Meryl failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 1.6
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
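For anyone reading the read length histogram in the log, the bar lengths appear consistent with dividing each bin's read count by the printed scale (20.62 reads per `*`) and discarding the remainder. The sketch below illustrates that reading; the `bar` function is hypothetical, and the use of floor rather than rounding is an assumption inferred from the printed bars (for example, the bin with 16 reads has no star).

```python
import math

READS_PER_STAR = 20.62  # "one '*' equals 20.62 reads" in the log


def bar(count: int, reads_per_star: float = READS_PER_STAR) -> str:
    """Render one histogram bar: one '*' per full `reads_per_star` reads."""
    return "*" * math.floor(count / reads_per_star)


# A few (low, high, count) bins copied from the histogram in the log.
bins = [
    (1000, 1999, 1444),
    (7000, 7999, 618),
    (29000, 29999, 16),
    (30000, 30999, 21),
]

for low, high, count in bins:
    print(f"-- {low:6d} {high:6d} {count:6d} {bar(count)}")
```

Bins with fewer than about 21 reads therefore show no bar at all, even though they are not empty.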


Meryl failed on the E coli test set #710
