marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu stop with Mhap precompute jobs failed #870

Closed tingyanchang closed 6 years ago

tingyanchang commented 6 years ago

Hello,

I ran the E. coli data from the tutorial and it succeeded. But when I tested assembling a bird genome at 30x depth with Canu v1.7, it took a long time and then failed. I have a server with 256 GB of memory and about 15 TB of disk space.

Another question: I also have to assemble a genome of about 1 Gbp. The genomic DNA was sequenced with PacBio at 40x depth. Do you have any suggestions for how to set the Canu parameters?

Here is my canu command:

    canu -p asm -d Zfinch_genome genomeSize=1223m -pacbio-raw Zfinch_merged.x\=030.000.n\=003384909.u.fastq rawErrorRate=0.035 overlapper=mhap utgReAlign=true

My output:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_121' (from '/home/rom1025/program/anaconda3/envs/canu/bin/java').
-- Detected gnuplot version '5.2 patchlevel 0' (from 'gnuplot') and image format 'svg'.
-- Detected 64 CPUs and 252 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl    252 GB   32 CPUs x   1 job    252 GB   32 CPUs  (k-mer counting)
-- Local: cormhap   32 GB   16 CPUs x   4 jobs   128 GB   64 CPUs  (overlap detection with mhap)
-- Local: obtmhap   32 GB   16 CPUs x   4 jobs   128 GB   64 CPUs  (overlap detection with mhap)
-- Local: utgmhap   32 GB   16 CPUs x   4 jobs   128 GB   64 CPUs  (overlap detection with mhap)
-- Local: ovb        3 GB    1 CPU  x  64 jobs   192 GB   64 CPUs  (overlap store bucketizer)
-- Local: ovs       32 GB    1 CPU  x   7 jobs   224 GB    7 CPUs  (overlap store sorting)
-- Local: red        8 GB    4 CPUs x  16 jobs   128 GB   64 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  63 jobs   252 GB   63 CPUs  (overlap error adjustment)
-- Local: bat      252 GB   16 CPUs x   1 job    252 GB   16 CPUs  (contig construction)
-- Local: gfa       16 GB   16 CPUs x   1 job     16 GB   16 CPUs  (GFA alignment and processing)
--
-- In 'asm.gkpStore', found PacBio reads:
--   Raw:        2352917
--   Corrected:  93
--   Trimmed:    0
--
-- Generating assembly 'asm' in '/mnt/Storage/lxc_Storage/labsystem/rom1025/canu_test/zebrafinch/cover30/Zfinch_genome'
--
-- Parameters:
--
--  genomeSize        1223000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0350 (  3.50%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0350 (  3.50%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN TRIMMING
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'obtmhap' concurrent execution on Sun Apr 15 16:14:23 2018 with 15072.197 GB free disk space (2 processes; 4 concurrently)

    cd trimming/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 25 > ./precompute.000025.out 2>&1

-- Finished on Sun Apr 15 16:14:23 2018 (lickety-split) with 15072.197 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, retry.
--   job trimming/1-overlapper/blocks/000001.dat FAILED.
--   job trimming/1-overlapper/blocks/000025.dat FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'obtmhap' concurrent execution on Sun Apr 15 16:14:23 2018 with 15072.197 GB free disk space (2 processes; 4 concurrently)

    cd trimming/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 25 > ./precompute.000025.out 2>&1

-- Finished on Sun Apr 15 16:14:23 2018 (lickety-split) with 15072.197 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, tried 2 times, giving up.
--   job trimming/1-overlapper/blocks/000001.dat FAILED.
--   job trimming/1-overlapper/blocks/000025.dat FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

This is the content of precompute.000001.out:

Running job 1 based on command line options.
Dumping reads from 1 to 96000 (inclusive).
Failed to extract fasta.

Thanks for your help

skoren commented 6 years ago

I would guess the issue is that you have no corrected data. The parameter rawErrorRate=0.035 sets the allowed error in overlaps between raw reads to 3.5%. Since the individual raw reads have an error rate over 10%, this eliminated most of the overlaps and led to almost no corrected reads. You also probably don't need overlapper=mhap utgReAlign=true; the fast option won't save that much time on PacBio data, it's primarily useful for nanopore.
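
To make the mismatch concrete, here is a back-of-the-envelope check. It rests on an assumption that is not stated in this thread: that the differences seen in an overlap between two raw reads are roughly the sum of the two reads' individual error rates. The 10% figure is the lower bound mentioned above, and the variable names are only for illustration.

```shell
# Assumption: overlap error between two raw reads ~ sum of their per-read errors.
read_error=0.10       # "over 10%" per raw read (lower bound from the comment above)
raw_error_rate=0.035  # value passed as rawErrorRate in the original command

verdict=$(awk -v e="$read_error" -v limit="$raw_error_rate" 'BEGIN {
    pair = 2 * e                                   # rough expected overlap error
    v = (pair > limit) ? "too strict" : "ok"       # compare against the allowed rate
    print v
}')

echo "rawErrorRate=$raw_error_rate is $verdict for a per-read error of about $read_error"
```

Under that assumption a typical raw overlap would differ by around 20%, so a 3.5% limit would discard nearly all of them.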

Since you only have about 30x coverage of raw data, you should probably use the sensitive parameters 'correctedErrorRate=0.105' and 'corMinCoverage=0', especially if your data is from a Sequel instrument, which has lower quality than the RSII.
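
Put together, the suggestions might translate into something like the command below. This is only a sketch: it reuses the inputs from the original command, drops rawErrorRate, overlapper=mhap and utgReAlign=true, and adds the two sensitive parameters. The variable names are mine, and the snippet just prints the command for review rather than running it.

```shell
# Inputs copied from the original command (the backslashes before "=" are not
# needed inside single quotes).
reads='Zfinch_merged.x=030.000.n=003384909.u.fastq'
outdir='Zfinch_genome'   # see the note below about reusing this directory

# rawErrorRate, overlapper=mhap and utgReAlign=true are intentionally left out.
canu_cmd="canu -p asm -d $outdir genomeSize=1223m -pacbio-raw $reads correctedErrorRate=0.105 corMinCoverage=0"

# Print for review; paste it into the shell once it looks right.
echo "$canu_cmd"
```

As far as I know, Canu resumes from whatever is already in the -d directory, so it is probably safest to move the old Zfinch_genome directory aside or point -d at a new, empty one before rerunning.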

brianwalenz commented 6 years ago

The *.report will have histograms of the raw input and corrected read lengths (and gobs more stuff). This will confirm that few reads were corrected. Also, from the output you pasted:

-- In 'asm.gkpStore', found PacBio reads:
--   Raw:        2352917
--   Corrected:  93
--   Trimmed:    0
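
The same check can be scripted against Canu's screen output. The sketch below uses only the lines quoted above; the "fewer than 1 in 10" warning threshold is an arbitrary assumption for illustration, not something Canu defines.

```shell
# Log excerpt copied from the output above.
log=$(cat <<'EOF'
-- In 'asm.gkpStore', found PacBio reads:
--   Raw:        2352917
--   Corrected:  93
--   Trimmed:    0
EOF
)

# Field 2 is the label, field 3 is the read count.
raw=$(printf '%s\n' "$log" | awk '$2 == "Raw:" { print $3 }')
corrected=$(printf '%s\n' "$log" | awk '$2 == "Corrected:" { print $3 }')

# Integer arithmetic: warn when fewer than 1 in 10 raw reads were corrected.
if [ $((corrected * 10)) -lt "$raw" ]; then
    status="correction mostly failed"
else
    status="correction looks plausible"
fi

echo "raw=$raw corrected=$corrected: $status"
```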

I encourage posting results of your 1Gbp assembly. We rarely hear of success stories here.