marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Running canu for short amplicon data #875

Closed BOM86 closed 6 years ago

BOM86 commented 6 years ago

Hi,

I'm starting to use MinION sequencing to sequence ~450-550 bp amplicons. The goal is to set up a sensitive assay that ultimately yields a complete viral genome from overlapping amplicons. I would like to test canu for error correction on these reads, but I ran into some issues while testing. The first problem is that I cannot set a genome size shorter than 1000 bp. Is it possible to bypass this error, since my amplicons are shorter than 1000 bp?

My second question: when I run canu with the following settings, it works on my Mac but crashes on the server for some reason: canu -correct -minOverlapLength=10 -nanopore-raw BC01.qc.fq -p BC01_corrected gnuplotTested=true useGrid=false -genomeSize=1000 stopOnReadQuality=false -minReadLength=100

When I run this, I get the following error (perhaps it is very easy to solve):

ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.

Can you perhaps help me with this issue?

Thanks in advance and best regards,

Bas

And in more detail the output of canu:

-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_161' (from 'java').
-- Detected 64 CPUs and 504 gigabytes of memory.
-- Detected Sun Grid Engine in '/opt/sge/default'.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (k-mer counting)
-- Local: cormhap    6 GB   16 CPUs x   4 jobs    24 GB   64 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    8 CPUs x   8 jobs    32 GB   64 CPUs  (overlap detection)
-- Local: utgovl     4 GB    8 CPUs x   8 jobs    32 GB   64 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x  64 jobs   256 GB   64 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x  63 jobs   504 GB   63 CPUs  (overlap store sorting)
-- Local: red        4 GB    4 CPUs x  16 jobs    64 GB   64 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x  64 jobs   256 GB   64 CPUs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   1 job     16 GB    4 CPUs  (contig construction)
-- Local: gfa        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in the input files.
--
-- Generating assembly 'BC01_corrected' in '/home/basoudemunnink/GRIDIon_Viro_Run_6'
--
-- Parameters:
--
--  genomeSize        1000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Thu Apr 19 10:39:06 2018 with 72.469 GB free disk space

    cd .
    /home/basoudemunnink/Scripts/canu-1.7/Linux-amd64/bin/gatekeeperCreate \
      -minlength 100 \
      -o ./BC01_corrected.gkpStore.BUILDING \
      ./BC01_corrected.gkpStore.gkp \
    > ./BC01_corrected.gkpStore.BUILDING.err 2>&1

-- Finished on Thu Apr 19 10:39:07 2018 (1 second) with 72.397 GB free disk space
----------------------------------------
--
-- WARNING: gnuplot failed.
--
----------------------------------------
--
-- In gatekeeper store './BC01_corrected.gkpStore':
--   Found 145556 reads.
--   Found 73521916 bases (73521.91 times coverage).
--
--   Read length histogram (one '*' equals 926.04 reads):
--        0     99      0 
--      100    199    701 
--      200    299   4306 ****
--      300    399  14506 ***************
--      400    499  64823 **********************************************************************
--      500    599  47129 **************************************************
--      600    699   5916 ******
--      700    799   1522 *
--      800    899   1081 *
--      900    999   1296 *
--     1000   1099   1470 *
--     1100   1199   1563 *
--     1200   1299    535 
--     1300   1399    156 
--     1400   1499    138 
--     1500   1599    100 
--     1600   1699     97 
--     1700   1799     58 
--     1800   1899     36 
--     1900   1999     27 
--     2000   2099     16 
--     2100   2199     18 
--     2200   2299     13 
--     2300   2399      9 
--     2400   2499      9 
--     2500   2599      2 
--     2600   2699      3 
--     2700   2799      6 
--     2800   2899      2 
--     2900   2999      2 
--     3000   3099      2 
--     3100   3199      4 
--     3200   3299      0 
--     3300   3399      0 
--     3400   3499      0 
--     3500   3599      1 
--     3600   3699      1 
--     3700   3799      1 
--     3800   3899      0 
--     3900   3999      1 
--     4000   4099      0 
--     4100   4199      0 
--     4200   4299      0 
--     4300   4399      1 
--     4400   4499      0 
--     4500   4599      1 
--     4600   4699      0 
--     4700   4799      0 
--     4800   4899      1 
--     4900   4999      0 
--     5000   5099      0 
--     5100   5199      0 
--     5200   5299      0 
--     5300   5399      0 
--     5400   5499      0 
--     5500   5599      0 
--     5600   5699      0 
--     5700   5799      0 
--     5800   5899      0 
--     5900   5999      0 
--     6000   6099      0 
--     6100   6199      0 
--     6200   6299      0 
--     6300   6399      0 
--     6400   6499      1 
--     6500   6599      1 
--     6600   6699      0 
--     6700   6799      0 
--     6800   6899      0 
--     6900   6999      0 
--     7000   7099      1 
-- Set corMinCoverage=4 based on read coverage of 73521.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: cor        2 GB    4 CPUs x  16 jobs    32 GB   64 CPUs  (read correction)
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Thu Apr 19 10:39:08 2018 with 72.396 GB free disk space (30 processes; 16 concurrently)

    cd correction/2-correction
    ./correctReads.sh 1 > ./correctReads.000001.out 2>&1
    ./correctReads.sh 2 > ./correctReads.000002.out 2>&1
    ./correctReads.sh 3 > ./correctReads.000003.out 2>&1
    ./correctReads.sh 4 > ./correctReads.000004.out 2>&1
    ./correctReads.sh 5 > ./correctReads.000005.out 2>&1
    ./correctReads.sh 6 > ./correctReads.000006.out 2>&1
    ./correctReads.sh 7 > ./correctReads.000007.out 2>&1
    ./correctReads.sh 8 > ./correctReads.000008.out 2>&1
    ./correctReads.sh 9 > ./correctReads.000009.out 2>&1
    ./correctReads.sh 10 > ./correctReads.000010.out 2>&1
    ./correctReads.sh 11 > ./correctReads.000011.out 2>&1
    ./correctReads.sh 12 > ./correctReads.000012.out 2>&1
    ./correctReads.sh 13 > ./correctReads.000013.out 2>&1
    ./correctReads.sh 14 > ./correctReads.000014.out 2>&1
    ./correctReads.sh 15 > ./correctReads.000015.out 2>&1
    ./correctReads.sh 16 > ./correctReads.000016.out 2>&1
    ./correctReads.sh 17 > ./correctReads.000017.out 2>&1
    ./correctReads.sh 18 > ./correctReads.000018.out 2>&1
    ./correctReads.sh 19 > ./correctReads.000019.out 2>&1
    ./correctReads.sh 20 > ./correctReads.000020.out 2>&1
    ./correctReads.sh 21 > ./correctReads.000021.out 2>&1
    ./correctReads.sh 22 > ./correctReads.000022.out 2>&1
    ./correctReads.sh 23 > ./correctReads.000023.out 2>&1
    ./correctReads.sh 24 > ./correctReads.000024.out 2>&1
    ./correctReads.sh 25 > ./correctReads.000025.out 2>&1
    ./correctReads.sh 26 > ./correctReads.000026.out 2>&1
    ./correctReads.sh 27 > ./correctReads.000027.out 2>&1
    ./correctReads.sh 28 > ./correctReads.000028.out 2>&1
    ./correctReads.sh 29 > ./correctReads.000029.out 2>&1
    ./correctReads.sh 30 > ./correctReads.000030.out 2>&1

-- Finished on Thu Apr 19 10:39:08 2018 (lickety-split) with 72.396 GB free disk space
----------------------------------------
--
-- Read correction jobs failed, retry.
--   job 2-correction/results/0001.cns FAILED.
--   job 2-correction/results/0002.cns FAILED.
--   job 2-correction/results/0003.cns FAILED.
--   job 2-correction/results/0004.cns FAILED.
--   job 2-correction/results/0005.cns FAILED.
--   job 2-correction/results/0006.cns FAILED.
--   job 2-correction/results/0007.cns FAILED.
--   job 2-correction/results/0008.cns FAILED.
--   job 2-correction/results/0009.cns FAILED.
--   job 2-correction/results/0010.cns FAILED.
--   job 2-correction/results/0011.cns FAILED.
--   job 2-correction/results/0012.cns FAILED.
--   job 2-correction/results/0013.cns FAILED.
--   job 2-correction/results/0014.cns FAILED.
--   job 2-correction/results/0015.cns FAILED.
--   job 2-correction/results/0016.cns FAILED.
--   job 2-correction/results/0017.cns FAILED.
--   job 2-correction/results/0018.cns FAILED.
--   job 2-correction/results/0019.cns FAILED.
--   job 2-correction/results/0020.cns FAILED.
--   job 2-correction/results/0021.cns FAILED.
--   job 2-correction/results/0022.cns FAILED.
--   job 2-correction/results/0023.cns FAILED.
--   job 2-correction/results/0024.cns FAILED.
--   job 2-correction/results/0025.cns FAILED.
--   job 2-correction/results/0026.cns FAILED.
--   job 2-correction/results/0027.cns FAILED.
--   job 2-correction/results/0028.cns FAILED.
--   job 2-correction/results/0029.cns FAILED.
--   job 2-correction/results/0030.cns FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'cor' concurrent execution on Thu Apr 19 10:39:08 2018 with 72.396 GB free disk space (30 processes; 16 concurrently)

    cd correction/2-correction
    ./correctReads.sh 1 > ./correctReads.000001.out 2>&1
    ./correctReads.sh 2 > ./correctReads.000002.out 2>&1
    ./correctReads.sh 3 > ./correctReads.000003.out 2>&1
    ./correctReads.sh 4 > ./correctReads.000004.out 2>&1
    ./correctReads.sh 5 > ./correctReads.000005.out 2>&1
    ./correctReads.sh 6 > ./correctReads.000006.out 2>&1
    ./correctReads.sh 7 > ./correctReads.000007.out 2>&1
    ./correctReads.sh 8 > ./correctReads.000008.out 2>&1
    ./correctReads.sh 9 > ./correctReads.000009.out 2>&1
    ./correctReads.sh 10 > ./correctReads.000010.out 2>&1
    ./correctReads.sh 11 > ./correctReads.000011.out 2>&1
    ./correctReads.sh 12 > ./correctReads.000012.out 2>&1
    ./correctReads.sh 13 > ./correctReads.000013.out 2>&1
    ./correctReads.sh 14 > ./correctReads.000014.out 2>&1
    ./correctReads.sh 15 > ./correctReads.000015.out 2>&1
    ./correctReads.sh 16 > ./correctReads.000016.out 2>&1
    ./correctReads.sh 17 > ./correctReads.000017.out 2>&1
    ./correctReads.sh 18 > ./correctReads.000018.out 2>&1
    ./correctReads.sh 19 > ./correctReads.000019.out 2>&1
    ./correctReads.sh 20 > ./correctReads.000020.out 2>&1
    ./correctReads.sh 21 > ./correctReads.000021.out 2>&1
    ./correctReads.sh 22 > ./correctReads.000022.out 2>&1
    ./correctReads.sh 23 > ./correctReads.000023.out 2>&1
    ./correctReads.sh 24 > ./correctReads.000024.out 2>&1
    ./correctReads.sh 25 > ./correctReads.000025.out 2>&1
    ./correctReads.sh 26 > ./correctReads.000026.out 2>&1
    ./correctReads.sh 27 > ./correctReads.000027.out 2>&1
    ./correctReads.sh 28 > ./correctReads.000028.out 2>&1
    ./correctReads.sh 29 > ./correctReads.000029.out 2>&1
    ./correctReads.sh 30 > ./correctReads.000030.out 2>&1

-- Finished on Thu Apr 19 10:39:08 2018 (lickety-split) with 72.395 GB free disk space
----------------------------------------
--
-- Read correction jobs failed, tried 2 times, giving up.
--   job 2-correction/results/0001.cns FAILED.
--   job 2-correction/results/0002.cns FAILED.
--   job 2-correction/results/0003.cns FAILED.
--   job 2-correction/results/0004.cns FAILED.
--   job 2-correction/results/0005.cns FAILED.
--   job 2-correction/results/0006.cns FAILED.
--   job 2-correction/results/0007.cns FAILED.
--   job 2-correction/results/0008.cns FAILED.
--   job 2-correction/results/0009.cns FAILED.
--   job 2-correction/results/0010.cns FAILED.
--   job 2-correction/results/0011.cns FAILED.
--   job 2-correction/results/0012.cns FAILED.
--   job 2-correction/results/0013.cns FAILED.
--   job 2-correction/results/0014.cns FAILED.
--   job 2-correction/results/0015.cns FAILED.
--   job 2-correction/results/0016.cns FAILED.
--   job 2-correction/results/0017.cns FAILED.
--   job 2-correction/results/0018.cns FAILED.
--   job 2-correction/results/0019.cns FAILED.
--   job 2-correction/results/0020.cns FAILED.
--   job 2-correction/results/0021.cns FAILED.
--   job 2-correction/results/0022.cns FAILED.
--   job 2-correction/results/0023.cns FAILED.
--   job 2-correction/results/0024.cns FAILED.
--   job 2-correction/results/0025.cns FAILED.
--   job 2-correction/results/0026.cns FAILED.
--   job 2-correction/results/0027.cns FAILED.
--   job 2-correction/results/0028.cns FAILED.
--   job 2-correction/results/0029.cns FAILED.
--   job 2-correction/results/0030.cns FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
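
The coverage figure in the log above follows from the read totals and the genomeSize setting. The short Python sketch below redoes that arithmetic using only numbers copied from the log. The 70-star width of the tallest histogram bar is an assumption inferred from the reported scale, not something the log states directly.

```python
# Numbers copied from the Canu log above.
total_bases = 73_521_916
num_reads = 145_556
genome_size = 1_000  # the value passed as genomeSize

# Coverage as total bases divided by the declared genome size.
coverage = total_bases / genome_size

# Mean read length, to compare against the ~450-550 bp amplicon size.
mean_read_length = total_bases / num_reads

# Histogram scale: the tallest bin (64823 reads) is assumed to span 70 stars.
reads_per_star = 64_823 / 70

print(f"coverage: {coverage:.2f}x")
print(f"mean read length: {mean_read_length:.1f} bp")
print(f"reads per star: {reads_per_star:.2f}")
```

With genomeSize fixed at 1000, the reported coverage is simply the total base count divided by 1000, which is why it is so large for this data set.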

BOM86 commented 6 years ago

Problem solved: there was a folder called "correction" in the same folder where my sequences are located. After successfully running the first error correction, I had to remove this folder manually; doing so resolved the error above.

brianwalenz commented 6 years ago

Is it possible you had two canu runs in the same place? The 'correction' directory is made by canu, for storing intermediate correction results - it's possible one run stepped on the other one, or the second run picked up intermediate results from the first run and got confused.

Just seeing if there is something to fix here. (Two runs in the same place isn't something I can easily fix.)
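
One way to catch this situation before launching a new run is to look for intermediate directories left behind by an earlier one. The sketch below is a hypothetical helper, not part of canu. Only the 'correction' directory name is confirmed in this thread; 'trimming' and 'unitigging' are assumed names for the other stage directories.

```python
from pathlib import Path

# Assumption: stage directories a previous Canu run may have left behind.
# Only 'correction' is confirmed by this thread.
STAGE_DIRS = ("correction", "trimming", "unitigging")


def leftover_stage_dirs(run_dir):
    """Return the names of stage directories that already exist in run_dir."""
    base = Path(run_dir)
    return [name for name in STAGE_DIRS if (base / name).is_dir()]


if __name__ == "__main__":
    found = leftover_stage_dirs(".")
    if found:
        print("Existing intermediate directories:", ", ".join(found))
    else:
        print("No leftover intermediate directories found.")
```

The helper only reports what it finds. Whether to delete the directories or move to a fresh location is left to the user, since a leftover directory may belong to a run that is meant to be resumed.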

BOM86 commented 6 years ago

Thank you for your fast answer. I was running single canu runs in the same place. The problem was that the intermediate correction results were not deleted after the run completed. After I manually deleted the folder, it worked just fine.
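
An alternative to deleting the folder by hand is to give each run its own output directory with canu's `-d` option, so intermediate results from one run cannot be picked up by the next. The sketch below only builds and prints the command lines; it does not execute canu. The helper name, the `canu_<barcode>` directory names and the second barcode (BC02) are hypothetical, and the options are written in canu's `key=value` form rather than copied verbatim from the original command.

```python
import shlex


def canu_correct_command(prefix, reads, run_dir):
    """Build a 'canu -correct' command that writes into its own directory."""
    return [
        "canu", "-correct",
        "-p", prefix,
        "-d", run_dir,  # separate assembly directory for every run
        "genomeSize=1000",
        "minReadLength=100",
        "minOverlapLength=10",
        "stopOnReadQuality=false",
        "useGrid=false",
        "-nanopore-raw", reads,
    ]


# BC01 comes from the thread; BC02 is a hypothetical second sample.
for barcode in ("BC01", "BC02"):
    cmd = canu_correct_command(
        f"{barcode}_corrected", f"{barcode}.qc.fq", f"canu_{barcode}"
    )
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments makes it easy to hand to `subprocess.run` later without shell quoting problems.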

skoren commented 6 years ago

Closed. @brianwalenz, I assume there is nothing to fix here.