marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Getting error at correction step #1174

Closed: UdiZel closed this issue 5 years ago

UdiZel commented 5 years ago

Hi, I installed canu on a Linux cluster (SLURM) and got an error when running the E. coli (25X) dataset to test the installation.

The command I ran: canu -p ecoli -d /scratch/ez82/Ecoli_PacBio/ -genomeSize=4.8m -pacbio-raw /scratch/ez82/Ecoli_PacBio/pacbio.fastq usegrid=1 gridOptions="--partition=main" gridOptionsJobName=canu_test
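
For clarity, here is the same invocation split across lines, with a short note on what each option is meant to do (meanings paraphrased from the Canu documentation; values are copied verbatim from the command above, and the partition name and paths are specific to our cluster):

    # -p ecoli               : prefix used for all output files
    # -d <dir>               : assembly (working/output) directory
    # -genomeSize=4.8m       : estimated genome size
    # -pacbio-raw <fastq>    : raw, uncorrected PacBio reads
    # usegrid=1              : run compute stages as grid jobs (documented spelling: useGrid)
    # gridOptions="..."      : extra options passed through to the SLURM submit command
    # gridOptionsJobName=... : tag added to the names of submitted jobs
    canu \
      -p ecoli \
      -d /scratch/ez82/Ecoli_PacBio/ \
      -genomeSize=4.8m \
      -pacbio-raw /scratch/ez82/Ecoli_PacBio/pacbio.fastq \
      usegrid=1 \
      gridOptions="--partition=main" \
      gridOptionsJobName=canu_test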

The canu.out file looks like this:

   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /opt/sw/packages/java/1.8.0_73/bin/java
   java version "1.8.0_73"

Found canu:
   /cache/home/ez82/canu/Linux-amd64/bin/canu
   Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)

-- Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https://doi.org/10.1038/nbt.4277
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_73' (from '/opt/sw/packages/java/1.8.0_73/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 125 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
-- 
-- Found 140 hosts with  24 cores and  125 GB memory under Slurm control.
-- Found   1 host  with  48 cores and 1511 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     12 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6 GB   12 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    8 CPUs  (overlap detection)
-- Grid:  cor      --- GB    4 CPUs  (read correction)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  cns      --- GB    4 CPUs  (consensus)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'ecoli.seqStore', found PacBio reads:
--   Raw:        12528
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ecoli' in '/scratch/ez82/Ecoli_PacBio'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN CORRECTION
--
--
-- Kmer counting (meryl-count) jobs failed, tried 2 times, giving up.
--   job ecoli.01.meryl FAILED.
--

ABORT:
ABORT: Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

In the correction/0-mercounts folder there are two meryl-count.#####_1.out files, meryl-count.82900811_1.out and meryl-count.141639506_1.out. Their contents are below.

meryl-count.82900811_1.out:

   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /opt/sw/packages/java/1.8.0_73/bin/java
   java version "1.8.0_73"

Found canu:
   /cache/home/ez82/canu/Linux-amd64/bin/canu
   Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)

Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.

Counting 110  million canonical 16-mers from 1 input file:
    canu-seqStore: ../../ecoli.seqStore

SIMPLE MODE
-----------

  16-mers
    -> 4294967296 entries for counts up to 65535.
    -> 64 Gbits memory used

  115899341 input bases
    -> expected max count of 463597, needing 4 extra bits.
    -> 16 Gbits memory used

  10 GB memory needed

COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/     data    total
  bits   prefix   memory   prefix   prefix   memory   memory
------  -------  -------  -------  -------  -------  -------
     1     2  P    53 kB    55 MM  3427  S   428 MB   428 MB
     2     4  P    52 kB    27 MM  1658  S   414 MB   414 MB
     3     8  P    51 kB    13 MM   802  S   401 MB   401 MB
     4    16  P    50 kB  7073 kM   387  S   387 MB   387 MB
     5    32  P    50 kB  3536 kM   187  S   374 MB   374 MB
     6    64  P    52 kB  1768 kM    90  S   360 MB   360 MB
     7   128  P    58 kB   884 kM    44  S   352 MB   352 MB
     8   256  P    70 kB   442 kM    21  S   336 MB   336 MB
     9   512  P    96 kB   221 kM    10  S   320 MB   320 MB
    10  1024  P   152 kB   110 kM     5  S   320 MB   320 MB  Best Value!
    11  2048  P   272 kB    55 kM     3  S   384 MB   384 MB
    12  4096  P   512 kB    27 kM     2  S   512 MB   512 MB
    13  8192  P   960 kB    13 kM     1  S   512 MB   512 MB
    14    16 kP  1920 kB  7074  M     1  S  1024 MB  1025 MB
    15    32 kP  3840 kB  3537  M     1  S  2048 MB  2051 MB
    16    64 kP  7680 kB  1769  M     1  S  4096 MB  4103 MB
    17   128 kP    15 MB   885  M     1  S  8192 MB  8207 MB

FINAL CONFIGURATION
-------------------

Configured complex mode for 0.313 GB memory per batch, and up to 1 batch.

kmerCountFileWriter()-- Creating './ecoli.01.meryl.WORKING' for 16-mers, with prefixSize 10 suffixSize 22 numFiles 64
Loading kmers from '../../ecoli.seqStore' into buckets.
Used 0.281 GB out of 2.000 GB to store         6149 kmers.
Used 0.406 GB out of 2.000 GB to store     49370316 kmers.
Used 0.531 GB out of 2.000 GB to store     98169708 kmers.

Writing results to './ecoli.01.meryl.WORKING', using 4 threads.
finishIteration()--
Failed to open './ecoli.01.meryl.WORKING/0x011011[001].merylIndex' for writing: No such file or directory

And meryl-count.141639506_1.out:

   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /opt/sw/packages/java/1.8.0_73/bin/java
   java version "1.8.0_73"

Found canu:
   /cache/home/ez82/canu/Linux-amd64/bin/canu
   Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)

Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.

Counting 110  million canonical 16-mers from 1 input file:
    canu-seqStore: ../../ecoli.seqStore

SIMPLE MODE
-----------

  16-mers
    -> 4294967296 entries for counts up to 65535.
    -> 64 Gbits memory used

  115899341 input bases
    -> expected max count of 463597, needing 4 extra bits.
    -> 16 Gbits memory used

  10 GB memory needed

COMPLEX MODE
------------

prefix     # of   struct   kmers/    segs/     data    total
  bits   prefix   memory   prefix   prefix   memory   memory
------  -------  -------  -------  -------  -------  -------
     1     2  P    53 kB    55 MM  3427  S   428 MB   428 MB
     2     4  P    52 kB    27 MM  1658  S   414 MB   414 MB
     3     8  P    51 kB    13 MM   802  S   401 MB   401 MB
     4    16  P    50 kB  7073 kM   387  S   387 MB   387 MB
     5    32  P    50 kB  3536 kM   187  S   374 MB   374 MB
     6    64  P    52 kB  1768 kM    90  S   360 MB   360 MB
     7   128  P    58 kB   884 kM    44  S   352 MB   352 MB
     8   256  P    70 kB   442 kM    21  S   336 MB   336 MB
     9   512  P    96 kB   221 kM    10  S   320 MB   320 MB
    10  1024  P   152 kB   110 kM     5  S   320 MB   320 MB  Best Value!
    11  2048  P   272 kB    55 kM     3  S   384 MB   384 MB
    12  4096  P   512 kB    27 kM     2  S   512 MB   512 MB
    13  8192  P   960 kB    13 kM     1  S   512 MB   512 MB
    14    16 kP  1920 kB  7074  M     1  S  1024 MB  1025 MB
    15    32 kP  3840 kB  3537  M     1  S  2048 MB  2051 MB
    16    64 kP  7680 kB  1769  M     1  S  4096 MB  4103 MB
    17   128 kP    15 MB   885  M     1  S  8192 MB  8207 MB

FINAL CONFIGURATION
-------------------

Configured complex mode for 0.313 GB memory per batch, and up to 1 batch.

kmerCountFileWriter()-- Creating './ecoli.01.meryl.WORKING' for 16-mers, with prefixSize 10 suffixSize 22 numFiles 64
Loading kmers from '../../ecoli.seqStore' into buckets.
Used 0.281 GB out of 2.000 GB to store         6149 kmers.
Used 0.406 GB out of 2.000 GB to store     49370316 kmers.
Used 0.531 GB out of 2.000 GB to store     98169708 kmers.

Writing results to './ecoli.01.meryl.WORKING', using 4 threads.
finishIteration()--

Finished counting.
Bye.
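
As an aside, the "10 GB memory needed" figure in the SIMPLE MODE sections above is just the two bit arrays meryl reports added together; a rough back-of-the-envelope check (shell arithmetic mirroring the logged numbers, not canu's actual code):

    # 4^16 slots, one per canonical 16-mer
    slots=$(( 4**16 ))                            # 4294967296 entries
    echo $(( slots * 16 / 2**30 )) Gbit           # 16 bits/slot for counts up to 65535 -> 64 Gbit
    echo $(( slots * 4  / 2**30 )) Gbit           # 4 extra bits/slot for the expected max count -> 16 Gbit
    echo $(( slots * (16 + 4) / 2**30 / 8 )) GB   # (64 + 16) Gbit / 8 -> 10 GB, matching the log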

Any thoughts on what the problem may be? Thx!

UdiZel commented 5 years ago

Never mind, got it to work!

skoren commented 5 years ago

Glad you got it to work. Please post what you changed to make it run.

UdiZel commented 5 years ago

I think it had something to do with the resources available on our cluster.
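
For anyone landing here with the same symptom: one quick way to compare what canu detected (24 CPUs / 125 GB per node here) against what a partition actually offers is sinfo's format string. This is a generic SLURM query, not something canu runs, and the column choice is just a suggestion:

    # partition, node count, CPUs per node, memory per node (MB), time limit
    sinfo -o "%P %D %c %m %l"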