marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu Exit code 1, sqStoreCreate failed; boom!. #2331

Closed: CheeseLover2020 closed this issue 1 month ago

CheeseLover2020 commented 1 month ago

I am trying to run Canu to assemble some Nanopore reads. I have attempted fixes from other threads, but I am still stumped on how to resolve this error. Does anyone have any suggestions? Command:

canu -p C115 -d Assembly_Canu genomesize=34m -nanopore pod5_to_fast5

Output:

-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '22.0.1-internal' (from '/home/synbiopc/miniconda3/lib/jvm/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 8   ' (from 'gnuplot') and image format 'png'.
--
-- Detected 32 CPUs and 126 gigabytes of memory on the local machine.
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     12.000 GB    4 CPUs x   8 jobs    96.000 GB  32 CPUs  (k-mer counting)
-- Local: hap        8.000 GB    4 CPUs x   8 jobs    64.000 GB  32 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6.000 GB   16 CPUs x   2 jobs    12.000 GB  32 CPUs  (overlap detection with mhap)
-- Local: obtovl     4.000 GB    8 CPUs x   4 jobs    16.000 GB  32 CPUs  (overlap detection)
-- Local: utgovl     4.000 GB    8 CPUs x   4 jobs    16.000 GB  32 CPUs  (overlap detection)
-- Local: cor        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  31 jobs   124.000 GB  31 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x  15 jobs   120.000 GB  15 CPUs  (overlap store sorting)
-- Local: red       15.000 GB    4 CPUs x   8 jobs   120.000 GB  32 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x  15 jobs   120.000 GB  15 CPUs  (overlap error adjustment)
-- Local: bat       16.000 GB    4 CPUs x   1 job     16.000 GB   4 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- Found untrimmed raw Nanopore reads in the input files.
--
-- Generating assembly 'C115' in '/var/lib/minknow/data/BDA_C115_LSK14_LFB/BDA_C115/20240715_1444_MN35570_FAZ49880_b7f50137/pod5/Assembly_Canu':
--   genomeSize:
--     34000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.3200 ( 32.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.3000 ( 30.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.2000 ( 20.00%)
--
--   Stages to run:
--     correct raw reads.
--     trim corrected reads.
--     assemble corrected and trimmed reads.
--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Tue Jul 23 16:14:37 2024 with 2831.553 GB free disk space

    cd .
    ./C115.seqStore.sh \
    > ./C115.seqStore.err 2>&1

-- Finished on Tue Jul 23 16:14:37 2024 (like a bat out of hell) with 2831.553 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   sqStoreCreate failed; boom!.
ABORT:
ABORT: Disk space available:  2831.553 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (./C115.seqStore.err):
ABORT:
ABORT:   
ABORT:   Found canu:
ABORT:      /home/synbiopc/miniconda3/bin/canu
ABORT:      canu 2.2
ABORT:   
ABORT:   usage: /home/synbiopc/miniconda3/bin/sqStoreCreate -o S.seqStore
ABORT:                                                      [options] \
ABORT:                                                      [[processing-options] technology-option libName reads ...] ...
ABORT:     -o S.seqStore          create output S.seqStore and load reads into it
ABORT:   
ABORT:     -minlength L           discard reads shorter than L (regardless of coverage)
ABORT:   
ABORT:     -homopolycompress      set up for accessing homopolymer compressed reads
ABORT:                            by default; also compute coverage and filter lengths
ABORT:                            using the compressed read sequence.
ABORT:   
ABORT:   COVERAGE FILTERING
ABORT:     When more than C coverage in reads is supplied, random reads are removed
ABORT:     until coverage is C.  Bias B will remove shorter (B > 0) or longer (B < 0)
ABORT:     reads preferentially.  B=0 will remove random reads.  Default is B=1.
ABORT:   
ABORT:     -genomesize G          expected genome size (needed to compute coverage)
ABORT:     -coverage C            desired coverage in long reads
ABORT:     -bias B                remove shorter (B > 0) or longer (B < 0) reads.
ABORT:     -seed S                seed the pseudo random number generator with S
ABORT:                              1 <= S <= 4294967295
ABORT:                              S = 0 will use a seed derived from the time and process id
ABORT:   
ABORT:   READ SPECIFICATION
ABORT:     Reads are supplied as a collection of libraries.  Each library should contain
ABORT:     all the reads from one sequencing experiment (e.g., sample collection, sample
ABORT:     preperation, sequencing run).  A library is created when any of the 'read
ABORT:     technology' options is encountered, and will use whatever 'processing state'
ABORT:     have been already supplied.  The first word after a 'read technology' option
ABORT:     must be the name of the library.
ABORT:   
ABORT:     Note that -pacbio-hifi will force -corrected status.
ABORT:   
ABORT:     Example:  '-raw -pacbio LIBRARY_1 file.fasta'
ABORT:   
ABORT:     -raw                   set the 'processing state' of the reads
ABORT:     -corrected               next on the command line.
ABORT:     -untrimmed
ABORT:     -trimmed
ABORT:   
ABORT:     -nanopore              set the 'read technology' of the reads
ABORT:     -pacbio                  next on the command line.
ABORT:     -pacbio-hifi
ABORT:   
ABORT:   Option -nanopore: file '/var/lib/minknow/data/BDA_C115_LSK14_LFB/BDA_C115/20240715_1444_MN35570_FAZ49880_b7f50137/pod5/pod5_to_fast5' not found.
ABORT:
brianwalenz commented 1 month ago

Is pod5_to_fast5 a directory?

Canu will only accept FASTA or FASTQ files (gz, bz2 or xz compressed).

CheeseLover2020 commented 1 month ago

Yes, it is a directory; it holds a large number of fast5 files (the output from my ONT run). I am currently running a Python script to try to merge these into one file that I can give to Canu.

skoren commented 1 month ago

You can't provide fast5 files as input to Canu; it can only read FASTA or FASTQ (gz/bz2/xz compression is OK). Multiple input files are also fine, provided they don't exceed the command-line length limit. You have to convert your fast5 files to FASTA/FASTQ before running Canu.
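A minimal sketch of that merging step, assuming the reads have already been basecalled to per-read `.fastq` files (the directory layout, filenames, and the `merge_fastq` helper are illustrative, not part of Canu):

```python
import glob
import gzip
import os


def merge_fastq(src_dir, out_path):
    """Concatenate every .fastq file under src_dir into one gzipped
    file that Canu can accept as a single -nanopore input."""
    paths = sorted(glob.glob(os.path.join(src_dir, "*.fastq")))
    with gzip.open(out_path, "wt") as out:
        for path in paths:
            with open(path) as fh:
                for line in fh:
                    out.write(line)
    return len(paths)  # number of input files merged
```

The resulting `merged.fastq.gz` (or the uncompressed equivalent) can then be passed directly to Canu's `-nanopore` option in place of the directory.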

CheeseLover2020 commented 1 month ago

Thank you! To follow up: using the merged files, supplied as fastq rather than fastq.gz, I was able to run Canu.
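For anyone landing here later, a working invocation along these lines would be (paths and filenames are placeholders, not the poster's actual ones; only the options mirror the original command):

```shell
# Merge basecalled FASTQ output into one file, then point Canu at it
# instead of at a directory of fast5 files.
cat fastq_pass/*.fastq > reads.fastq
canu -p C115 -d Assembly_Canu genomeSize=34m -nanopore reads.fastq
```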