marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Failed with exit code 1. (rc=256)_sqStoreCreate failed; boom! #2154

Closed. BioHlT closed this issue 2 years ago.

BioHlT commented 2 years ago

Hello,

I am having an issue with a Canu assembly. This is the command I used:

    canu -p Vanella_contigs -d Canu_output genomeSize=50m -nanopore YT30_vanella_ont.fastq -pacbio Vanella_YT30.subreads.fastq.gz

The error says that the files are not found, although they are in the same folder I am running canu from. I am at a loss here; can someone help me resolve this issue? I have attached the error message below.

Kind regards

-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_231' (from 'java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
--
-- Detected 28 CPUs and 128 gigabytes of memory on the local machine.
--
-- Local machine mode enabled; grid support not detected or not allowed.
--
--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     12.000 GB    4 CPUs x   7 jobs    84.000 GB  28 CPUs  (k-mer counting)
-- Local: hap        8.000 GB    4 CPUs x   7 jobs    56.000 GB  28 CPUs  (read-to-haplotype assignment)
-- Local: cormhap   13.000 GB   14 CPUs x   2 jobs    26.000 GB  28 CPUs  (overlap detection with mhap)
-- Local: obtovl     8.000 GB    7 CPUs x   4 jobs    32.000 GB  28 CPUs  (overlap detection)
-- Local: utgovl     8.000 GB    7 CPUs x   4 jobs    32.000 GB  28 CPUs  (overlap detection)
-- Local: cor        -.--- GB    4 CPUs x   - jobs     -.--- GB   - CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x  28 jobs   112.000 GB  28 CPUs  (overlap store bucketizer)
-- Local: ovs        8.000 GB    1 CPU  x  16 jobs   128.000 GB  16 CPUs  (overlap store sorting)
-- Local: red       16.000 GB    4 CPUs x   7 jobs   112.000 GB  28 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x  16 jobs   128.000 GB  16 CPUs  (overlap error adjustment)
-- Local: bat       64.000 GB    8 CPUs x   1 job     64.000 GB   8 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    8 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)
--
-- Found untrimmed raw PacBio CLR and Nanopore reads in the input files.
--
-- Generating assembly 'Vanella_contigs' in '/Volumes/2_GENOME_DATA 1/Vanella/02_Canu/Canu_output':
--   genomeSize:
--     50000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.3200 ( 32.00%)
--     obtOvlErrorRate 0.1200 ( 12.00%)
--     utgOvlErrorRate 0.1200 ( 12.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.3000 ( 30.00%)
--     obtErrorRate    0.1200 ( 12.00%)
--     utgErrorRate    0.1200 ( 12.00%)
--     cnsErrorRate    0.2000 ( 20.00%)
--
--   Stages to run:
--     correct raw reads.
--     trim corrected reads.
--     assemble corrected and trimmed reads.
--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Thu Aug 11 12:06:20 2022 with 2574.511 GB free disk space

    cd .
    ./Vanella_contigs.seqStore.sh \
    > ./Vanella_contigs.seqStore.err 2>&1

-- Finished on Thu Aug 11 12:06:20 2022 (lickety-split) with 2574.511 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   sqStoreCreate failed; boom!.
ABORT:
ABORT: Disk space available:  2574.511 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (./Vanella_contigs.seqStore.err):
ABORT:
ABORT:      canu 2.2
ABORT:   
ABORT:   usage: /Users/yonastekle/Documents/Softwares/canu-2.2/bin/sqStoreCreate -o S.seqStore
ABORT:                                                                           [options] \
ABORT:                                                                           [[processing-options] technology-option libName reads ...] ...
ABORT:     -o S.seqStore          create output S.seqStore and load reads into it
ABORT:   
ABORT:     -minlength L           discard reads shorter than L (regardless of coverage)
ABORT:   
ABORT:     -homopolycompress      set up for accessing homopolymer compressed reads
ABORT:                            by default; also compute coverage and filter lengths
ABORT:                            using the compressed read sequence.
ABORT:   
ABORT:   COVERAGE FILTERING
ABORT:     When more than C coverage in reads is supplied, random reads are removed
ABORT:     until coverage is C.  Bias B will remove shorter (B > 0) or longer (B < 0)
ABORT:     reads preferentially.  B=0 will remove random reads.  Default is B=1.
ABORT:   
ABORT:     -genomesize G          expected genome size (needed to compute coverage)
ABORT:     -coverage C            desired coverage in long reads
ABORT:     -bias B                remove shorter (B > 0) or longer (B < 0) reads.
ABORT:     -seed S                seed the pseudo random number generator with S
ABORT:                              1 <= S <= 4294967295
ABORT:                              S = 0 will use a seed derived from the time and process id
ABORT:   
ABORT:   READ SPECIFICATION
ABORT:     Reads are supplied as a collection of libraries.  Each library should contain
ABORT:     all the reads from one sequencing experiment (e.g., sample collection, sample
ABORT:     preperation, sequencing run).  A library is created when any of the 'read
ABORT:     technology' options is encountered, and will use whatever 'processing state'
ABORT:     have been already supplied.  The first word after a 'read technology' option
ABORT:     must be the name of the library.
ABORT:   
ABORT:     Note that -pacbio-hifi will force -corrected status.
ABORT:   
ABORT:     Example:  '-raw -pacbio LIBRARY_1 file.fasta'
ABORT:   
ABORT:     -raw                   set the 'processing state' of the reads
ABORT:     -corrected               next on the command line.
ABORT:     -untrimmed
ABORT:     -trimmed
ABORT:   
ABORT:     -nanopore              set the 'read technology' of the reads
ABORT:     -pacbio                  next on the command line.
ABORT:     -pacbio-hifi
ABORT:   
ABORT:   Option -nanopore: file '/Volumes/2_GENOME_DATA' not found.
ABORT:   Option -nanopore: file '1/Vanella/02_Canu/YT30_vanella_ont.fastq' not found.
ABORT:   Option -pacbio: file '/Volumes/2_GENOME_DATA' not found.
ABORT:   Option -pacbio: file '1/Vanella/02_Canu/Vanella_YT30.subreads.fastq.gz' not found.

brianwalenz commented 2 years ago

I'm guessing that canu isn't happy with the space in "2_GENOME_DATA 1".

Canu is using the full path to the input files ("/Volumes/2_GENOME_DATA 1/Vanella/02_Canu/...") but when it goes to access the files the name isn't quoted, so the space makes each file look like two files (as shown by the two pairs of "not found" errors).
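
For illustration, the splitting described above can be reproduced in any POSIX shell. This is only a minimal sketch of shell word splitting, not Canu's own code: the path is the one from the log above, and count_args is a hypothetical helper added for the demonstration.

```shell
#!/bin/sh
# Path taken from the log in this issue; note the space in the volume name.
path="/Volumes/2_GENOME_DATA 1/Vanella/02_Canu/YT30_vanella_ont.fastq"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted expansion: the shell splits the value at the space.
unquoted=$(count_args $path)

# Quoted expansion: the value is passed as a single argument.
quoted=$(count_args "$path")

echo "unquoted: $unquoted argument(s)"   # prints "unquoted: 2 argument(s)"
echo "quoted:   $quoted argument(s)"     # prints "quoted:   1 argument(s)"
```

The two arguments in the unquoted case correspond to the pair of "not found" messages reported for each input file.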

The easiest fix is to rename that volume to remove the space. Alternatively, you can manually edit Vanella_contigs.seqStore.sh to add quote marks around the input files, then run that script by hand and restart canu. Canu will notice that the step is finished and continue on.
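
The contents of Vanella_contigs.seqStore.sh are not shown in this thread, so the following is only a sketch of the quoting fix on a stand-in script, under stated assumptions: mock.seqStore.sh, fixed.seqStore.sh, and reads.fastq are hypothetical names, and cat plays the role of sqStoreCreate. It builds a temporary directory with a space in its name, shows the unquoted script failing, then wraps the path in quote marks and runs it again.

```shell
#!/bin/sh
# Temporary sandbox with a space in a directory name (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/2_GENOME_DATA 1"
reads="$work/2_GENOME_DATA 1/reads.fastq"
printf '@r1\nACGT\n+\nIIII\n' > "$reads"

# Stand-in for the generated script: the input path is written unquoted.
# 'cat' is used here in place of sqStoreCreate (an assumption for the demo).
printf '#!/bin/sh\ncat %s > /dev/null\n' "$reads" > "$work/mock.seqStore.sh"

# Running it as generated fails, because the path is split at the space.
if sh "$work/mock.seqStore.sh" 2>/dev/null; then before=ok; else before=failed; fi

# The manual edit: put quote marks around the input file path.
sed "s|$reads|\"$reads\"|" "$work/mock.seqStore.sh" > "$work/fixed.seqStore.sh"

# Running the edited script by hand now succeeds.
if sh "$work/fixed.seqStore.sh"; then after=ok; else after=failed; fi

echo "before edit: $before"
echo "after edit:  $after"
rm -rf "$work"
```

With the real script, the same edit would be made by hand in a text editor, after which the script is run from inside the assembly directory and canu is restarted with the original command.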

BioHlT commented 2 years ago

Thank you for the easy solve!

ykc21e8 commented 1 year ago

"Alternatively, you can manually edit Vanella_contigs.seqStore.sh to add quote marks around the input files, then run that script by hand and restart canu. Canu will notice that the step is finished and continue on." How do you do that?

I had a similar issue, but after I corrected the space, Canu did not recognise the correction.

ykc21e8 commented 1 year ago

Oh, actually I fixed the issue. Canu could not recognise the fix because it kept reading the previously generated .err file. Deleting the seqStore.err file allowed Canu to move on with the program.
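
As a sketch of that last step: the snippet below removes a leftover log file if one exists. The file name is assumed from the original report (the -d and -p values in the first comment), and remove_stale_err is a hypothetical helper, not part of Canu. Adjust the path to your own assembly directory and prefix.

```shell
#!/bin/sh
# Hypothetical helper: delete a stale log file if present, and say what happened.
remove_stale_err() {
    err="$1"
    if [ -f "$err" ]; then
        rm -- "$err"
        echo "removed $err"
    else
        echo "no stale log at $err"
    fi
}

# Path assumed from the original report: <-d directory>/<-p prefix>.seqStore.err
remove_stale_err "Canu_output/Vanella_contigs.seqStore.err"

# Afterwards, rerun the same canu command that was used originally.
```

Quoting "$err" inside the helper keeps it working even when the assembly directory itself contains a space.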