Hi, I installed Canu on a Linux cluster (SLURM) and hit an error while running the E. coli (25X) dataset to test the installation.
The command I ran:
canu -p ecoli -d /scratch/ez82/Ecoli_PacBio/ -genomeSize=4.8m -pacbio-raw /scratch/ez82/Ecoli_PacBio/pacbio.fastq usegrid=1 gridOptions="--partition=main" gridOptionsJobName=canu_test
The canu.out file looks like this:
/usr/bin/perl
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Found java:
/opt/sw/packages/java/1.8.0_73/bin/java
java version "1.8.0_73"
Found canu:
/cache/home/ez82/canu/Linux-amd64/bin/canu
Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
-- Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https://doi.org/10.1038/nbt.4277
--
-- Read and contig alignments during correction, consensus and GFA building use:
-- Šošić M, Šikić M.
-- Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
-- Bioinformatics. 2017 May 1;33(9):1394-1395.
-- http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
-- Berlin K, et al.
-- Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
-- Nat Biotechnol. 2015 Jun;33(6):623-30.
-- http://doi.org/10.1038/nbt.3238
--
-- Myers EW, et al.
-- A Whole-Genome Assembly of Drosophila.
-- Science. 2000 Mar 24;287(5461):2196-204.
-- http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
-- Chin CS, et al.
-- Phased diploid genome assembly with single-molecule real-time sequencing.
-- Nat Methods. 2016 Dec;13(12):1050-1054.
-- http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
-- Chin CS, et al.
-- Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
-- Nat Methods. 2013 Jun;10(6):563-9
-- http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_73' (from '/opt/sw/packages/java/1.8.0_73/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING: Failed to run gnuplot using command 'gnuplot'.
-- WARNING: Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 125 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
--
-- Found 140 hosts with 24 cores and 125 GB memory under Slurm control.
-- Found 1 host with 48 cores and 1511 GB memory under Slurm control.
--
--                           (tag)Threads
--                  (tag)Memory         |
--       (tag)            |             |  algorithm
--     -------  --------  --------  -----------------------------
-- Grid: meryl 12 GB 4 CPUs (k-mer counting)
-- Grid: hap 8 GB 4 CPUs (read-to-haplotype assignment)
-- Grid: cormhap 6 GB 12 CPUs (overlap detection with mhap)
-- Grid: obtovl 4 GB 8 CPUs (overlap detection)
-- Grid: utgovl 4 GB 8 CPUs (overlap detection)
-- Grid: cor --- GB 4 CPUs (read correction)
-- Grid: ovb 4 GB 1 CPU (overlap store bucketizer)
-- Grid: ovs 8 GB 1 CPU (overlap store sorting)
-- Grid: red 8 GB 4 CPUs (read error detection)
-- Grid: oea 4 GB 1 CPU (overlap error adjustment)
-- Grid: bat 16 GB 4 CPUs (contig construction with bogart)
-- Grid: cns --- GB 4 CPUs (consensus)
-- Grid: gfa 8 GB 4 CPUs (GFA alignment and processing)
--
-- In 'ecoli.seqStore', found PacBio reads:
-- Raw: 12528
-- Corrected: 0
-- Trimmed: 0
--
-- Generating assembly 'ecoli' in '/scratch/ez82/Ecoli_PacBio'
--
-- Parameters:
--
-- genomeSize 4800000
--
-- Overlap Generation Limits:
-- corOvlErrorRate 0.2400 ( 24.00%)
-- obtOvlErrorRate 0.0450 ( 4.50%)
-- utgOvlErrorRate 0.0450 ( 4.50%)
--
-- Overlap Processing Limits:
-- corErrorRate 0.3000 ( 30.00%)
-- obtErrorRate 0.0450 ( 4.50%)
-- utgErrorRate 0.0450 ( 4.50%)
-- cnsErrorRate 0.0750 ( 7.50%)
--
--
-- BEGIN CORRECTION
--
--
-- Kmer counting (meryl-count) jobs failed, tried 2 times, giving up.
-- job ecoli.01.meryl FAILED.
--
ABORT:
ABORT: Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting. If that doesn't work, ask for help.
ABORT:
In the correction/0-mercounts folder there are two meryl-count.#####_1.out files: meryl-count.82900811_1.out and meryl-count.141639506_1.out. Their contents are below.
meryl-count.82900811_1.out:
/usr/bin/perl
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Found java:
/opt/sw/packages/java/1.8.0_73/bin/java
java version "1.8.0_73"
Found canu:
/cache/home/ez82/canu/Linux-amd64/bin/canu
Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.
Counting 110 million canonical 16-mers from 1 input file:
canu-seqStore: ../../ecoli.seqStore
SIMPLE MODE
-----------
16-mers
-> 4294967296 entries for counts up to 65535.
-> 64 Gbits memory used
115899341 input bases
-> expected max count of 463597, needing 4 extra bits.
-> 16 Gbits memory used
10 GB memory needed
COMPLEX MODE
------------
prefix # of struct kmers/ segs/ data total
bits prefix memory prefix prefix memory memory
------ ------- ------- ------- ------- ------- -------
1 2 P 53 kB 55 MM 3427 S 428 MB 428 MB
2 4 P 52 kB 27 MM 1658 S 414 MB 414 MB
3 8 P 51 kB 13 MM 802 S 401 MB 401 MB
4 16 P 50 kB 7073 kM 387 S 387 MB 387 MB
5 32 P 50 kB 3536 kM 187 S 374 MB 374 MB
6 64 P 52 kB 1768 kM 90 S 360 MB 360 MB
7 128 P 58 kB 884 kM 44 S 352 MB 352 MB
8 256 P 70 kB 442 kM 21 S 336 MB 336 MB
9 512 P 96 kB 221 kM 10 S 320 MB 320 MB
10 1024 P 152 kB 110 kM 5 S 320 MB 320 MB Best Value!
11 2048 P 272 kB 55 kM 3 S 384 MB 384 MB
12 4096 P 512 kB 27 kM 2 S 512 MB 512 MB
13 8192 P 960 kB 13 kM 1 S 512 MB 512 MB
14 16 kP 1920 kB 7074 M 1 S 1024 MB 1025 MB
15 32 kP 3840 kB 3537 M 1 S 2048 MB 2051 MB
16 64 kP 7680 kB 1769 M 1 S 4096 MB 4103 MB
17 128 kP 15 MB 885 M 1 S 8192 MB 8207 MB
FINAL CONFIGURATION
-------------------
Configured complex mode for 0.313 GB memory per batch, and up to 1 batch.
kmerCountFileWriter()-- Creating './ecoli.01.meryl.WORKING' for 16-mers, with prefixSize 10 suffixSize 22 numFiles 64
Loading kmers from '../../ecoli.seqStore' into buckets.
Used 0.281 GB out of 2.000 GB to store 6149 kmers.
Used 0.406 GB out of 2.000 GB to store 49370316 kmers.
Used 0.531 GB out of 2.000 GB to store 98169708 kmers.
Writing results to './ecoli.01.meryl.WORKING', using 4 threads.
finishIteration()--
Failed to open './ecoli.01.meryl.WORKING/0x011011[001].merylIndex' for writing: No such file or directory
And meryl-count.141639506_1.out:
/usr/bin/perl
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
Found java:
/opt/sw/packages/java/1.8.0_73/bin/java
java version "1.8.0_73"
Found canu:
/cache/home/ez82/canu/Linux-amd64/bin/canu
Canu snapshot v1.8 +44 changes (r9254 a50e26a75ffccc529bd944b7adb291e2b6e1c24b)
Running job 1 based on SLURM_ARRAY_TASK_ID=1 and offset=0.
Counting 110 million canonical 16-mers from 1 input file:
canu-seqStore: ../../ecoli.seqStore
SIMPLE MODE
-----------
16-mers
-> 4294967296 entries for counts up to 65535.
-> 64 Gbits memory used
115899341 input bases
-> expected max count of 463597, needing 4 extra bits.
-> 16 Gbits memory used
10 GB memory needed
COMPLEX MODE
------------
prefix # of struct kmers/ segs/ data total
bits prefix memory prefix prefix memory memory
------ ------- ------- ------- ------- ------- -------
1 2 P 53 kB 55 MM 3427 S 428 MB 428 MB
2 4 P 52 kB 27 MM 1658 S 414 MB 414 MB
3 8 P 51 kB 13 MM 802 S 401 MB 401 MB
4 16 P 50 kB 7073 kM 387 S 387 MB 387 MB
5 32 P 50 kB 3536 kM 187 S 374 MB 374 MB
6 64 P 52 kB 1768 kM 90 S 360 MB 360 MB
7 128 P 58 kB 884 kM 44 S 352 MB 352 MB
8 256 P 70 kB 442 kM 21 S 336 MB 336 MB
9 512 P 96 kB 221 kM 10 S 320 MB 320 MB
10 1024 P 152 kB 110 kM 5 S 320 MB 320 MB Best Value!
11 2048 P 272 kB 55 kM 3 S 384 MB 384 MB
12 4096 P 512 kB 27 kM 2 S 512 MB 512 MB
13 8192 P 960 kB 13 kM 1 S 512 MB 512 MB
14 16 kP 1920 kB 7074 M 1 S 1024 MB 1025 MB
15 32 kP 3840 kB 3537 M 1 S 2048 MB 2051 MB
16 64 kP 7680 kB 1769 M 1 S 4096 MB 4103 MB
17 128 kP 15 MB 885 M 1 S 8192 MB 8207 MB
FINAL CONFIGURATION
-------------------
Configured complex mode for 0.313 GB memory per batch, and up to 1 batch.
kmerCountFileWriter()-- Creating './ecoli.01.meryl.WORKING' for 16-mers, with prefixSize 10 suffixSize 22 numFiles 64
Loading kmers from '../../ecoli.seqStore' into buckets.
Used 0.281 GB out of 2.000 GB to store 6149 kmers.
Used 0.406 GB out of 2.000 GB to store 49370316 kmers.
Used 0.531 GB out of 2.000 GB to store 98169708 kmers.
Writing results to './ecoli.01.meryl.WORKING', using 4 threads.
finishIteration()--
Finished counting.
Bye.
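As a sanity check on the SIMPLE MODE figures in the logs above (this is my own arithmetic, not Canu output), the 10 GB estimate is consistent with a flat counter table over all canonical 16-mers:

```python
# Reproduce meryl's SIMPLE MODE estimate from the log:
# one counter slot per possible 16-mer over a 4-letter alphabet.
k = 16
entries = 4 ** k                  # 4294967296 slots
base_bits = entries * 16          # 16-bit counters (counts up to 65535)
extra_bits = entries * 4          # "needing 4 extra bits" for max count 463597
total_gb = (base_bits + extra_bits) / 8 / 2**30

assert entries == 4294967296
assert base_bits == 64 * 2**30    # matches "64 Gbits memory used"
assert extra_bits == 16 * 2**30   # matches "16 Gbits memory used"
print(f"{total_gb:.0f} GB memory needed")  # prints "10 GB memory needed"
```

So the meryl configuration itself looks sane; the failure happens later, when the results are written out.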
Any thoughts on what the problem may be? Thanks!