marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Mhap precompute jobs failed #857

Closed AndresICM closed 6 years ago

AndresICM commented 6 years ago

I'm running Canu v1.6 on Linux through the ont-assembly-polish pipeline (https://github.com/nanoporetech/ont-assembly-polish) on a remote server. The only Canu parameter I changed was useGrid=false. My genome size is around 7m. I get the following output, and I can't figure out what the issue is.

 Assembling nanopore reads using canu.
-- Canu snapshot v1.6 +256 changes (r8668 e313d3c3ab7cb79e43e2df1a53571bbcbce3ee15)
.
.
.
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_161' (from 'java').
-- Detected gnuplot version '5.2 patchlevel 2' (from '/N/dc2/scratch/rorellan/ACumsille/Softwares/gnuplot/gnuplot-5.2.2/bin/gnuplot') and image format 'png'.
-- Detected 16 CPUs and 63 gigabytes of memory.
-- Detected PBS/Torque '6.1.1.1' with 'pbsnodes' binary in /usr/local/bin/pbsnodes.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |  algorithm
--        -------  ------  --------   --------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   4 jobs  (k-mer counting)
-- Local: cormhap    6 GB   16 CPUs x   1 job   (overlap detection with mhap)
-- Local: obtovl     8 GB    8 CPUs x   2 jobs  (overlap detection)
-- Local: utgovl     8 GB    8 CPUs x   2 jobs  (overlap detection)
-- Local: ovb        3 GB    1 CPU  x  16 jobs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x  16 jobs  (overlap store sorting)
-- Local: red        2 GB    4 CPUs x   4 jobs  (read error detection)
-- Local: oea        1 GB    1 CPU  x  16 jobs  (overlap error adjustment)
-- Local: bat       15 GB    4 CPUs x   4 jobs  (contig construction)
-- Local: cns       15 GB    4 CPUs x   4 jobs  (consensus)
-- Local: gfa        8 GB    4 CPUs x   4 jobs  (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in the input files.
--
-- Generating assembly 'canu' in '/Folder'
--
-- Parameters:
--
--  genomeSize        7000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Thu Apr  5 14:40:41 2018 with 1136587.857 GB free disk space

    cd .
    /Softwares/canu/Linux-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./canu.gkpStore.BUILDING \
      ./canu.gkpStore.gkp \
    > ./canu.gkpStore.BUILDING.err 2>&1

-- Finished on Thu Apr  5 14:40:43 2018 (2 seconds) with 1136588.029 GB free disk space
----------------------------------------
--
-- In gatekeeper store './canu.gkpStore':
--   Found 21064 reads.
--   Found 221204647 bases (31.6 times coverage).
--
--   Read length histogram (one '*' equals 105.78 reads):
--        0   4999   7405 **********************************************************************
--     5000   9999   5573 ****************************************************
--    10000  14999   3330 *******************************
--    15000  19999   1921 ******************
--    20000  24999   1106 **********
--    25000  29999    695 ******
--    30000  34999    407 ***
--    35000  39999    232 **
--    40000  44999    137 *
--    45000  49999     90 
--    50000  54999     67 
--    55000  59999     39 
--    60000  64999     23 
--    65000  69999     20 
--    70000  74999      9 
--    75000  79999      2 
--    80000  84999      2 
--    85000  89999      3 
--    90000  94999      1 
--    95000  99999      0 
--   100000 104999      1 
--   105000 109999      0 
--   110000 114999      1 
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Apr  5 14:40:44 2018 with 1136588.029 GB free disk space (2 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1

-- Finished on Thu Apr  5 14:40:44 2018 (lickety-split) with 1136587.2 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, retry.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Apr  5 14:40:44 2018 with 1136587.2 GB free disk space (2 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1

-- Finished on Thu Apr  5 14:40:44 2018 (lickety-split) with 1136587.2 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--

ABORT:
ABORT: Canu snapshot v1.6 +256 changes (r8668 e313d3c3ab7cb79e43e2df1a53571bbcbce3ee15)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

skoren commented 6 years ago

It looks like you're using an intermediate between 1.6 and 1.7, not a release. I would suggest switching to 1.7 if you can.

As for the error, it's most likely a Java issue. What's the output in correction/1-overlapper/precompute.000001.out? Also, how much memory did you reserve for the job when you submitted it to the remote server?

AndresICM commented 6 years ago

Thanks for the quick response. Here's the output:

Running job 1 based on command line options.
/gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/gatekeeperDumpFASTQ: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/gatekeeperDumpFASTQ)
mv: cannot stat `./blocks/000001.input.fasta': No such file or directory
Failed to extract fasta.

skoren commented 6 years ago

Ah, it's a library/linking error, not a Canu error. It looks like the machine you're running on doesn't have the same library versions as the machine you compiled on, though it's strange that it didn't fail right away. Perhaps the compilation is corrupted and mixes two library versions. What do ldd /gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/gatekeeperDumpFASTQ and ldd /gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/gatekeeperCreate report?

Most likely you need to recompile after removing the binary directory or you could use the pre-compiled binaries from the release page.
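
The `GLIBC_2.14' not found message means the binary asks for a newer glibc symbol version than the running machine provides. The following is a minimal sketch of one way to confirm that, assuming GNU coreutils (`sort -V`), `getconf`, and binutils' `objdump` are installed. The helper names `max_glibc_required` and `version_ge` are made up for this illustration and are not part of Canu.

```shell
# Hypothetical helpers (not part of Canu) for diagnosing a glibc mismatch.

# Read text on stdin (for example the output of `objdump -T <binary>`) and
# print the highest GLIBC_x.y symbol version found, without the GLIBC_ prefix.
max_glibc_required() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example use on a real system (path taken from the error message):
#   bin=/gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/gatekeeperDumpFASTQ
#   need=$(objdump -T "$bin" | max_glibc_required)
#   have=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
#   version_ge "$have" "$need" || echo "system glibc $have is older than required $need"
```

If the system version turns out to be older than the required one, the binary was built on (or for) a different machine, which matches the advice to recompile or use the release binaries.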

AndresICM commented 6 years ago

I'm not sure what you mean or how to do that. gatekeeperDumpFASTQ and gatekeeperCreate just show a lot of random characters. I don't know if there's any use in pasting them here.

skoren commented 6 years ago

Basically, whoever installed Canu didn't compile it correctly. ldd should be telling you what a program is linked against like this:

% ldd canu/Linux-amd64/bin/gatekeeperDumpFASTQ 
    linux-vdso.so.1 =>  (0x00007ffca9bbe000)
    libstdc++.so.6 => /opt/sw/software/gcc/4.8.5/lib64/libstdc++.so.6 (0x00007ff8ef09d000)
    libm.so.6 => /lib64/libm.so.6 (0x00000039fce00000)
    libgomp.so.1 => /opt/sw/software/gcc/4.8.5/lib64/libgomp.so.1 (0x00007ff8eee7f000)
    libgcc_s.so.1 => /opt/sw/software/gcc/4.8.5/lib64/libgcc_s.so.1 (0x00007ff8eec68000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00000039fd200000)
    libc.so.6 => /lib64/libc.so.6 (0x00000039fca00000)
    /lib64/ld-linux-x86-64.so.2 (0x00000039fc600000)
    librt.so.1 => /lib64/librt.so.1 (0x00000039fda00000)

so I wanted to see if the binaries are linked to the same library or not. You could just try downloading the release binaries instead and run with those.
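
As a rough sketch of the comparison described here, the filter below keeps only the `ldd` lines that report something unresolved, so that two binaries can be checked side by side. The function name `ldd_problems` is hypothetical. The sketch assumes that unresolved entries contain the text "not found", as in the error message earlier in the thread.

```shell
# Hypothetical helper (not part of Canu): read `ldd` output on stdin and print
# only the lines reporting a library or symbol version that was not found.
# Exit status is 0 when such lines exist and 1 when the output looks clean.
ldd_problems() {
    grep 'not found'
}

# Example use with the two binaries named in this thread. Stderr is merged in
# because version errors may be printed there rather than on stdout.
#   for b in gatekeeperCreate gatekeeperDumpFASTQ; do
#       echo "== $b"
#       ldd "/gpfs/hps/soft/rhel7/canu/1.6/Linux-amd64/bin/$b" 2>&1 | ldd_problems
#   done
```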

skoren commented 6 years ago

Idle, library/installation issue.

mastermindchr commented 6 years ago

I also keep having this problem when trying to test Canu. Note that I'm a new user of both Canu and Linux. I downgraded Java to 8 and installed canu 8, but as you can see from the folder, it is named 1.8 for some reason. Here is what I get:

./canu-1.8/*/bin/canu  -p ecoli -d ecoli-oxford  genomeSize=4.8m  -nanopore-raw oxford.fasta useGrid=false
-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_191' (from '/usr/lib/jvm/java-8-oracle/bin/java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 2   ' (from 'gnuplot') and image format 'png'.
-- Detected 4 CPUs and 8 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (k-mer counting)
-- Local: hap        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6 GB    4 CPUs x   1 job      6 GB    4 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: utgovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x   2 jobs     8 GB    2 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   1 job      8 GB    1 CPU   (overlap store sorting)
-- Local: red        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x   2 jobs     8 GB    2 CPUs  (overlap error adjustment)
-- Local: bat        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (contig construction with bogart)
-- Local: gfa        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'ecoli.seqStore', found Nanopore reads:
--   Raw:        20365
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ecoli' in '/home/simon/ecoli-oxford'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Oct 25 13:36:03 2018 with 206.06 GB free disk space (3 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1
    ./precompute.sh 3 > ./precompute.000003.out 2>&1

-- Finished on Thu Oct 25 13:36:03 2018 (in the blink of an eye) with 206.06 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, retry.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--   job correction/1-overlapper/blocks/000003.dat FAILED.
--
--
-- Running jobs.  Second attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Thu Oct 25 13:36:03 2018 with 206.06 GB free disk space (3 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1
    ./precompute.sh 2 > ./precompute.000002.out 2>&1
    ./precompute.sh 3 > ./precompute.000003.out 2>&1

-- Finished on Thu Oct 25 13:36:03 2018 (in the blink of an eye) with 206.06 GB free disk space
----------------------------------------
--
-- Mhap precompute jobs failed, tried 2 times, giving up.
--   job correction/1-overlapper/blocks/000001.dat FAILED.
--   job correction/1-overlapper/blocks/000002.dat FAILED.
--   job correction/1-overlapper/blocks/000003.dat FAILED.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

and then 

cat /home/user/ecoli-oxford/correction/1-overlapper/precompute.*.out
Running job 1 based on command line options.
./precompute.sh: 81: ./precompute.sh: /usr/lib/canu/gatekeeperDumpFASTQ: not found
mv: cannot stat './blocks/000001.input.fasta': No such file or directory
Failed to extract fasta.
Running job 2 based on command line options.
./precompute.sh: 81: ./precompute.sh: /usr/lib/canu/gatekeeperDumpFASTQ: not found
mv: cannot stat './blocks/000002.input.fasta': No such file or directory
Failed to extract fasta.
Running job 3 based on command line options.
./precompute.sh: 81: ./precompute.sh: /usr/lib/canu/gatekeeperDumpFASTQ: not found
mv: cannot stat './blocks/000003.input.fasta': No such file or directory
Failed to extract fasta.

Could you please help me?

skoren commented 6 years ago

How did you install Canu? There is no Canu 8, the latest release is 1.8. However, this installation is not valid. It is missing the binaries needed to run Canu (or at least they are not where they should be). It's trying to find them in /usr/lib/canu/ which doesn't seem right and version 1.8 shouldn't have any files named gatekeeper*.

Download the pre-compiled binaries for a release for your system and run using that install instead.
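
One way to see which Canu launchers are actually present after unpacking a release is sketched below. The function name `find_canu_launchers` is hypothetical, and the download URL in the comments is an assumption based on the usual GitHub release layout, so check it against the release page before relying on it.

```shell
# Hypothetical helper (not part of Canu): list every executable file named
# `canu` that sits in a `bin` directory somewhere below the directory $1.
find_canu_launchers() {
    find "$1" -type f -path '*/bin/canu' -perm -u+x 2>/dev/null
}

# Example use (the URL is an assumption, see the release page for the real link):
#   curl -L -O https://github.com/marbl/canu/releases/download/v1.8/canu-1.8.Linux-amd64.tar.xz
#   tar -xJf canu-1.8.Linux-amd64.tar.xz
#   find_canu_launchers canu-1.8   # a result under canu-1.8/*/bin/ is the one to run
#   command -v canu                # shows any other canu already on the PATH
```

Running the launcher by its full path, as in the commands elsewhere in this thread, avoids picking up an older install from the PATH by accident.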

mastermindchr commented 6 years ago

Hi Sergey, and thank you for your response. I was just wondering whether there are precompiled binaries of Canu for Ubuntu 18.04, because I can't locate them.

mastermindchr commented 6 years ago

Problem solved by installing the canu-1.8.Linux-amd64.tar.xz precompiled binaries for Linux.

I would also like to ask if you can help me set the meryl memory to something like 7 GB. Do you think it is possible?

./canu-1.8/*/bin/canu -p ecoli -d ecoli-oxford genomeSize=4.8m -nanopore-raw oxford.fasta useGrid=false
-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_191' (from '/usr/lib/jvm/java-8-oracle/bin/java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 2 ' (from 'gnuplot') and image format 'png'.
-- Detected 4 CPUs and 8 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (k-mer counting)
-- Local: hap        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6 GB    4 CPUs x   1 job      6 GB    4 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: utgovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x   2 jobs     8 GB    2 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   1 job      8 GB    1 CPU   (overlap store sorting)
-- Local: red        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x   2 jobs     8 GB    2 CPUs  (overlap error adjustment)
-- Local: bat        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (contig construction with bogart)
-- Local: gfa        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'ecoli.seqStore', found Nanopore reads:
--   Raw:        20365
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ecoli' in '/home/simon/ecoli-oxford'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
-- segments memory batches

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to parse meryl configure output 'correction/0-mercounts/ecoli.ms16.config.01.out'.
ABORT:
ABORT: Disk space available:  207.951 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (correction/0-mercounts/ecoli.ms16.config.01.out):
ABORT:
ABORT:   equal-to N             return kmers that occur exactly N times in the input.  accepts exactly one input.
ABORT:   not-equal-to N         return kmers that do not occur exactly N times in the input.  accepts exactly one input.
ABORT:
ABORT:   increase X             add X to the count of each kmer.
ABORT:   decrease X             subtract X from the count of each kmer.
ABORT:   multiply X             multiply the count of each kmer by X.
ABORT:   divide X               divide the count of each kmer by X.
ABORT:   modulo X               set the count of each kmer to the remainder of the count divided by X.
ABORT:
ABORT:   union                  return kmers that occur in any input, set the count to the number of inputs with this kmer.
ABORT:   union-min              return kmers that occur in any input, set the count to the minimum count
ABORT:   union-max              return kmers that occur in any input, set the count to the maximum count
ABORT:   union-sum              return kmers that occur in any input, set the count to the sum of the counts
ABORT:
ABORT:   intersect              return kmers that occur in all inputs, set the count to the count in the first input.
ABORT:   intersect-min          return kmers that occur in all inputs, set the count to the minimum count.
ABORT:   intersect-max          return kmers that occur in all inputs, set the count to the maximum count.
ABORT:   intersect-sum          return kmers that occur in all inputs, set the count to the sum of the counts.
ABORT:
ABORT:   difference             return kmers that occur in the first input, but none of the other inputs
ABORT:   symmetric-difference   return kmers that occur in exactly one input
ABORT:
ABORT: MODIFIERS:
ABORT:
ABORT:   output O               write kmers generated by the present command to an output meryl database O
ABORT:                          mandatory for count operations.
ABORT:
ABORT: EXAMPLES:
ABORT:
ABORT:   Example:  Report 22-mers present in at least one of input1.fasta and input2.fasta.
ABORT:             Kmers from each input are saved in meryl databases 'input1' and 'input2',
ABORT:             but the kmers in the union are only reported to the screen.
ABORT:
ABORT:             meryl print \
ABORT:               union \
ABORT:                 [count k=22 input1.fasta output input1] \
ABORT:                 [count k=22 input2.fasta output input2]
ABORT:
ABORT:   Example:  Find the highest count of each kmer present in both files, save the kmers to
ABORT:             database 'maxCount'.
ABORT:
ABORT:             meryl intersect-max input1 input2 output maxCount
ABORT:
ABORT:   Example:  Find unique kmers common to both files.  Brackets are necessary
ABORT:             on the first 'equal-to' command to prevent the second 'equal-to' from
ABORT:             being used as an input to the first 'equal-to'.
ABORT:
ABORT:             meryl intersect [equal-to 1 input1] equal-to 1 input2
ABORT:
ABORT: Requested memory 'memory=8' (GB) is more than physical memory 7.68 GB.

mastermindchr commented 6 years ago

./canu-1.8/*/bin/canu -p ecoli -d ecoli-oxford genomeSize=4.8m -nanopore-raw oxford.fasta useGrid=false Memory=6

doesn't seem to work

skoren commented 6 years ago

You should use merylMemory, or better yet maxMemory. maxMemory=7 should limit all steps.
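
The final ABORT line in the log above shows the mismatch: the job asked for 8 GB while the machine has 7.68 GB of physical memory. As a small sketch of how one might pick a safe value for maxMemory, the helper below rounds the machine's physical memory down to whole gigabytes. The name `mem_gb_floor` is hypothetical, and the sketch assumes a Linux-style /proc/meminfo with a `MemTotal:` line in kB.

```shell
# Hypothetical helper (not part of Canu): read /proc/meminfo-style text on
# stdin and print the physical memory in whole gigabytes, rounded down.
mem_gb_floor() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }'
}

# Example use on Linux:
#   cap=$(mem_gb_floor < /proc/meminfo)
#   ./canu-1.8/*/bin/canu -p ecoli -d ecoli-oxford genomeSize=4.8m \
#       -nanopore-raw oxford.fasta useGrid=false maxMemory="$cap"
```

Rounding down rather than to the nearest integer keeps the request below what the machine actually has, which is the condition the meryl check enforces.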

mastermindchr commented 6 years ago

Yes I tried that and unfortunately it doesn't work.

./canu-1.8/*/bin/canu -p ecoli -d ecoli-oxford genomeSize=4.8m -nanopore-raw oxford.fasta useGrid=false merylMemory=7 maxMemory=7
-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_191' (from '/usr/lib/jvm/java-8-oracle/bin/java') with -d64 support.
-- Detected gnuplot version '5.2 patchlevel 2 ' (from 'gnuplot') and image format 'png'.
-- Detected 4 CPUs and 8 gigabytes of memory.
-- Limited to 7 gigabytes from maxMemory option.
-- Detected Slurm with 'sinfo' binary in /usr/local/bin/sinfo.
-- Grid engine disabled per useGrid=false option.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      7 GB    4 CPUs x   1 job      7 GB    4 CPUs  (k-mer counting)
-- Local: hap        7 GB    4 CPUs x   1 job      7 GB    4 CPUs  (read-to-haplotype assignment)
-- Local: cormhap    6 GB    4 CPUs x   1 job      6 GB    4 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: utgovl     4 GB    4 CPUs x   1 job      4 GB    4 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x   1 job      4 GB    1 CPU   (overlap store bucketizer)
-- Local: ovs        7 GB    1 CPU  x   1 job      7 GB    1 CPU   (overlap store sorting)
-- Local: red        7 GB    4 CPUs x   1 job      7 GB    4 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x   1 job      4 GB    1 CPU   (overlap error adjustment)
-- Local: bat        7 GB    4 CPUs x   1 job      7 GB    4 CPUs  (contig construction with bogart)
-- Local: gfa        7 GB    4 CPUs x   1 job      7 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'ecoli.seqStore', found Nanopore reads:
--   Raw:        20365
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ecoli' in '/home/simon/ecoli-oxford'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
-- segments memory batches

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   failed to parse meryl configure output 'correction/0-mercounts/ecoli.ms16.config.01.out'.
ABORT:
ABORT: Disk space available:  207.545 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (correction/0-mercounts/ecoli.ms16.config.01.out):
ABORT:
ABORT:   equal-to N             return kmers that occur exactly N times in the input.  accepts exactly one input.
ABORT:   not-equal-to N         return kmers that do not occur exactly N times in the input.  accepts exactly one input.
ABORT:
ABORT:   increase X             add X to the count of each kmer.
ABORT:   decrease X             subtract X from the count of each kmer.
ABORT:   multiply X             multiply the count of each kmer by X.
ABORT:   divide X               divide the count of each kmer by X.
ABORT:   modulo X               set the count of each kmer to the remainder of the count divided by X.
ABORT:
ABORT:   union                  return kmers that occur in any input, set the count to the number of inputs with this kmer.
ABORT:   union-min              return kmers that occur in any input, set the count to the minimum count
ABORT:   union-max              return kmers that occur in any input, set the count to the maximum count
ABORT:   union-sum              return kmers that occur in any input, set the count to the sum of the counts
ABORT:
ABORT:   intersect              return kmers that occur in all inputs, set the count to the count in the first input.
ABORT:   intersect-min          return kmers that occur in all inputs, set the count to the minimum count.
ABORT:   intersect-max          return kmers that occur in all inputs, set the count to the maximum count.
ABORT:   intersect-sum          return kmers that occur in all inputs, set the count to the sum of the counts.
ABORT:
ABORT:   difference             return kmers that occur in the first input, but none of the other inputs
ABORT:   symmetric-difference   return kmers that occur in exactly one input
ABORT:
ABORT: MODIFIERS:
ABORT:
ABORT:   output O               write kmers generated by the present command to an output meryl database O
ABORT:                          mandatory for count operations.
ABORT:
ABORT: EXAMPLES:
ABORT:
ABORT:   Example:  Report 22-mers present in at least one of input1.fasta and input2.fasta.
ABORT:             Kmers from each input are saved in meryl databases 'input1' and 'input2',
ABORT:             but the kmers in the union are only reported to the screen.
ABORT:
ABORT:             meryl print \
ABORT:               union \
ABORT:                 [count k=22 input1.fasta output input1] \
ABORT:                 [count k=22 input2.fasta output input2]
ABORT:
ABORT:   Example:  Find the highest count of each kmer present in both files, save the kmers to
ABORT:             database 'maxCount'.
ABORT:
ABORT:             meryl intersect-max input1 input2 output maxCount
ABORT:
ABORT:   Example:  Find unique kmers common to both files.  Brackets are necessary
ABORT:             on the first 'equal-to' command to prevent the second 'equal-to' from
ABORT:             being used as an input to the first 'equal-to'.
ABORT:
ABORT:             meryl intersect [equal-to 1 input1] equal-to 1 input2
ABORT:
ABORT: Requested memory 'memory=8' (GB) is more than physical memory 7.68 GB.
ABORT:

brianwalenz commented 6 years ago

The scripts Canu writes to run these jobs are not recreated when parameters change, so the scripts in correction/0-mercounts all still have the old memory size (8 GB). Just remove that 0-mercounts directory entirely. Better, remove the whole output directory to start fresh.

Running with 8 GB of physical memory is, unfortunately, entirely untested.
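
A minimal sketch of that cleanup, assuming the assembly directory is the one passed to canu with -d (ecoli-oxford in this thread). The function name `reset_mercounts` is hypothetical and not part of Canu.

```shell
# Hypothetical helper (not part of Canu): delete the meryl working directory
# of a run so its job scripts are regenerated with the current parameters.
# $1 is the assembly directory that was given to canu with -d.
reset_mercounts() {
    stale="$1/correction/0-mercounts"
    if [ -d "$stale" ]; then
        rm -rf -- "$stale"
        echo "removed $stale"
    else
        echo "nothing to remove at $stale"
    fi
}

# Example use, followed by a rerun with the lower memory cap:
#   reset_mercounts ecoli-oxford
#   ./canu-1.8/*/bin/canu -p ecoli -d ecoli-oxford genomeSize=4.8m \
#       -nanopore-raw oxford.fasta useGrid=false maxMemory=7
```

Removing only 0-mercounts keeps any other completed stages. Deleting the whole output directory, as suggested above, is the simpler option when little work has been done yet.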