Canu won't run - GitHub Issues

peflanag commented 6 years ago

Hi All,

I'm new to running Canu and quite a novice at using terminal commands. I managed to get Canu installed on an iMac that I use for Nanopore sequencing. I am looking to run an assembly with Canu, but it just doesn't seem to work! Any advice on what I am doing wrong? I have attached a copy of the script from Terminal.

Am I right in assuming:

-p is the name I want for the output file? -d is the location to put the new file?

I have trimmed and demultiplexed with Porechop, so I am also assuming the last option is -nanopore-corrected, followed by the path to the file that I want to align?

Cheers

MinIONs-iMac:~ minion$ canu -assemble \
> -p 300OR1 -d /Users/minion/Desktop/AoC\ MinION\ Seq/Porechop\ Files/Canu\ Output \
> genomeSize=2.8m \
> -nanopore-corrected /Users/minion/Desktop/AoC\ MinION\ Seq/Porechop\ Files/BC01.fastq 
--   Reason: image not found
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '9.0.4' (from 'java').
-- Detected gnuplot version '5.2 patchlevel 2' (from 'gnuplot') and image format 'png'.
-- Detected 8 CPUs and 32 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (k-mer counting)
-- Local: cormhap    6 GB    8 CPUs x   1 job      6 GB    8 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    8 CPUs x   1 job      4 GB    8 CPUs  (overlap detection)
-- Local: utgovl     4 GB    8 CPUs x   1 job      4 GB    8 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x   8 jobs    32 GB    8 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   4 jobs    32 GB    4 CPUs  (overlap store sorting)
-- Local: red        4 GB    4 CPUs x   2 jobs     8 GB    8 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x   8 jobs    32 GB    8 CPUs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   1 job     16 GB    4 CPUs  (contig construction)
-- Local: gfa        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore corrected reads in the input files.
--
-- Generating assembly '300OR1' in '/Users/minion/Desktop/AoC MinION Seq/Porechop Files/Canu Output'
--
-- Parameters:
--
--  genomeSize        2800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN ASSEMBLY
--
----------------------------------------
-- Starting command on Fri Apr  6 11:37:32 2018 with 648.128 GB free disk space

    cd .
    /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./300OR1.gkpStore.BUILDING \
      ./300OR1.gkpStore.gkp \
    > ./300OR1.gkpStore.BUILDING.err 2>&1
sh: line 4: 21381 Abort trap: 6           /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate -minlength 1000 -o ./300OR1.gkpStore.BUILDING ./300OR1.gkpStore.gkp > ./300OR1.gkpStore.BUILDING.err 2>&1

-- Finished on Fri Apr  6 11:37:32 2018 (lickety-split) with 648.128 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 134.  (rc=34304)
ERROR:

ABORT:
ABORT:   Reason: image not found
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT:   gatekeeper failed.
ABORT:
ABORT: Disk space available:  648.128 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (./300OR1.gkpStore.BUILDING.err):
ABORT:
ABORT:
MinIONs-iMac:~ minion$

skoren commented 6 years ago

You should use -nanopore-raw and not specify -assemble. Porechop doesn't correct the data; it just trims it. Maybe also add -fast to save runtime.

The error is an incompatibility between the libraries on your computer and those Canu was built with. You should download the source and recompile instead (see issue #821). The default OS X compiler doesn't support the parallelism (OpenMP) that Canu uses, so you also want to set the thread counts to 1: cnsThreads=1 corThreads=1 cormhapThreads=1 obtmhapThreads=1 oeaThreads=1 ovbThreads=1 ovsThreads=1 redThreads=1 utgmhapThreads=1
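A minimal sketch of the invocation described above, assuming bash. The output directory name canu-output is hypothetical; the prefix, genome size, read path, and thread options come from this thread. The function only prints the arguments, one per line, so the command can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: print the suggested canu arguments one per line (canu is not run).
# canu-output is a hypothetical directory name; the other values come from this thread.
print_canu_args() {
  # No -assemble flag, so canu runs correction, trimming, and assembly.
  # Porechop only trims the reads, so they are passed with -nanopore-raw.
  # The *Threads=1 settings matter only for a build without OpenMP support,
  # such as one compiled with the default OS X compiler.
  printf '%s\n' canu \
    -p 300OR1 \
    -d canu-output \
    genomeSize=2.8m \
    cnsThreads=1 corThreads=1 cormhapThreads=1 obtmhapThreads=1 \
    oeaThreads=1 ovbThreads=1 ovsThreads=1 redThreads=1 utgmhapThreads=1 \
    -nanopore-raw /Users/minion/Desktop/BC01.fastq
}

print_canu_args
```

Deleting `printf '%s\n'` from the function turns it into the real command; -fast could be added before -nanopore-raw if the faster mode is wanted.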

I'll look into how we can package better for OS X to avoid this.

peflanag commented 6 years ago

I'm sorry to ask what I'm sure is a stupid question, but what is the -fast option, and where do I add it to my script?

My script is:

canu \
-p [Name of output file] -d [directory to save to] genomeSize=2.8m \
-nanopore-raw [path to Porechop demultiplexed and trimmed file]

An easier install for macOS would be great! Before this, I spent weeks trying to figure out why it wasn't working, and this morning I had to go to a walk-in workshop at college to have them install it.

skoren commented 6 years ago

Add -fast to the canu command, so it becomes:

canu \
  -fast -p [Name of output file] -d [directory to save to] \
  genomeSize=2.8m \
  -nanopore-raw [path to Porechop demultiplexed and trimmed file]

It's our experimental option that saves significant compute but may produce a less contiguous assembly. It works pretty well on bacterial genomes, though.

Can you try downloading the following tarball: https://gembox.cbcb.umd.edu/shared/canu-1.7.Darwin-amd64.tar.bz2. You can extract it using tar xvjf canu-1.7.Darwin-amd64.tar.bz2. Can you see whether you can run it instead? You don't need to run the full pipeline; just run canu-1.7/Darwin-amd64/bin/gatekeeperCreate and post the output.

peflanag commented 6 years ago

I downloaded it, typed that command into Terminal, and got the following error:

Last login: Fri Apr 6 14:14:50 on ttys001
MinIONs-iMac:~ minion$ xvjf canu-1.7.Darwin-amd64.tar.bz2
-bash: xvjf: command not found
MinIONs-iMac:~ minion$

skoren commented 6 years ago

It should be tar xvjf canu-1.7.Darwin-amd64.tar.bz2

peflanag commented 6 years ago

MinIONs-iMac:~ minion$ tar xvjf canu-1.7.Darwin-amd64.tar.bz2
tar: Error opening archive: Failed to open 'canu-1.7.Darwin-amd64.tar.bz2'
MinIONs-iMac:~ minion$

If I double-click on it in my Downloads folder, it unzips.
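The "Failed to open" message most likely means tar looked for the archive in the current directory (the prompt shows `~`, the home folder), while the browser saved it to Downloads. A minimal sketch, assuming the archive is in `~/Downloads`; `extract_tarball` is a hypothetical helper, not part of Canu.

```shell
# Hypothetical helper: extract a .tar.bz2 archive that sits in a given folder.
extract_tarball() {
  tb_dir=$1
  tb_name=$2
  # tar resolves a bare file name against the current directory,
  # so check the full path first and give a clear message if it is missing.
  if [ ! -f "$tb_dir/$tb_name" ]; then
    echo "not found: $tb_dir/$tb_name" >&2
    return 1
  fi
  # x = extract, j = bzip2, f = archive file; -C extracts next to the archive.
  tar -xjf "$tb_dir/$tb_name" -C "$tb_dir"
}

# Example (assumes the browser saved the archive to ~/Downloads):
# extract_tarball "$HOME/Downloads" canu-1.7.Darwin-amd64.tar.bz2
```

Without a helper, running `cd ~/Downloads` and then `tar xvjf canu-1.7.Darwin-amd64.tar.bz2` has the same effect; the v flag only lists each file as it is extracted.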

skoren commented 6 years ago

OK, what happens if you try to run the canu-1.7/Darwin-amd64/bin/gatekeeperCreate command?

peflanag commented 6 years ago

MinIONs-iMac:~ minion$ canu-1.7/Darwin-amd64/bin/gatekeeperCreate
-bash: canu-1.7/Darwin-amd64/bin/gatekeeperCreate: No such file or directory
MinIONs-iMac:~ minion$

I'm guessing I should move the unzipped folder to the directory where one of the lads in the high computer centre originally installed canu-1.7 for me? He installed it and placed it in a directory so that when I launch Terminal I can just type canu for it to work, like I do with the BWA commands.

skoren commented 6 years ago

Either that or you can give the full path to wherever you downloaded/unzipped the tar.

peflanag commented 6 years ago

I moved the Darwin download you sent me to the location where the guy installed it for me this morning and ran the command again, but it failed:

MinIONs-iMac:~ minion$ canu-1.7/Darwin-amd64/bin/gatekeeperCreate
-bash: canu-1.7/Darwin-amd64/bin/gatekeeperCreate: No such file or directory
MinIONs-iMac:~ minion$

skoren commented 6 years ago

If you moved it to the same folder and replaced the previous one, the path should be /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate
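A path that does not start with `/` is looked up relative to the current directory, which in these transcripts is the home folder (`~`, presumably `/Users/minion`). That explains why `canu-1.7/Darwin-amd64/bin/gatekeeperCreate` was not found while the full path under Documents/Apps can be. A small sketch of that rule; `resolve_path` is a hypothetical helper for illustration only.

```shell
# Hypothetical helper: print the path the shell would use for $1.
# It does not normalize ".." or follow symlinks; it only shows the join rule.
resolve_path() {
  case $1 in
    /*) printf '%s\n' "$1" ;;        # absolute path: used as is
    *)  printf '%s\n' "$PWD/$1" ;;   # relative path: joined to the current directory
  esac
}

# From the home folder, the relative path does not point into Documents/Apps:
resolve_path canu-1.7/Darwin-amd64/bin/gatekeeperCreate
resolve_path /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate
```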

peflanag commented 6 years ago

I'm so sorry, I'm not terminal savvy! I've redone that and got this:

MinIONs-iMac:~ minion$ /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate
dyld: lazy symbol binding failed: Symbol not found: ___emutls_get_address
  Referenced from: /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/..//lib/libgomp.1.dylib
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: ___emutls_get_address
  Referenced from: /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/..//lib/libgomp.1.dylib
  Expected in: /usr/lib/libSystem.B.dylib

Abort trap: 6
MinIONs-iMac:~ minion$

skoren commented 6 years ago

Ah OK, I guess it won't be that easy; you'll have to build from source. What is your version of OS X?

peflanag commented 6 years ago

High Sierra 10.13.3

peflanag commented 6 years ago

Although I've just been prompted with an update to 10.13.4!

skoren commented 6 years ago

One last thing to try before defaulting to building from source: xcode-select --install

and confirm that you have the file /usr/lib/libSystem.B.dylib. If it still doesn't work after that, download the source code rather than the OS X binary and follow the instructions in the release notes to compile it.
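One way to confirm from Terminal that a file exists is `ls -l /usr/lib/libSystem.B.dylib`, which either lists the file or reports "No such file or directory". The sketch below wraps the same check in a hypothetical `check_files` helper that reports each path and returns non-zero if any is missing.

```shell
# Hypothetical helper: report whether each path exists.
# Returns 0 if all paths exist, 1 if any is missing.
check_files() {
  cf_status=0
  for cf_path in "$@"; do
    if [ -e "$cf_path" ]; then
      echo "found:   $cf_path"
    else
      echo "missing: $cf_path"
      cf_status=1
    fi
  done
  return "$cf_status"
}

# The library named in the dyld error above; "|| true" keeps a missing
# file from stopping a script that runs with "set -e".
check_files /usr/lib/libSystem.B.dylib || true
```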

peflanag commented 6 years ago

MinIONs-iMac:~ minion$ xcode-select --install
xcode-select: error: command line tools are already installed, use "Software Update" to install updates
MinIONs-iMac:~ minion$

peflanag commented 6 years ago

Xcode isn't installed on the Mac, though, so I am downloading it via the App Store.

peflanag commented 6 years ago

Ok I've now got Xcode installed via the Mac App Store

peflanag commented 6 years ago

I don't have the above file:

MinIONs-iMac:~ minion$ -a Finder /usr/lib/libSystem.B.dylib
-bash: -a: command not found
MinIONs-iMac:~ minion$

skoren commented 6 years ago

OK, I think I found the cause of the issue. Try downloading from the same link as before (https://gembox.cbcb.umd.edu/shared/canu-1.7.Darwin-amd64.tar.bz2), extract it, move it to the same location, and try running /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate to see what happens now.

skoren commented 6 years ago

I updated the latest release binaries and they should include the required libraries to run as long as you have OS X 10.12 or newer. I've confirmed it works on a machine without OpenMP/GCC installed as well.

peflanag commented 6 years ago

Sorry, I had left the office when you replied with the above. I will try this this afternoon! Cheers

peflanag commented 6 years ago

Hi Skoren, I have downloaded that file, replaced the old one, and ran the above command. This is the result I got. Does this look right to you?

Last login: Fri Apr 6 16:03:56 on ttys000
MinIONs-iMac:~ minion$ /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate
usage: /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate [-minlength L] -o gkpStore input.gkp
  -o gkpStore     load raw reads into new gkpStore
  -minlength L    discard reads shorter than L

ERROR: no gkpStore (-o) supplied.
ERROR: no input files supplied.
MinIONs-iMac:~ minion$

skoren commented 6 years ago

OK, so it works on your system now; you should be able to run the assembly you were running before.

peflanag commented 6 years ago

Cheers. I just tried it there, but there seems to be an issue with the genomeSize option now.

peflanag commented 6 years ago

MinIONs-iMac:~ minion$ canu \
-p 300OR1 -d /Users/minion/Desktop/AoC\ MinION\ Seq/Canu\ Output \
genomeSize=2.8m \
-nanopore-raw /Users/minion/Desktop/AoC\ MinION\ Seq/Porechop\ Files/BC01.fastq

usage:   canu [-version] [-citation] \
              [-correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-pacbio-raw |
               -pacbio-corrected |
               -nanopore-raw |
               -nanopore-corrected] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz 

  To restrict canu to only a specific stage, use:
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>.  This directory is created if needed.  It is not
  possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being
  assembled.  It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size.  Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control
        but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads.  For lower
        quality reads, use a higher number.  The defaults are 0.300 for PacBio reads and
        0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads.  Assemblies of
        low coverage or data with biological differences will benefit from a slight increase
        in this.  Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid.  Can be used to set
        maximum run time limits.  Should NOT be used to set memory limits; Canu will do
        that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long.  Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long.  Default: 500.
  A full list of options can be printed with '-options'.  All options can be supplied in
  an optional sepc file with the -s option.

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
  Reads are specified by the technology they were generated with, and any processing performed:
    -pacbio-raw         <files>      Reads are straight off the machine.
    -pacbio-corrected   <files>      Reads have been corrected.
    -nanopore-raw       <files>
    -nanopore-corrected <files>

Complete documentation at http://canu.readthedocs.org/en/latest/

ERROR:  File 'genomeSize=2.8m' supplied on command line, don't know what to do with it.
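The usage text above says genomeSize takes a number with an optional g, m, or k suffix and that fractional values are allowed. A minimal sketch of that conversion with awk; `genome_size_to_bases` is a hypothetical helper, not Canu's own parser. Its results agree with the Parameters section of the logs in this thread, where 2.8m is reported as 2800000 and 4.8m as 4800000.

```shell
# Hypothetical helper: convert a genomeSize value such as 2.8m, 4700k, or 1g
# into a base count. Only lowercase g/m/k suffixes, as in the usage text.
# Returns non-zero (and prints nothing) for values it cannot parse.
genome_size_to_bases() {
  printf '%s\n' "$1" | awk '
    /^[0-9]+(\.[0-9]+)?[gmk]?$/ {
      n = $0
      mult = 1
      suffix = substr(n, length(n), 1)
      if (suffix == "g")      mult = 1000000000
      else if (suffix == "m") mult = 1000000
      else if (suffix == "k") mult = 1000
      if (mult != 1) n = substr(n, 1, length(n) - 1)
      # %.0f rounds to the nearest whole base.
      printf "%.0f\n", n * mult
      ok = 1
    }
    END { exit (ok ? 0 : 1) }'
}

genome_size_to_bases 2.8m      # prints 2800000
genome_size_to_bases 4700k     # prints 4700000
```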

skoren commented 6 years ago

Don't use spaces in the file names; I'm not sure those will get properly escaped.
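At the shell level, a backslash before each space (as in the original command) or double quotes around the whole path both keep a path as one argument, while an unescaped space splits it. Whether Canu handles such paths internally is a separate question this does not test. A small sketch using a hypothetical `count_args` function:

```shell
# Hypothetical helper: print how many arguments the shell passed in.
count_args() { echo "$#"; }

count_args /Users/minion/Desktop/AoC MinION Seq/Porechop Files/BC01.fastq       # unquoted: prints 4
count_args /Users/minion/Desktop/AoC\ MinION\ Seq/Porechop\ Files/BC01.fastq    # escaped: prints 1
count_args "/Users/minion/Desktop/AoC MinION Seq/Porechop Files/BC01.fastq"     # quoted: prints 1
```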

peflanag commented 6 years ago

Unfortunately, it's the same error:

MinIONs-iMac:~ minion$ canu \
-p 300OR1 -d /Users/minion/Desktop \
 genomeSize=2.8m \
-nanopore-raw /Users/minion/Desktop/BC01.fastq

ERROR:  File 'genomeSize=2.8m' supplied on command line, don't know what to do with it.

brianwalenz commented 6 years ago

Is there a file 'genomeSize=2.8m' in the directory where you ran Canu? That's the only way that message can be printed.

peflanag commented 6 years ago

There's not, but I will make a new directory and try now.

peflanag commented 6 years ago

Same error. I'm just wondering: do I have to put a "-" before genomeSize? The rest of the command has one except that line.

skoren commented 6 years ago

Nope, you don't need the -. I'd guess there is a bad character somewhere in the command line; are you copying and pasting it? Try re-typing it. Using local paths is fine:
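Text pasted from a web page, email, or word processor can carry look-alike characters (a non-breaking space, an en-dash instead of a hyphen, curly quotes) that bash does not treat as a space or a hyphen; this is only one guess at what the bad character might be. A minimal sketch, assuming a POSIX shell and grep, of a hypothetical `is_plain_ascii` check that flags any byte outside printable ASCII.

```shell
# Hypothetical helper: succeed only if $1 contains printable ASCII alone
# (space through ~). Tabs, non-breaking spaces, en-dashes, and curly quotes
# all fall outside that range and make it fail.
is_plain_ascii() {
  # LC_ALL=C makes grep compare raw bytes rather than locale characters.
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi
  return 0
}

is_plain_ascii 'genomeSize=2.8m' && echo "clean"
# \302\240 is the UTF-8 encoding of a non-breaking space.
is_plain_ascii "$(printf 'genomeSize=2.8m\302\240')" || echo "suspicious character found"
```

For a command saved in a file, `LC_ALL=C grep -n '[^ -~]' run_canu.sh` prints the affected lines with their line numbers (run_canu.sh is a hypothetical file name).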

canu -p 300OR1 -d asm genomeSize=2.8m -nanopore-raw /Users/minion/Desktop/BC01.fastq

peflanag commented 6 years ago

I was typing with the local paths. But I will give it another shot!

peflanag commented 6 years ago

Same again, unfortunately. Just curious: what does the asm after -d refer to in your command above?

MinIONs-iMac:~ minion$ canu -p 300OR1 -d /Users/minion/Desktop genomeSize=2.8m -nanopore-raw /Users/minion/Desktop/BC01.fastq

skoren commented 6 years ago

I've confirmed the syntax is correct and I can run your command locally. I don't think this is a Canu error but a command-line issue. In fact, nothing in the canu script has changed from the first version you downloaded and ran initially so you should be able to re-run that command unless something has changed on your terminal. If it still doesn't work, I would suggest making sure you can run the tutorial assemblies on the quick start page first.

peflanag commented 6 years ago

Is that just this command?

MinIONs-iMac:~ minion$ canu

usage:   canu [-version] [-citation] \
              [-correct | -trim | -assemble | -trim-assemble] \
              [-s <assembly-specifications-file>] \
               -p <assembly-prefix> \
               -d <assembly-directory> \
               genomeSize=<number>[g|m|k] \
              [other-options] \
              [-pacbio-raw |
               -pacbio-corrected |
               -nanopore-raw |
               -nanopore-corrected] file1 file2 ...

example: canu -d run1 -p godzilla genomeSize=1g -nanopore-raw reads/*.fasta.gz 

  To restrict canu to only a specific stage, use:
    -correct       - generate corrected reads
    -trim          - generate trimmed reads
    -assemble      - generate an assembly
    -trim-assemble - generate trimmed reads and then assemble them

  The assembly is computed in the -d <assembly-directory>, with output files named
  using the -p <assembly-prefix>.  This directory is created if needed.  It is not
  possible to run multiple assemblies in the same directory.

  The genome size should be your best guess of the haploid genome size of what is being
  assembled.  It is used primarily to estimate coverage in reads, NOT as the desired
  assembly size.  Fractional values are allowed: '4.7m' equals '4700k' equals '4700000'

  Some common options:
    useGrid=string
      - Run under grid control (true), locally (false), or set up for grid control
        but don't submit any jobs (remote)
    rawErrorRate=fraction-error
      - The allowed difference in an overlap between two raw uncorrected reads.  For lower
        quality reads, use a higher number.  The defaults are 0.300 for PacBio reads and
        0.500 for Nanopore reads.
    correctedErrorRate=fraction-error
      - The allowed difference in an overlap between two corrected reads.  Assemblies of
        low coverage or data with biological differences will benefit from a slight increase
        in this.  Defaults are 0.045 for PacBio reads and 0.144 for Nanopore reads.
    gridOptions=string
      - Pass string to the command used to submit jobs to the grid.  Can be used to set
        maximum run time limits.  Should NOT be used to set memory limits; Canu will do
        that for you.
    minReadLength=number
      - Ignore reads shorter than 'number' bases long.  Default: 1000.
    minOverlapLength=number
      - Ignore read-to-read overlaps shorter than 'number' bases long.  Default: 500.
  A full list of options can be printed with '-options'.  All options can be supplied in
  an optional sepc file with the -s option.

  Reads can be either FASTA or FASTQ format, uncompressed, or compressed with gz, bz2 or xz.
  Reads are specified by the technology they were generated with, and any processing performed:
    -pacbio-raw         <files>      Reads are straight off the machine.
    -pacbio-corrected   <files>      Reads have been corrected.
    -nanopore-raw       <files>
    -nanopore-corrected <files>

Complete documentation at http://canu.readthedocs.org/en/latest/

MinIONs-iMac:~ minion$

peflanag commented 6 years ago

I downloaded the Oxford data from the link on the quick start page and ran the command. It seems to be working fine. It hasn't finished yet for me to copy the whole terminal window in here, but there are no errors.

skoren commented 6 years ago

I'm not sure what you're asking; that is the help output the command prints when you don't specify any options. You are getting it because the options you specified are invalid or not parsed correctly.

If the quick start is running, that means Canu is working correctly. You can stop that run and remove the ecoli-oxford folder. Try the same command you used for the quick start, updating just the genome size and read location, and see if that runs.

peflanag commented 6 years ago

Sorry, I thought that was what you wanted me to run, but I ran the commands from the quick start page and they seem to be working.

Last login: Mon Apr  9 16:08:43 on ttys000
MinIONs-iMac:~ minion$ curl -L -o oxford.fasta http://nanopore.s3.climb.ac.uk/MAP006-PCR-1_2D_pass.fasta
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  138M  100  138M    0     0  1683k      0  0:01:24  0:01:24 --:--:-- 2627k
MinIONs-iMac:~ minion$ canu -p ecoli -d ecoli-oxford genomeSize=4.8m -nanopore-raw oxford.fasta
-- Canu snapshot v1.7 +0 changes (r8692 c9ef9219a265e0bbe3a311cca7d28aa02b7517d3)
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '9.0.4' (from 'java').
-- Detected gnuplot version '5.2 patchlevel 2' (from 'gnuplot') and image format 'png'.
-- Detected 8 CPUs and 32 gigabytes of memory.
-- No grid engine detected, grid disabled.
--
--                            (tag)Concurrency
--                     (tag)Threads          |
--            (tag)Memory         |          |
--        (tag)         |         |          |     total usage     algorithm
--        -------  ------  --------   --------  -----------------  -----------------------------
-- Local: meryl      8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (k-mer counting)
-- Local: cormhap    6 GB    8 CPUs x   1 job      6 GB    8 CPUs  (overlap detection with mhap)
-- Local: obtovl     4 GB    8 CPUs x   1 job      4 GB    8 CPUs  (overlap detection)
-- Local: utgovl     4 GB    8 CPUs x   1 job      4 GB    8 CPUs  (overlap detection)
-- Local: ovb        4 GB    1 CPU  x   8 jobs    32 GB    8 CPUs  (overlap store bucketizer)
-- Local: ovs        8 GB    1 CPU  x   4 jobs    32 GB    4 CPUs  (overlap store sorting)
-- Local: red        4 GB    4 CPUs x   2 jobs     8 GB    8 CPUs  (read error detection)
-- Local: oea        4 GB    1 CPU  x   8 jobs    32 GB    8 CPUs  (overlap error adjustment)
-- Local: bat       16 GB    4 CPUs x   1 job     16 GB    4 CPUs  (contig construction)
-- Local: gfa        8 GB    4 CPUs x   1 job      8 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in the input files.
--
-- Generating assembly 'ecoli' in '/Users/minion/ecoli-oxford'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
--
--
-- BEGIN CORRECTION
--
----------------------------------------
-- Starting command on Mon Apr  9 16:11:25 2018 with 646.795 GB free disk space

    cd .
    /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/gatekeeperCreate \
      -minlength 1000 \
      -o ./ecoli.gkpStore.BUILDING \
      ./ecoli.gkpStore.gkp \
    > ./ecoli.gkpStore.BUILDING.err 2>&1

-- Finished on Mon Apr  9 16:11:27 2018 (2 seconds) with 646.75 GB free disk space
----------------------------------------
--
-- In gatekeeper store './ecoli.gkpStore':
--   Found 20365 reads.
--   Found 140042151 bases (29.17 times coverage).
--
--   Read length histogram (one '*' equals 41.48 reads):
--        0    999      0 
--     1000   1999    706 *****************
--     2000   2999   1682 ****************************************
--     3000   3999   1624 ***************************************
--     4000   4999   1543 *************************************
--     5000   5999   1905 *********************************************
--     6000   6999   2691 ****************************************************************
--     7000   7999   2904 **********************************************************************
--     8000   8999   2609 **************************************************************
--     9000   9999   1946 **********************************************
--    10000  10999   1280 ******************************
--    11000  11999    733 *****************
--    12000  12999    397 *********
--    13000  13999    181 ****
--    14000  14999    109 **
--    15000  15999     38 
--    16000  16999      9 
--    17000  17999      4 
--    18000  18999      2 
--    19000  19999      0 
--    20000  20999      0 
--    21000  21999      0 
--    22000  22999      1 
--    23000  23999      0 
--    24000  24999      0 
--    25000  25999      1 
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'meryl' concurrent execution on Mon Apr  9 16:11:27 2018 with 646.75 GB free disk space (1 processes; 1 concurrently)

    cd correction/0-mercounts
    ./meryl.sh 1 > ./meryl.000001.out 2>&1

-- Finished on Mon Apr  9 16:11:41 2018 (14 seconds) with 646.419 GB free disk space
----------------------------------------
-- Meryl finished successfully.
--
--  16-mers                                                                                           Fraction
--    Occurrences   NumMers                                                                         Unique Total
--       1-     1  70555151 *******************************************************************--> 0.8655 0.5049
--       2-     2   4917952 ********************************************************************** 0.9259 0.5753
--       3-     4   1399886 *******************                                                    0.9380 0.5965
--       5-     7    905449 ************                                                           0.9466 0.6188
--       8-    11   1612461 **********************                                                 0.9586 0.6683
--      12-    16   1552473 **********************                                                 0.9789 0.7922
--      17-    22    494738 *******                                                                0.9949 0.9293
--      23-    29     52236                                                                        0.9993 0.9785
--      30-    37      9061                                                                        0.9997 0.9851
--      38-    46      4073                                                                        0.9998 0.9870
--      47-    56      2676                                                                        0.9998 0.9881
--      57-    67      1989                                                                        0.9999 0.9891
--      68-    79      2326                                                                        0.9999 0.9900
--      80-    92      2011                                                                        0.9999 0.9912
--      93-   106      1225                                                                        1.0000 0.9924
--     107-   121       636                                                                        1.0000 0.9933
--     122-   137       517                                                                        1.0000 0.9938
--     138-   154       349                                                                        1.0000 0.9942
--     155-   172       166                                                                        1.0000 0.9946
--     173-   191       107                                                                        1.0000 0.9948
--     192-   211        80                                                                        1.0000 0.9949
--     212-   232        60                                                                        1.0000 0.9950
--     233-   254        53                                                                        1.0000 0.9951
--     255-   277        37                                                                        1.0000 0.9952
--     278-   301        33                                                                        1.0000 0.9953
--     302-   326        29                                                                        1.0000 0.9954
--     327-   352        21                                                                        1.0000 0.9954
--     353-   379        27                                                                        1.0000 0.9955
--     380-   407        17                                                                        1.0000 0.9955
--     408-   436        19                                                                        1.0000 0.9956
--     437-   466        14                                                                        1.0000 0.9956
--     467-   497        13                                                                        1.0000 0.9957
--     498-   529        17                                                                        1.0000 0.9957
--     530-   562        20                                                                        1.0000 0.9958
--     563-   596        10                                                                        1.0000 0.9959
--     597-   631        16                                                                        1.0000 0.9959
--     632-   667        10                                                                        1.0000 0.9960
--     668-   704         9                                                                        1.0000 0.9960
--     705-   742         8                                                                        1.0000 0.9961
--     743-   781        11                                                                        1.0000 0.9961
--     782-   821         6                                                                        1.0000 0.9962
--
--       13740 (max occurrences)
--    69181525 (total mers, non-unique)
--    10960962 (distinct mers, non-unique)
--    70555151 (unique mers)
-- For mhap overlapping, set repeat k-mer threshold to 1397.
--
-- Found 139736676 16-mers; 81516113 distinct and 70555151 unique.  Largest count 13740.
--
-- OVERLAPPER (mhap) (correction)
--
-- Set corMhapSensitivity=high based on read coverage of 29.
--
-- PARAMETERS: hashes=768, minMatches=2, threshold=0.78
--
-- Given 6 GB, can fit 9000 reads per block.
-- For 4 blocks, set stride to 2 blocks.
-- Logging partitioning to 'correction/1-overlapper/partitioning.log'.
-- Configured 3 mhap precompute jobs.
-- Configured 3 mhap overlap jobs.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'cormhap' concurrent execution on Mon Apr  9 16:11:42 2018 with 646.75 GB free disk space (3 processes; 1 concurrently)

    cd correction/1-overlapper
    ./precompute.sh 1 > ./precompute.000001.out 2>&1

peflanag commented 6 years ago

I got it working!! Thank you so much for all your help! There was a hidden genomeSize file that I scoured the computer for and found! Once I deleted that and moved the folder to the home directory it worked! Again, thanks so much!

skoren commented 6 years ago

No problem. Since you're running on a small-ish machine, you may want to add -fast to the command too; it will save some compute and should give a comparable assembly for bacterial genomes.

peflanag commented 6 years ago

I'll do that going forward because we are using Canu to assemble bacterial genomes. Would it work with yeast genomes too? Candida albicans?

skoren commented 6 years ago

It will work with any genome, it just might give you a less contiguous assembly.

peflanag commented 6 years ago

Cool. Cheers for all your help! I really appreciate it!

peflanag commented 6 years ago

Hi Skoren, sorry to bother you again. I am trying to run Canu on a student's Nanopore reads that have been trimmed and demultiplexed with Porechop. However, I am getting a new error from Canu, which I have pasted below. I should point out that the reads are shorter than usual.

Gatekeeper detected potential problems in your input reads.

Please review the logging in files: /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING.err /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING/errorLog

If you wish to proceed, rename the store with the following command and restart canu.

mv /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING \
   /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.ACCEPTED

If I still want to run it, I am not sure what it means by renaming the "store". What is the store? Do I write it as follows:

canu -fast -p [Name] -d [Output directory] genomeSize=3.6m -nanopore-raw [Input File] mv /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING \ /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.ACCEPTED

skoren commented 6 years ago

Essentially, it's warning you that too many of your reads were filtered out, most likely because they were too short. You can see the full log in /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING/errorLog.

If you're OK with the filtering, you can follow the Canu instructions: just run the command it gives you as is, mv /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.ACCEPTED, and re-launch the original Canu command as before.
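A minimal shell sketch of that rename step, assuming a POSIX shell such as the macOS Terminal default. The function name `accept_gkp_store` is hypothetical and not part of Canu; it only wraps the `mv` command Canu prints, built from the output directory (`-d`) and the prefix (`-p`). Quoting each path keeps directory names that contain spaces as single arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canu): rename a gatekeeper store that
# Canu left as <prefix>.gkpStore.BUILDING to <prefix>.gkpStore.ACCEPTED,
# which is what Canu's message asks you to do before restarting.
accept_gkp_store() {
    outdir=$1    # the directory passed to canu with -d
    prefix=$2    # the name passed to canu with -p
    src="$outdir/$prefix.gkpStore.BUILDING"
    dst="$outdir/$prefix.gkpStore.ACCEPTED"

    # Refuse to continue if there is no BUILDING store to rename.
    if [ ! -d "$src" ]; then
        echo "accept_gkp_store: no store found at $src" >&2
        return 1
    fi

    # Refuse to overwrite (or move into) an existing ACCEPTED store.
    if [ -e "$dst" ]; then
        echo "accept_gkp_store: $dst already exists" >&2
        return 1
    fi

    # Quoted paths stay intact even when they contain spaces.
    mv "$src" "$dst"
}
```

For the store in the error above, this would be `accept_gkp_store /Users/minion/Sarah110418/CanuOutput M160427`, after which the original Canu command is re-launched unchanged.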

If you always want it to ignore the read filtering and assemble anyway, add stopOnReadQuality=false to your Canu command line (or put it in a file named canu.defaults in /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin/).
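A small sketch of the canu.defaults option, assuming a POSIX shell and that the file holds one `key=value` setting per line. `add_canu_default` is a hypothetical helper, not a Canu command; it appends the setting to `canu.defaults` in the given directory and skips it if the identical line is already there, so running it twice does not pile up duplicates.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canu): add one key=value setting to
# the canu.defaults file in a directory, unless that exact line exists.
add_canu_default() {
    dir=$1        # directory holding canu.defaults (e.g. Canu's bin/)
    setting=$2    # a single option, e.g. stopOnReadQuality=false
    file="$dir/canu.defaults"

    # grep -qxF: quiet, match the whole line, treat the pattern as fixed text.
    if [ -f "$file" ] && grep -qxF -- "$setting" "$file"; then
        return 0
    fi
    printf '%s\n' "$setting" >> "$file"
}
```

For the install above this would be `add_canu_default /Users/minion/Documents/Apps/canu-1.7/Darwin-amd64/bin stopOnReadQuality=false`. Putting the option on the command line instead affects only that one run.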

peflanag commented 6 years ago

Ok cool. I will try that again. Thanks so much.

peflanag commented 6 years ago

Hi Skoren, sorry to bother you again. I have only just got around to looking at this. After the failure I still want to run the assembly, so I typed the command as given:

mv /Users/minion/Desktop/Sarah 11_04_18/Canu Output/BC01.gkpStore.BUILDING \ /Users/minion/Desktop/Sarah 11_04_18/Canu Output/BC01.gkpStore.ACCEPTED

However I get this:

MinIONs-iMac:~ minion$ mv /Users/minion/Desktop/Sarah 11_04_18/Canu Output/BC01.gkpStore.BUILDING \

 /Users/minion/Desktop/Sarah 11_04_18/Canu Output/BC01.gkpStore.ACCEPTED
usage: mv [-f | -i | -n] [-v] source target
       mv [-f | -i | -n] [-v] source ... directory
MinIONs-iMac:~ minion$

What do the -f, -i, -n and -v options stand for?

Cheers!

skoren commented 6 years ago

That's just the mv command usage. You can get information on most commands using man. For example, man mv.

The problem is that you truncated your command when you copied it: the \ means it continues onto the next line, but that part didn't get copied. You need the full command: mv /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.BUILDING /Users/minion/Sarah110418/CanuOutput/M160427.gkpStore.ACCEPTED.

Or just erase the Canu output folder (Canu Output) and restart with stopOnReadQuality=false.
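A hedged sketch of that restart, assuming a POSIX shell with `canu` on the `PATH`. `restart_canu` is a hypothetical wrapper, not part of Canu: it deletes the old output directory and re-runs Canu with `stopOnReadQuality=false`, passing any other options through unchanged. Because `rm -rf` deletes everything in that directory, double-check the path first.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Canu): wipe a previous Canu output
# directory and start a fresh run that does not stop when gatekeeper
# filters out reads.
restart_canu() {
    # Need at least an output directory and a prefix.
    if [ $# -lt 2 ] || [ -z "$1" ]; then
        echo "usage: restart_canu OUTDIR PREFIX [canu options...]" >&2
        return 1
    fi
    outdir=$1    # the -d directory to delete and reuse
    prefix=$2    # the -p assembly name
    shift 2      # everything else is passed straight to canu

    rm -rf -- "$outdir"
    canu -p "$prefix" -d "$outdir" stopOnReadQuality=false "$@"
}
```

For example, `restart_canu "/Users/minion/Desktop/Sarah 11_04_18/Canu Output" BC01 -fast genomeSize=3.6m -nanopore-raw reads.fastq`, where `reads.fastq` is a placeholder for the actual input file.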

marbl / canu

Canu won't run #859