marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu run ends too quickly #974

Closed: ljw90607 closed this issue 6 years ago

ljw90607 commented 6 years ago

Hello, I am currently trying to run canu with the E. coli test data. Unfortunately, the run doesn't seem to go through properly. Even though I used the exact example command from the tutorial page (http://canu.readthedocs.io/en/latest/tutorial.html), I still end up with no output. Since I am fairly new to this tool, I would appreciate any help with this issue.

The command I used is shown below:

    canu -p ecoli -d canu-ecoli genomeSize=4.8m -nanopore-corrected E_coli_K12_1D_R9.2_SpotON_2.pass.fasta

And the resulting log is shown below:

Use of implicit split to @_ is deprecated at /home/khanhchu2010/download/canu/Linux-amd64/bin/../lib/site_perl/canu/Grid_Cloud.pm line 65.
-- Canu snapshot v1.7 +268 changes (r8960 0dfc25ade033d184200ac5a74446482fea2f69ce)
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošić M, Šikić M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_161' (from 'java') with -d64 support.
-- Detected gnuplot version '4.6 patchlevel 0' (from 'gnuplot') and image format 'png'.
-- Detected 24 CPUs and 252 gigabytes of memory.
-- Detected PBS/Torque '6.1.1.1' with 'pbsnodes' binary in /usr/local/torque-6.1.1.1/bin/pbsnodes.
-- Detecting PBS/Torque resources.
-- 
-- Found  16 hosts with  28 cores and  126 GB memory under PBS/Torque control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl      8 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap    6 GB   14 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4 GB    7 CPUs  (overlap detection)
-- Grid:  utgovl     4 GB    7 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore corrected reads in the input files.
--
-- Generating assembly 'ecoli' in '/home/khanhchu2010/Data/20180623_ecoli_test_run/canu-ecoli'
--
-- Parameters:
--
--  genomeSize        4800000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
----------------------------------------
-- Starting command on Wed Jun 27 17:41:27 2018 with 2448.215 GB free disk space

    cd /home/khanhchu2010/Data/20180623_ecoli_test_run/canu-ecoli
    qsub \
      -j oe \
      -d `pwd` \
      -l mem=4g \
      -l nodes=1:ppn=1   \
      -N 'canu_ecoli' \
      -o canu-scripts/canu.01.out  canu-scripts/canu.01.sh
60298.master

-- Finished on Wed Jun 27 17:41:27 2018 (lickety-split) with 2448.215 GB free disk space
----------------------------------------

I really appreciate your help!

brianwalenz commented 6 years ago

Canu detected your PBS grid and continued the execution there: the qsub call in your log submitted the rest of the run as job 60298.master. Eventually it will finish, and the results will (magically) appear in the assembly directory. You can check on progress by looking at the *.out files in canu-ecoli/canu-scripts.
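A minimal sketch of that progress check, assuming a bash shell and the canu-ecoli/canu-scripts layout described above. The function name show_canu_progress is hypothetical and not part of Canu; it only prints the last few lines of every *.out log written so far. On PBS/Torque, qstat can additionally list the jobs that are still queued or running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Canu): print the last lines of each
# Canu script log found in <assembly-dir>/canu-scripts/*.out.
show_canu_progress() {
  local dir="${1:-canu-ecoli}"
  local scripts="$dir/canu-scripts"
  if [ ! -d "$scripts" ]; then
    echo "no canu-scripts directory under '$dir' yet" >&2
    return 1
  fi
  local f
  for f in "$scripts"/*.out; do
    [ -e "$f" ] || continue   # the glob matched nothing: no logs written yet
    echo "==> $f <=="
    tail -n 5 -- "$f"
  done
}

# Example usage on the cluster's login node:
#   show_canu_progress canu-ecoli
#   qstat -u "$USER"    # PBS/Torque: your queued (Q) and running (R) jobs
```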

Note that if you ran this same command multiple times, each run submitted itself to the grid, so you'll end up with multiple assemblies all writing files to the same place, creating a giant mess.
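One way to spot this situation before launching again, sketched under assumptions: it parses the default PBS/Torque qstat output (Job ID, Name, User, Time Use, S, Queue columns) and looks for the job name canu_ecoli that Canu passed to qsub -N in the log above. Later pipeline stages may be submitted under other names, so this only counts the top-level Canu job. count_canu_jobs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count PBS/Torque jobs with the given name that have not
# completed (state column is not "C"). Assumes default `qstat` columns:
# Job ID, Name, User, Time Use, S, Queue.
count_canu_jobs() {
  local name="${1:-canu_ecoli}"
  qstat 2>/dev/null |
    awk -v n="$name" '$2 == n && $5 != "C" { c++ } END { print c + 0 }'
}

# Example: warn before submitting a second assembly into the same directory.
#   if [ "$(count_canu_jobs canu_ecoli)" -gt 0 ]; then
#     echo "a canu_ecoli job is already queued or running" >&2
#   fi
```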

To run without the grid, add useGrid=false. Be sure to erase the existing canu-ecoli directory first, or use a different -d directory.
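A sketch of a clean local re-run, assuming bash and the same prefix and input file as the question; rerun_local is a hypothetical wrapper, and useGrid=false is the option named in the reply. If jobs from the earlier grid submission are still queued or running, cancelling them first (for example with qdel 60298.master on PBS/Torque) keeps them from writing into the directory being erased.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: erase the old assembly directory, then rerun the tutorial
# E. coli assembly on this machine only (useGrid=false), without submitting to PBS.
rerun_local() {
  local dir="${1:-canu-ecoli}"
  rm -rf -- "$dir"   # a fresh directory avoids mixing output with the earlier grid run
  canu -p ecoli -d "$dir" genomeSize=4.8m useGrid=false \
    -nanopore-corrected E_coli_K12_1D_R9.2_SpotON_2.pass.fasta
}

# Example: rerun_local canu-ecoli
# Or keep the old directory and write to a new one: rerun_local canu-ecoli-local
```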

skoren commented 6 years ago

Idle, no bug reported.