marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

How do I know if manual Canu correction finished successfully? #1389

Closed tayabsoomro closed 5 years ago

tayabsoomro commented 5 years ago

Hi,

I ran manual Canu correction with the following command on an SGE cluster.

canu -correct -p pb3 genomeSize=25m -d pb3-ont -nanopore-raw $NANOPORE_RAW \
        useGrid=false \
        gridEngineMemoryOption="-l h_vmem=MEMORY" \
        gridEngineThreadsOption="-pe make THREADS"

I get this as my output:

-- Detected Java(TM) Runtime Environment '1.8.0_92' (from 'java').
-- Detected gnuplot version '4.6 patchlevel 0' (from 'gnuplot') and image format 'svg'.
-- Detected 80 CPUs and 1008 gigabytes of memory.
-- Detected Sun Grid Engine in '/opt/gridengine/default'.
-- Grid engine disabled per useGrid=false option.
--
-- Allowed to run  20 jobs concurrently, and use up to   4 compute threads and   16 GB memory for stage 'bogart (unitigger)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'mhap (overlapper)'.
-- Allowed to run  20 jobs concurrently, and use up to   4 compute threads and    2 GB memory for stage 'read error detection (overlap error adjustment)'.
-- Allowed to run  80 jobs concurrently, and use up to   1 compute thread  and    1 GB memory for stage 'overlap error adjustment'.
-- Allowed to run  20 jobs concurrently, and use up to   4 compute threads and   32 GB memory for stage 'utgcns (consensus)'.
-- Allowed to run  80 jobs concurrently, and use up to   1 compute thread  and    4 GB memory for stage 'overlap store parallel bucketizer'.
-- Allowed to run  80 jobs concurrently, and use up to   1 compute thread  and    8 GB memory for stage 'overlap store parallel sorting'.
-- Allowed to run  80 jobs concurrently, and use up to   1 compute thread  and    6 GB memory for stage 'overlapper'.
-- Allowed to run  10 jobs concurrently, and use up to   8 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run  10 jobs concurrently, and use up to   8 compute threads and    8 GB memory for stage 'overlapper'.
-- Allowed to run  20 jobs concurrently, and use up to   4 compute threads and    8 GB memory for stage 'meryl (k-mer counting)'.
-- Allowed to run  40 jobs concurrently, and use up to   2 compute threads and   16 GB memory for stage 'falcon_sense (read correction)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
-- Allowed to run   5 jobs concurrently, and use up to  16 compute threads and    6 GB memory for stage 'minimap (overlapper)'.
--
-- This is canu parallel iteration #1, out of a maximum of 2 attempts.
--
-- Final error rates before starting pipeline:
--
--   genomeSize          -- 25000000
--   errorRate           -- 0.048
--
--   corOvlErrorRate     -- 0.144
--   obtOvlErrorRate     -- 0.144
--   utgOvlErrorRate     -- 0.144
--
--   obtErrorRate        -- 0.144
--
--   cnsErrorRate        -- 0.144
--
--
-- BEGIN CORRECTION

I am not sure how to figure out whether my correction of reads was successful or if there was an error. I was expecting there to be a message indicating that the correction was successful after "BEGIN CORRECTION" in my output.

Inside my output folder, I have pb3.correctedReads.fasta.gz, which contains ~26K reads. The original input file had 3.3M reads.

Is there any way to check whether something went wrong? Maybe something in the pb3-ont folder?

Thanks, Tayab Soomro.

skoren commented 5 years ago

If you've got a correctedReads.fasta.gz then correction finished, but there should be more output than you've posted, since your Canu is running locally (unless you restarted it from a previous run that already had the output). There is also a report file with more information on which reads were selected for correction.
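A quick way to sanity-check the output is to count the corrected reads and skim the report. A minimal sketch, assuming standard Unix tools and that the report is written as pb3.report inside the pb3-ont output directory (the exact file name and location can vary between Canu versions):

# Count corrected reads in the compressed FASTA (each record starts with '>')
zcat pb3-ont/pb3.correctedReads.fasta.gz | grep -c '^>'

# Skim the report for the read-correction summary
less pb3-ont/pb3.report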

skoren commented 5 years ago

Looking at your canu output, this is a very old version of Canu: errorRate used the way you've set it hasn't been in use since 1.4 (released in 2016). You've also set it far too low; allowing only 14% error in raw data isn't going to correct anything, which is why you have almost no corrected reads. I don't see it in the command you posted, but it clearly got set at some point, perhaps in your default Canu parameters. The default nanopore error rate is 50%, not 14%. Update to Canu 1.8 and run with default parameters.
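For reference, a minimal sketch of what the rerun could look like with Canu 1.8 defaults, carrying over the input variable and output directory from the original post and assuming no errorRate is set in a site-wide defaults or spec file (remove it there if it is):

canu -correct -p pb3 genomeSize=25m -d pb3-ont \
        -nanopore-raw $NANOPORE_RAW \
        useGrid=false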