marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Consensus jobs failed error #1080

Status: Closed (sloanism closed this issue 6 years ago)

sloanism commented 6 years ago

I keep encountering the same error. I've adjusted my command based on previously resolved issues, but the problem persists. I am running this on my university's SLURM grid.

Here is my input:

module load canu-gcc/1.7
module load java/1.8.0_66
canu -assemble \
    -p PacBiox21 -d /home/18660056/Teladorsagia_circumcincta/Reads/canu/ \
    java=java \
    genomeSize=700m \
    corMinCoverage=0 corOutCoverage=all correctedErrorRate=0.105 cnsErrorRate=0.25 gridOptions="--time=7-00:00:00" gnuplotTested=true \
    -pacbio-corrected /home/18660056/Teladorsagia_circumcincta/Reads/PacBiox21.trimmedReads.fasta.gz

exit 0

The error I receive is this:

-- Canu 1.7
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
--   Li H.
--   Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences.
--   Bioinformatics. 2016 Jul 15;32(14):2103-10.
--   http://doi.org/10.1093/bioinformatics/btw152
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_66' (from 'java').
-- Detected 16 CPUs and 126 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
-- 
-- Found   3 hosts with  16 cores and  126 GB memory under Slurm control.
-- Found   1 host  with  16 cores and  252 GB memory under Slurm control.
-- Found   1 host  with  16 cores and   62 GB memory under Slurm control.
-- Found   2 hosts with  16 cores and  125 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     64 GB   16 CPUs  (k-mer counting)
-- Grid:  cormhap   32 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    16 GB   16 CPUs  (overlap detection)
-- Grid:  utgovl    16 GB   16 CPUs  (overlap detection)
-- Grid:  ovb        3 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       16 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      252 GB   16 CPUs  (contig construction)
-- Grid:  gfa       16 GB   16 CPUs  (GFA alignment and processing)
--
-- In 'PacBiox21.gkpStore', found PacBio reads:
--   Raw:        0
--   Corrected:  1636733
--   Trimmed:    1636733
--
-- Generating assembly 'PacBiox21' in '/home/18660056/Teladorsagia_circumcincta/Reads/canu'
--
-- Parameters:
--
--  genomeSize        700000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.1050 ( 10.50%)
--    utgOvlErrorRate 0.1050 ( 10.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.1050 ( 10.50%)
--    utgErrorRate    0.1050 ( 10.50%)
--    cnsErrorRate    0.2500 ( 25.00%)
--
--
-- BEGIN ASSEMBLY
--
-- Using slow alignment for consensus (iteration '2').
-- Configured 66 contig and 20 unitig consensus jobs.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  cns        1 GB    8 CPUs  (consensus)
--
--
-- Consensus jobs failed, tried 2 times, giving up.
--   job ctgcns/0015.cns FAILED.
--

ABORT:
ABORT: Canu 1.7
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

I have looked in the ../unitigging/5-consensus/ folder and the consensus.146317_15.out file has this at its end:

generateTemplateStitch()-- generated template of length 16300, expected length 16499, -1.2135% difference.
Generated template of length 16300
Aligning reads.
Finished aligning reads.  0 failed, 50 passed.
Constructing graph
Merging graph
Calling consensus
Working on tig 22248 of length 60754 (521 children)
  unitig 22248 detected 517 contains (17.66x, 91.31%) 4 dovetail (1.68x, 8.69%)
utgcns: overlapInCore/libedlib/edlib.C:163: EdlibAlignResult edlibAlign(const char*, int, const char*, int, EdlibAlignConfig): Assertion `queryLength > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
overlapInCore/libedlib/edlib.C::163 in _Z10edlibAlignPKciS0_i16EdlibAlignConfig()
utgcns/libcns/unitigConsensus.C::420 in _Z22generateTemplateStitchP8abAbacusP10tgPositionjdb()
utgcns/libcns/unitigConsensus.C::754 in _ZN15unitigConsensus13generatePBDAGEcbP5tgTigPSt3mapIjP6gkReadSt4lessIjESaISt4pairIKjS4_EEEPS2_IjP10gkReadDataS6_SaIS7_IS8_SE_EEE()
utgcns/utgcns.C::581 in main()
(null)::0 in (null)()
(null)::0 in (null)()
/var/spool/slurmd/job146317/slurm_script: line 79: 32217 Aborted                 $bin/utgcns -G ../PacBiox21.${tag}Store/partitionedReads.gkpStore -T ../PacBiox21.${tag}Store 1 $jobid -O ./${tag}cns/$jobid.cns.WORKING -maxcoverage 40 -e 0.25 -pbdagcon -edlib -threads 2-8

Am I right in assuming the edlibAlign() function is failing its assertion because it is being handed a zero-length query (queryLength of 0)? If so, why does this happen and how do I fix it?

Please help! I have no idea how to proceed.
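For anyone triaging a similar failure, here is a minimal sketch for listing which consensus logs ended in an assertion failure, along with the last tig each one was working on. It assumes the logs are named consensus.*.out as in the output above; LOG_DIR and the function name find_failed_cns_logs are made up for illustration and are not part of Canu.

```shell
#!/bin/sh
# List consensus logs that contain a failed assertion, plus the last
# "Working on tig" line seen in each. LOG_DIR is an assumption: point it
# at your unitigging/5-consensus directory.
LOG_DIR="${LOG_DIR:-.}"

find_failed_cns_logs() {
    dir="$1"
    for log in "$dir"/consensus.*.out; do
        # Skip the unexpanded pattern when no logs are present.
        [ -e "$log" ] || continue
        if grep -q "Assertion .* failed" "$log"; then
            printf '%s\t%s\n' "$log" "$(grep 'Working on tig' "$log" | tail -n 1)"
        fi
    done
}

find_failed_cns_logs "$LOG_DIR"
```

Each output line pairs a log file with the tig that was being processed when the assertion fired, which is usually the first thing a maintainer will ask about.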

skoren commented 6 years ago

Change the consensus.sh command to add -V -V -V and re-run and post the output. It should give more details on what it was doing before the error. The unitig which is failing is only composed of 4 reads but it is failing to put them together.
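One way to do this without editing the generated script by hand is sketched below. It assumes consensus.sh calls the binary as $bin/utgcns, as the failing command line in the log above suggests. The helper name add_verbose_flags and the output name consensus.verbose.sh are hypothetical, and the way the job index is passed when rerunning is also an assumption.

```shell
#!/bin/sh
# Write a copy of a consensus script with -V -V -V added to the utgcns call.
# The original script is left untouched.
add_verbose_flags() {
    src="$1"
    dst="$2"
    # '&' re-inserts the matched text, so the flags land right after the
    # binary name; this works whether or not the command continues on
    # the following lines.
    sed 's|\$bin/utgcns|& -V -V -V|' "$src" > "$dst" && chmod +x "$dst"
}

# Assumed layout: run from unitigging/5-consensus, where consensus.sh lives.
if [ -f consensus.sh ]; then
    add_verbose_flags consensus.sh consensus.verbose.sh
    # Then rerun only the failing job (15 here). Passing the job index as the
    # first argument is an assumption about how the script is invoked:
    #   ./consensus.verbose.sh 15 > consensus.verbose_15.out 2>&1
fi
```

Working on a copy keeps the original consensus.sh intact, so a later Canu restart still runs the unmodified script.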

sloanism commented 6 years ago

I hope this provides more detail: consensus.146361_15.out.txt (attached)

skoren commented 6 years ago

This looks like a bug in the code that isn't guarding against a very short alignment. Can you share the assembly data? How big is the ctgStore in the unitigging folder?

From inside the 5-consensus folder, you should be able to run tar cvzf ctgstore.tar.gz ../*.ctgStore and then follow the FAQ instructions to send us the tar file.
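The two steps above (checking the store size, then archiving it) could be combined roughly as follows. This is only a sketch: the function name package_ctgstore is hypothetical, and it assumes du and tar are available on the node and that it is run from inside the 5-consensus folder.

```shell
#!/bin/sh
# Report the size of each ctgStore, then bundle them into one archive.
package_ctgstore() {
    archive="$1"
    shift
    du -sh "$@"              # answers "how big is the ctgStore?"
    tar czf "$archive" "$@"  # same archive as 'tar cvzf', minus the file listing
    # Reading the archive back is a cheap check that it was written completely.
    tar tzf "$archive" > /dev/null && echo "archive OK: $archive"
}

# Only run when a ctgStore actually exists one level up.
if ls -d ../*.ctgStore > /dev/null 2>&1; then
    package_ctgstore ctgstore.tar.gz ../*.ctgStore
fi
```

Checking the size first is useful because a large store may need a different transfer route than a small one.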

sloanism commented 6 years ago

Canu 1.7.1 was installed last week and I was able to complete the run. So I assume the bug has already been fixed. If it would be of use I can run tar cvzf ctgstore.tar.gz ../*.ctgStore and provide the tar file. Let me know.

Thanks for your help.

brianwalenz commented 6 years ago

I wouldn't mind getting the data to see if I can reproduce the crash, especially since there were no changes in 1.7.1 that would fix it!

sloanism commented 6 years ago

> This looks like a bug in the code that isn't guarding against a very short alignment. Can you share the assembly data? How big is the ctgStore in the unitigging folder?
>
> From inside the 5-consensus folder, you should be able to run tar cvzf ctgstore.tar.gz ../*.ctgStore and then follow the FAQ instructions to send us the tar file.

tar file titled issue1080_ctgstore.tar.gz has been sent.

skoren commented 6 years ago

Since this got fixed by other changes in 1.7.1, it is hard to fix/reproduce. Will close and work on fix via #1142 since that fails on tip and appears to be the same issue.