marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Segmentation faults during correction phase (canu 1.7) #1061

Closed swarris closed 6 years ago

swarris commented 6 years ago

The correction phase gives segmentation faults. I've tried running Canu on a Slurm cluster with different mem/cpu settings, and also on a single node. Every run failed because of segmentation faults during correction. Not all sub-jobs failed, but several did.

In particular ./correction/2-correction/results/0126.err (job: ./correctReads.sh 126 > ./correctReads.000126.out 2>&1)

read9843323 #17 location 0 to template 122-1310 length 1188 diff 0.267677
falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.
falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.
falconsense: overlapInCore/libedlib/edlib.C:347: void edlibAlignmentToStrings(const unsigned char*, int, int, int, int, int, const char*, const char*, char*, char*): Assertion `strlen(qry_aln_str) == alignmentLength && strlen(tgt_aln_str) == alignmentLength' failed.

Failed with '
Failed with '
Failed with 'AbortedAbortedAborted'; backtrace (libbacktrace):
'; backtrace (libbacktrace):
'; backtrace (libbacktrace):
read1877516 #0 location 0 to template 0-3514 length 3514 diff 0.000000
read3139098 #8 location 0 to template 242-2655 length 2413 diff 0.224202
read3139098 #8 location 1 to template 242-2656 length 2414 diff 0.224109
mapped 1877516     0- 3515 to template      0-  3515 trimmed by      0-     0 ACTTATGTCC ACTTATGTCC
mapped 3139098     0- 2445 to template   1074-  3488 trimmed by      0-     1 TGATTTTTAC TGATTTT-AC
read5188789 #1 location 0 to template 0-3166 length 3166 diff 0.246368
read5188789 #1 location 1 to template 0-3169 length 3169 diff 0.246134
mapped 5188789     0- 3157 to template      0-  3167 trimmed by      2-     0 CATAACTCGT C-TAACAC-T
read5914047 #2 location 0 to template 207-3398 length 3191 diff 0.225948
read5914047 #2 location 1 to template 207-3399 length 3192 diff 0.225877
read5914047 #2 location 2 to template 207-3400 length 3193 diff 0.225806
read5914047 #2 location 3 to template 207-3401 length 3194 diff 0.225736
mapped 5914047     0- 3173 to template    207-  3399 trimmed by      0-     0 CCTAAAAAAT CCTAAA---T

Failed with 'Segmentation fault'; backtrace (libbacktrace):
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
AS_UTL/AS_UTL_stackTrace.C::97 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
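Since only some of the sub-jobs crash, it can help to list which ones did before re-running anything. The following is a minimal sketch, not part of Canu: it assumes the per-job logs live in `./correction/2-correction/results/*.err` (as in the path above) and that a crashed job's log contains the `Failed with '` marker seen in this excerpt. The function name `failed_jobs` is made up for illustration.

```python
import re
from pathlib import Path

# Crash marker as it appears in the log excerpt above (assumed to be
# present in every crashed job's .err file).
FAILED = re.compile(r"Failed with '")


def failed_jobs(results_dir):
    """Return the sorted job IDs whose .err log contains a crash marker."""
    failed = []
    for err_file in sorted(Path(results_dir).glob("*.err")):
        # errors="replace" keeps the scan going if a log holds odd bytes.
        text = err_file.read_text(errors="replace")
        if FAILED.search(text):
            failed.append(err_file.stem)  # "0126.err" -> "0126"
    return failed
```

Calling `failed_jobs("./correction/2-correction/results")` would then give the job IDs to inspect or re-run individually.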

This script is running:

$bin/falconsense \
  -G $gkpStore \
  -C ../genome.corStore \
  -b $bgn -e $end -r ./genome.readsToCorrect \
  -t  24 \
  -cc 4 \
  -cl 1000 \
  -oi 0.7 \
  -ol 500 \
  -p ./results/$jobid.WORKING \
  > ./results/$jobid.err 2>&1 \
&& \
mv ./results/$jobid.WORKING.cns ./results/$jobid.cns \

genome.report.txt

skoren commented 6 years ago

See previous issues #785, #890, #881; searching for your error on the issues page brings them up. Your sequences have non-ACGT characters in them. This is fixed in 1.7.1, so update your install and re-run.
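One way to check the input for such characters is a short scan over the reads file. This is a rough sketch under a few assumptions: the reads are in FASTA format, only `ACGT` (upper or lower case) counts as valid, and the name `find_bad_records` is hypothetical. It is not how Canu itself validates input.

```python
def find_bad_records(fasta_lines, allowed="ACGTacgt"):
    """Yield (read_name, line_number, bad_characters) for each sequence
    line that contains anything outside the allowed alphabet."""
    allowed = set(allowed)
    name = None
    for lineno, line in enumerate(fasta_lines, start=1):
        # Strip only the line terminator so that stray spaces or tabs
        # inside (or at the end of) a sequence line are still reported.
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            name = line[1:]
            continue
        bad = set(line) - allowed
        if bad:
            yield name, lineno, sorted(bad)


# Small in-memory demonstration; a real run would pass an open file handle.
sample = [">read1\n", "ACGTAC GT\n", ">read2\n", "ACGTNNAC\n", ">read3\n", "acgtACGT\n"]
for name, lineno, bad in find_bad_records(sample):
    print(name, lineno, bad)
```

Any iterable of lines works, so `find_bad_records(open("reads.fasta"))` (file name assumed) scans a file without loading it into memory.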

swarris commented 6 years ago

I'm running Canu 1.7.1.

But Biopython did indeed find whitespace in one of the sequences:

ValueError: Whitespace is not allowed in the sequence.

skoren commented 6 years ago

Any updates? I assume removing the whitespace fixed the run. Re-open if not.
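
For reference, removing whitespace from sequence lines could look roughly like the sketch below. It assumes FASTA input, and `strip_sequence_whitespace` is a made-up helper name. The thread does not confirm that this resolved the run, and it is worth checking the affected reads by hand first, since whitespace in the middle of a sequence may point to a damaged file.

```python
def strip_sequence_whitespace(fasta_lines):
    """Yield FASTA lines with all whitespace removed from sequence lines.

    Header lines (starting with '>') keep their internal spaces; blank
    lines are dropped.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            yield line.rstrip("\r\n") + "\n"
        else:
            # split() with no argument splits on any run of whitespace,
            # so joining the pieces removes spaces, tabs and line endings.
            cleaned = "".join(line.split())
            if cleaned:
                yield cleaned + "\n"
```

The cleaned lines can be written to a new file with `out.writelines(strip_sequence_whitespace(handle))`, leaving the original reads untouched.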