marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Aborted (core dumped) error with 'alignGFA' in '4-unitigger' #1726

Closed Anovo-1 closed 4 years ago

Anovo-1 commented 4 years ago

Hello, I'm getting an Aborted (core dumped) error from alignGFA in 4-unitigger during unitigging.

Command:

work.sh:

/work/assembly/work_assembly genomeSize=410000000 -pacbio-raw /work/assembly/data/*.fasta useGrid=true gridEngine=sge gridEngineResourceOption="-pe make THREADS -l vf=MEMORY -q all.q" cormhapThreads=8 cormhapMemory=15 obtovlThreads=8 obtovlMemory=15 utgovlThreads=8 utgovlMemory=15 corThreads=8 corMemory=15

alignGFA.sh in workdir:

#!/bin/sh

#  Path to Canu.

syst=`uname -s`
arch=`uname -m | sed s/x86_64/amd64/`

bin="/ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/$syst-$arch/bin"

if [ ! -d "$bin" ] ; then
  bin="/ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330"
fi

#  Report paths.

echo ""
echo "Found perl:"
echo "  " `which perl`
echo "  " `perl --version | grep version`
echo ""
echo "Found java:"
echo "  " `which /ehpcdata/java/jdk1.8.0_171/bin/java`
echo "  " `/ehpcdata/java/jdk1.8.0_171/bin/java -showversion 2>&1 | head -n 1`
echo ""
echo "Found canu:"
echo "  " $bin/canu
echo "  " `$bin/canu -version`
echo ""

#  Environment for any object storage.

export CANU_OBJECT_STORE_CLIENT=
export CANU_OBJECT_STORE_CLIENT_UA=
export CANU_OBJECT_STORE_CLIENT_DA=
export CANU_OBJECT_STORE_NAMESPACE=
export CANU_OBJECT_STORE_PROJECT=

if [ ! -e ./out.unitigs.aligned.gfa ] ; then

  $bin/alignGFA \
    -T ../out.utgStore 2 \
    -i ./out.unitigs.gfa \
    -e 0.075 \
    -o ./out.unitigs.aligned.gfa \
    -t 8 \
  > ./out.unitigs.aligned.gfa.err 2>&1
fi

if [ ! -e ./out.unitigs.aligned.bed ] ; then

  $bin/alignGFA -bed \
    -T ../out.utgStore 2 \
    -C ../out.ctgStore 2 \
    -i ./out.unitigs.bed \
    -o ./out.unitigs.aligned.bed \
    -t 8 \
  > ./out.unitigs.aligned.bed.err 2>&1
fi

if [ -e ./out.unitigs.aligned.gfa -a \
     -e ./out.unitigs.aligned.bed ] ; then
  echo GFA alignments updated.
  exit 0
else
  echo GFA alignments failed.
  exit 1
fi

Versions:

Canu 2.0, Linux version 3.10.0-957.21.3.el7.x86_64, SGE

Logs:

canu.out:

Found perl:
   /ehpcdata/software/01.common/1.R/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /ehpcdata/java/jdk1.8.0_171/bin/java
   java version "1.8.0_171"

Found canu:
   /ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/Linux-amd64/bin/canu
   Canu 2.0

-- Canu 2.0
--
-- Detected Java(TM) Runtime Environment '1.8.0_171' (from '/ehpcdata/java/jdk1.8.0_171/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 32 CPUs and 63 gigabytes of memory.
-- User supplied Parallel Environment 'make'.
-- User supplied Memory Resource      'vf'.
-- 
-- Found   6 hosts with  32 cores and   60 GB memory under Sun Grid Engine control.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     15.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       12.000 GB   16 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   15.000 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    15.000 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl    15.000 GB    8 CPUs  (overlap detection)
-- Grid:  cor       15.000 GB    8 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       16.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       12.000 GB    6 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       60.000 GB    8 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
-- Grid:  gfa       60.000 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'out.seqStore', found PacBio CLR reads:
--   PacBio CLR:               1
--
--   Raw:                      1
--   Corrected:                1
--   Corrected and Trimmed:    1
--
-- Generating assembly 'out' in '/work/assembly/work_assembly':
--    - assemble corrected and trimmed reads.
--
-- Parameters:
--
--  genomeSize        410000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.2400 ( 24.00%)
--    obtOvlErrorRate 0.0450 (  4.50%)
--    utgOvlErrorRate 0.0450 (  4.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.3000 ( 30.00%)
--    obtErrorRate    0.0450 (  4.50%)
--    utgErrorRate    0.0450 (  4.50%)
--    cnsErrorRate    0.0750 (  7.50%)
--
--
-- BEGIN ASSEMBLY
--
-- No change in report.
--
-- Graph alignment jobs failed, tried 2 times, giving up.
--

ABORT:
ABORT: Canu 2.0
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

alignGFA.1.out:

chmod: cannot access ‘/ehpcdata/log/monitor_command/2020052323.log’: No such file or directory

Found perl:
   /ehpcdata/software/01.common/1.R/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /ehpcdata/java/jdk1.8.0_171/bin/java
   java version "1.8.0_171"

Found canu:
   /ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/Linux-amd64/bin/canu
   Canu 2.0

/ehpcdata/ge2011/default/spool/compute7/job_scripts/4573210: line 56: 26811 Aborted                 (core dumped) $bin/alignGFA -T ../out.utgStore 2 -i ./out.unitigs.gfa -e 0.075 -o ./out.unitigs.aligned.gfa -t 8 > ./out.unitigs.aligned.gfa.err 2>&1
GFA alignments failed.
chmod: cannot access ‘/ehpcdata/log/monitor_command/2020052323.log’: No such file or directory

Found perl:
   /ehpcdata/software/01.common/1.R/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /ehpcdata/java/jdk1.8.0_171/bin/java
   java version "1.8.0_171"

Found canu:
   /ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/Linux-amd64/bin/canu
   Canu 2.0

/ehpcdata/ge2011/default/spool/compute7/job_scripts/4574530: line 56:  1046 Aborted                 (core dumped) $bin/alignGFA -T ../out.utgStore 2 -i ./out.unitigs.gfa -e 0.075 -o ./out.unitigs.aligned.gfa -t 8 > ./out.unitigs.aligned.gfa.err 2>&1
GFA alignments failed.

I also found a similar error reported in the community (that issue is closed):

#656 Assembly failed due to Graph alignment jobs failing

So I didn't delete any files and just added gfaMemory=60 to my command, but it didn't change anything.

Command:

/ehpcdata/genome_analysis/GA_ASM/software/01.canu/used_now/Linux-amd64/bin/canu -p out -d /work/assembly/work_assembly genomeSize=410000000 -pacbio-raw /work/assembly/data/*.fasta useGrid=true gridEngine=sge gridEngineResourceOption="-pe make THREADS -l vf=MEMORY -q all.q" cormhapThreads=8 cormhapMemory=15 obtovlThreads=8 obtovlMemory=15 utgovlThreads=8 utgovlMemory=15 corThreads=8 corMemory=15 gfaMemory=60

Logs:

alignGFA.1.out

Found perl:
   /ehpcdata/software/01.common/1.R/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /ehpcdata/java/jdk1.8.0_171/bin/java
   java version "1.8.0_171"

Found canu:
   /ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/Linux-amd64/bin/canu
   Canu 2.0

/ehpcdata/ge2011/default/spool/compute23/job_scripts/4678838: line 56: 22969 Aborted                 (core dumped) $bin/alignGFA -T ../out.utgStore 2 -i ./out.unitigs.gfa -e 0.075 -o ./out.unitigs.aligned.gfa -t 8 > ./out.unitigs.aligned.gfa.err 2>&1
GFA alignments failed.

Found perl:
   /ehpcdata/software/01.common/1.R/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /ehpcdata/java/jdk1.8.0_171/bin/java
   java version "1.8.0_171"

Found canu:
   /ehpcdata/genome_analysis/GA_ASM/software/01.canu/02.20200330/Linux-amd64/bin/canu
   Canu 2.0

/ehpcdata/ge2011/default/spool/compute0/job_scripts/4680303: line 56: 11403 Aborted                 (core dumped) $bin/alignGFA -T ../out.utgStore 2 -i ./out.unitigs.gfa -e 0.075 -o ./out.unitigs.aligned.gfa -t 8 > ./out.unitigs.aligned.gfa.err 2>&1
GFA alignments failed.

Now I have some output files, including files named core.xxx and unitigger.success:

-rw-r--r-- 1 root root       2655 May 28 11:31 alignGFA.1.out
-rw-r--r-- 1 root root         60 May 24 11:33 alignGFA.jobSubmit-01.out
-rwxr-xr-x 1 root root        177 May 24 11:33 alignGFA.jobSubmit-01.sh
-rwxr-xr-x 1 root root       1484 May 23 23:23 alignGFA.sh
-rw------- 1 root root 2004496384 May 23 23:35 core.1046
-rw------- 1 root root 2009333760 May 28 11:31 core.11403
-rw------- 1 root root 2004611072 May 24 11:33 core.22969
-rw------- 1 root root 2004606976 May 23 23:26 core.26811
-rw-r--r-- 1 root root       1671 May 23 20:56 out.001.filterOverlaps.thr000.num000.log
-rw-r--r-- 1 root root       1127 May 23 20:56 out.003.buildGreedy.sizes
-rw-r--r-- 1 root root       1127 May 23 20:56 out.004.buildGreedyOpt.sizes
-rw-r--r-- 1 root root       1127 May 23 20:56 out.005.splitDiscontinuous.sizes
-rw-r--r-- 1 root root       1127 May 23 20:56 out.006.detectSpurs.sizes
-rw-r--r-- 1 root root     316618 May 23 20:56 out.006.detectSpurs.thr000.num000.log
-rw-r--r-- 1 root root       1127 May 23 20:56 out.007.placeContains.sizes
-rw-r--r-- 1 root root       1127 May 23 20:56 out.008.placeContainsOpt.sizes
-rw-r--r-- 1 root root       1127 May 23 20:56 out.009.splitDiscontinuous.sizes
-rw-r--r-- 1 root root       1199 May 23 20:57 out.010.mergeOrphans.sizes
-rw-r--r-- 1 root root   26147469 May 23 20:57 out.010.mergeOrphans.thr000.num000.log
-rw-r--r-- 1 root root        140 May 23 20:57 out.010.mergeOrphans.thr003.num000.log
-rw-r--r-- 1 root root   12039982 May 23 20:57 out.011.reducedGraph.thr000.num000.log
-rw-r--r-- 1 root root        140 May 23 20:57 out.013.assemblyGraph.thr008.num000.log
-rw-r--r-- 1 root root       1339 May 23 20:57 out.014.breakRepeats.sizes
-rw-r--r-- 1 root root    6991120 May 23 20:57 out.014.breakRepeats.thr000.num000.log
-rw-r--r-- 1 root root        140 May 23 20:57 out.014.breakRepeats.thr004.num000.log
-rw-r--r-- 1 root root       3502 May 23 20:58 out.016.generateOutputs.overlaps
-rw-r--r-- 1 root root       1542 May 23 20:58 out.016.generateOutputs.sizes
-rw-r--r-- 1 root root         59 May 23 20:58 out.016.generateOutputs.thr000.num000.log
-rw-r--r-- 1 root root  310767040 May 23 20:58 out.018.generateUnitigs.thr000.num000.log
-rw-r--r-- 1 root root        280 May 23 20:58 out.018.generateUnitigs.thr001.num000.log
-rw-r--r-- 1 root root        141 May 23 20:58 out.018.generateUnitigs.thr007.num000.log
-rw-r--r-- 1 root root      52781 May 23 20:56 out.best.coverageGap
-rw-r--r-- 1 root root   46294179 May 23 20:56 out.best.edges
-rw-r--r-- 1 root root   12528475 May 23 20:56 out.best.edges.gfa
-rw-r--r-- 1 root root    2248344 May 23 20:56 out.best.spurs
-rw-r--r-- 1 root root          0 May 23 20:58 out.contigs.bed
-rw-r--r-- 1 root root     644695 May 23 20:58 out.contigs.gfa
-rw-r--r-- 1 root root   46294179 May 23 20:56 out.initial.edges
-rw-r--r-- 1 root root   12315204 May 23 20:56 out.initial.edges.gfa
-rw-r--r-- 1 root root     584404 May 23 20:56 out.lopsided.pass1
-rw-r--r-- 1 root root  679217770 May 23 20:55 out.non-symmetric-error-rates
-rw-r--r-- 1 root root    1937951 May 23 20:55 out.non-symmetric-overlaps
-rw-r--r-- 1 root root     111128 May 23 20:56 out.spur-scores-iter-1
-rw-r--r-- 1 root root     106140 May 23 20:56 out.spur-scores-iter-2
-rw-r--r-- 1 root root     106140 May 23 20:56 out.spur-scores-iter-3
-rw-r--r-- 1 root root     507304 May 23 23:31 out.unitigs.aligned.bed
-rw-r--r-- 1 root root        416 May 23 23:31 out.unitigs.aligned.bed.err
-rw-r--r-- 1 root root       1495 May 28 11:31 out.unitigs.aligned.gfa.err
-rw-r--r-- 1 root root     506550 May 23 20:58 out.unitigs.bed
-rw-r--r-- 1 root root    2941706 May 23 20:58 out.unitigs.gfa
-rw-r--r-- 1 root root      78180 May 23 20:57 reduced.best.coverageGap
-rw-r--r-- 1 root root   46294179 May 23 20:57 reduced.best.edges
-rw-r--r-- 1 root root   10633583 May 23 20:57 reduced.best.edges.gfa
-rw-r--r-- 1 root root     922075 May 23 20:57 reduced.best.spurs
-rw-r--r-- 1 root root   46294179 May 23 20:57 reduced.initial.edges
-rw-r--r-- 1 root root   10912103 May 23 20:57 reduced.initial.edges.gfa
-rw-r--r-- 1 root root     408013 May 23 20:57 reduced.lopsided.pass1
-rw-r--r-- 1 root root     202884 May 23 20:57 reduced.spur-scores-iter-1
-rw-r--r-- 1 root root     196736 May 23 20:57 reduced.spur-scores-iter-2
-rw-r--r-- 1 root root     196272 May 23 20:57 reduced.spur-scores-iter-3
-rw-r--r-- 1 root root     196272 May 23 20:57 reduced.spur-scores-iter-4
-rw-r--r-- 1 root root        808 May 23 20:58 unitigger.1.out
-rw-r--r-- 1 root root      19499 May 23 20:58 unitigger.err
-rw-r--r-- 1 root root         60 May 23 20:28 unitigger.jobSubmit-01.out
-rwxr-xr-x 1 root root        180 May 23 20:28 unitigger.jobSubmit-01.sh
-rwxr-xr-x 1 root root       2920 May 23 20:28 unitigger.sh
-rw-r--r-- 1 root root          0 May 23 20:59 unitigger.success

Thank you for taking time out of your busy schedule to answer my questions. I am looking forward to your reply!

skoren commented 4 years ago

It looks like this may be a bug in the GFA alignment computation. Can you post the contents of out.unitigs.aligned.gfa.err?

As a workaround to finish the assembly, you may be able to run the following and then restart Canu:

cp out.unitigs.gfa out.unitigs.aligned.gfa
cp out.unitigs.bed out.unitigs.aligned.bed
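
The copy step can also be wrapped so that it only fills in outputs that are missing (in the listing above, out.unitigs.aligned.bed already exists and only the aligned GFA is absent). This is a minimal sketch and not part of Canu: the function name stub_aligned_outputs is hypothetical, and the directory layout is assumed from the alignGFA.sh script posted earlier. The copies are just the unaligned files under the aligned names.

```sh
#!/bin/sh
# Hypothetical helper (not part of Canu): copy the unaligned graph files to the
# "aligned" names that alignGFA.sh tests for, without overwriting existing output.
stub_aligned_outputs() {
  dir=$1      # assumed to be the 4-unitigger directory of the assembly
  prefix=$2   # assembly prefix given with -p, e.g. "out"

  for ext in gfa bed ; do
    src="$dir/$prefix.unitigs.$ext"
    dst="$dir/$prefix.unitigs.aligned.$ext"

    if [ -e "$dst" ] ; then
      echo "keep    $dst"
    elif [ -e "$src" ] ; then
      cp "$src" "$dst" && echo "copied  $src -> $dst"
    else
      echo "missing $src" >&2
      return 1
    fi
  done
}
```

It would be called as `stub_aligned_outputs <path to 4-unitigger> out`, followed by rerunning the same canu command so that it picks up where it stopped.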

If possible, I'd also like to look at this data locally. Are you able to share the out.unitigs.gfa, out.unitigs.bed, and the stores (out.utgStore, out.ctgStore) with us? You can tar them together and upload them to our ftp following the instructions on the FAQ.

Anovo-1 commented 4 years ago

Past logs:

out.unitigs.aligned.gfa.err

-- Reading GFA './out.unitigs.gfa'.
gfa:  Loaded 13410 sequences and 59041 links.
-- Loading sequences from tigStore '../out.utgStore' version 1.
-- Loading sequences from tigStore '../out.utgStore' version 2.
-- Resetting sequence lengths.
-- Aligning 59041 links using 8 threads and 7.50 error rate.
ERROR:  edlibAlign()  queryLength  = 0
ERROR:                targetLength = 14008
alignGFA: utility/edlib.C:205: EdlibAlignResult edlibAlign(const char*, int, const char*, int, EdlibAlignConfig): Assertion `queryLength > 0' failed.

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
/ehpcdata/pengbing/00.scripts/00.tools/glibc-2.18/nptl/../sysdeps/unix/sysv/linux/x86_64/sigaction.c::0 in (null)()
../nptl/sysdeps/unix/sysv/linux/raise.c::56 in __GI_raise()
/ehpcdata/pengbing/00.scripts/00.tools/glibc-2.18/stdlib/abort.c::89 in __GI_abort()
/ehpcdata/pengbing/00.scripts/00.tools/glibc-2.18/assert/assert.c::92 in __assert_fail_base()
/ehpcdata/pengbing/00.scripts/00.tools/glibc-2.18/assert/assert.c::101 in __GI___assert_fail()
utility/edlib.C::205 in _Z10edlibAlignPKciS0_i16EdlibAlignConfig()
gfa/alignGFA.C::274 in _Z9checkLinkP7gfaLinkR9sequencesS2_dbb()
gfa/alignGFA.C::645 in _Z10processGFAPcjS_S_dj._omp_fn.0()
(null)::0 in (null)()
/ehpcdata/pengbing/00.scripts/00.tools/glibc-2.18/nptl/pthread_create.c::309 in start_thread()
../sysdeps/unix/sysv/linux/x86_64/clone.S::111 in (null)()
(null)::0 in (null)()
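
The assertion shows that edlibAlign() received an empty query (queryLength = 0). Whether that comes from an empty unitig or from the overlap coordinates of a link is not established in this thread. As one possible first check, the sketch below lists GFA segments whose sequence is empty or whose LN:i: tag is 0. It is not part of Canu: the function name zero_length_segments is hypothetical, and it assumes tab-separated GFA 1 "S" lines with an optional LN:i: tag. Note that alignGFA reloads sequences from the tigStore, so a clean result here does not rule out an empty sequence in the store.

```sh
#!/bin/sh
# Hypothetical diagnostic (not part of Canu): print the names of GFA segments
# that have an empty inline sequence or a declared length (LN:i:) of zero.
zero_length_segments() {
  awk -F '\t' '
    $1 == "S" {
      len = -1                                   # -1 means "length unknown"
      if ($3 != "*") len = length($3)            # sequence stored inline
      for (i = 4; i <= NF; i++)
        if ($i ~ /^LN:i:/) len = substr($i, 6) + 0   # declared length overrides
      if (len == 0) print $2
    }' "$1"
}
```

For example, `zero_length_segments ./out.unitigs.gfa` prints one segment name per line, or nothing if no such segment is declared.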

For now:

Over the past few days, I copied the corrected data, out.trimmedReads.fasta.gz, and ran this step on two other systems. One is an LSF cluster and the other is a PBS cluster with a 2 TB fat node.

Running LSF_command.sh on the LSF cluster didn't change anything

LSF_command.sh:

#!/bin/bash
canu -assemble -p out -d assembly genomeSize=410000000 correctedErrorRate=0.045 -pacbio-corrected out.trimmedReads.fasta.gz batMemory=60 batThreads=32 useGrid=true gfaMemory=60 gridEngine=lsf gridOptions=" -q smp"

However, FAT_command.sh on the PBS cluster worked.

FAT_command.sh:

#PBS -N work.canu.pbs
#PBS -l nodes=1:ppn=1
#PBS -l mem=1g
#PBS -q fat
#PBS -S /bin/bash

echo begin at `date` @`hostname`

data="/work/out.trimmedReads.fasta.gz"
size="410000000"
prefix="OUT"
cer="0.045"
coc="40"
directory="/work//assembly"

export PATH=/data1/bioinfo/tyli/04.software/01.assembly/canu_20190702/canu/Linux-amd64/bin:$PATH
#######

canu \
 -p ${prefix} -d ${directory} genomeSize=${size} \
 -assemble \
 -pacbio-corrected ${data} \
 correctedErrorRate=${cer} \
 corOutCoverage=${coc} \
 gridOptions=" -q fat" \
 useGrid=true 

echo end at `date`

Now I have the reference genome, so I guess the error was due to a lack of memory. A single node of mine has only 60 GB, so maybe I shouldn't have set gfaMemory=60.

Thank u~

skoren commented 4 years ago

I wouldn't have expected memory to cause that error. If you're able to share, I'd still be curious to look at the data locally. Otherwise, since you have an assembly you can close the issue and we'll wait to get another example of the error.

Anovo-1 commented 4 years ago

@skoren Sorry, I am not able to share this data because I am not its only owner. I just found that the reference I obtained (from PBS) was actually unitigged by Canu 2.0 from reads trimmed by Canu 1.8. The two failing runs were both assembled with Canu 2.0, so I guess this error is unique to Canu 2.0. I will close the issue, thank you.

lizhao007 commented 3 years ago

I have the same problem: same error report, same Canu 2.0, and LSF.

skoren commented 3 years ago

You should first update to Canu 2.1.1, as it has bug fixes which should address this error (because alignGFA is no longer used).
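
Before rerunning, it may help to confirm which release the grid jobs will actually pick up. The sketch below is an illustration only: the function name canu_version_at_least is hypothetical, it assumes the release-style output "Canu X.Y" shown in the logs above, and it relies on the version sort (`sort -V`) provided by GNU coreutils.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a "canu -version" line reports at least
# the wanted release. Returns 2 if the line is not in the "Canu X.Y" form.
canu_version_at_least() {
  have=$(printf '%s\n' "$1" | awk '{ print $2 }')   # "Canu 2.0" -> "2.0"
  want=$2

  case $have in
    [0-9]*) ;;            # looks like a release number
    *) return 2 ;;        # snapshot or unexpected format: do not guess
  esac

  # With a version sort, the smaller of the two comes first.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}
```

A possible use is `canu_version_at_least "$(canu -version)" 2.1.1 || echo "update Canu first"`, run with the same PATH that the submitted jobs use.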

lizhao007 commented 3 years ago

Thanks a lot, I got the result after updating to Canu 2.1.1.