marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

please help - got to the end of the utgovl stage and it stopped #1407

Closed · eleanorfurness closed 5 years ago

eleanorfurness commented 5 years ago

Hi all, I'm a total novice at bioinformatics, so please bear with me. I've been attempting a bacterial genome assembly with Canu using Nanopore reads. I submitted it to my uni HPC on Friday afternoon and it was running, albeit very slowly. It got to the end of the utgovl stage and stopped this morning. I've been going back and forth through the output files and I'm not sure what to look at to work out where it went wrong. I have the trimmed and corrected files but no FASTA files. I think the ovlStore.summary may be of use? Here it is...

cat cryo_canu.ovlStore.summary
category            reads     %          read length        feature size or coverage  analysis
----------------  -------  -------  ----------------------  ------------------------  --------------------
middle-missing          0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
middle-hump             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (bad trimming)
no-5-prime              4    0.01     1639.00 +- 624.62         599.50 +- 499.93     (bad trimming)
no-3-prime              5    0.01     1275.20 +- 126.46         385.20 +- 160.24     (bad trimming)

low-coverage         1450    1.86     1603.90 +- 523.10          23.50 +- 10.30      (easy to assemble, potential for lower quality consensus)
unique              74991   96.03     7239.54 +- 5221.67        132.42 +- 15.43      (easy to assemble, perfect, yay)
repeat-cont           587    0.75     6378.51 +- 3716.67        258.73 +- 22.63      (potential for consensus errors, no impact on assembly)
repeat-dove             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (hard to assemble, likely won't assemble correctly or even at all)

span-repeat           282    0.36     7721.22 +- 6276.62       1846.94 +- 3175.41    (read spans a large repeat, usually easy to assemble)
uniq-repeat-cont      749    0.96     7832.46 +- 5777.31                             (should be uniquely placed, low potential for consensus errors, no impact on assembly)
uniq-repeat-dove       11    0.01    24899.82 +- 8523.94                             (will end contigs, potential to misassemble)
uniq-anchor             0    0.00        0.00 +- 0.00             0.00 +- 0.00       (repeat read, with unique section, probable bad read)

Any advice greatly appreciated! Thanks in advance :)

Ellie

skoren commented 5 years ago

Your ovl store looks fine, and it looks like most of your genome is easy to assemble. What command did you use to run Canu, and what was the output from the run before it stopped (or the full output from the run)?

eleanorfurness commented 5 years ago

Hi,

Thanks for getting back to me. I just tried to re-submit it and it failed at the red stage. Here are the error messages from red.1.out:

 tail unitigging/3-overlapErrorAdjustment/red.1.out -n50
Running job 1 based on SGE_TASK_ID=1 and offset=0.
Read_Frags()-- Loading target reads 1 through 96708 with 500098196 bases.
Read_Frags()-- 6.054 GB for bases, votes and info.

Read_Olaps()-- Loading 16299623 overlaps.
Read_Olaps()-- 0.243 GB for overlaps..

extractReads()-- Loading reads 1 to 104248 (75212 reads with 536879020 bases) overlaps 0 through 15701399.
extractReads()-- Loaded.
processReads()-- Launching compute.
terminate called after throwing an instance of 'std::bad_alloc'
terminate called recursively
terminate called recursively

Failed with 'terminate called recursively

Failed with '  what():  AbortedAborted
Failed with 'std::bad_alloc'; backtrace (libbacktrace):

'; backtrace (libbacktrace):
Aborted'; backtrace (libbacktrace):

Failed with 'Aborted'; backtrace (libbacktrace):
extractReads()-- Loading reads 104249 to 108282 (2864 reads with 20495995 bases) overlaps 15701399 through 16299623.
terminate called recursively

Failed with 'Aborted'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()

Failed with 'Segmentation fault'; backtrace (libbacktrace):
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP7siginfoPv()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
(null)::0 in (null)()
/cm/local/apps/sge/current/spool/node006/job_scripts/3220941: line 89: 28106 Segmentation fault      (core dumped) $bin/findErrors -S ../../cryo_canu.seqStore -O ../cryo_canu.ovlStore -R $minid $maxid -e 0.105 -l 500 -o ./$jobid.red.WORKING -t 4

And here's the output of canu.out

cat canu.out 
-- Found   1 host  with  32 cores and  504 GB memory under Sun Grid Engine control.
-- Found   3 hosts with  64 cores and  252 GB memory under Sun Grid Engine control.
-- Found   1 host  with  16 cores and 1009 GB memory under Sun Grid Engine control.
-- Found   2 hosts with   8 cores and  189 GB memory under Sun Grid Engine control.
-- Found   1 host  with  64 cores and  236 GB memory under Sun Grid Engine control.
-- Found   1 host  with   8 cores and   23 GB memory under Sun Grid Engine control.
-- Found   3 hosts with  32 cores and   94 GB memory under Sun Grid Engine control.
-- Found   1 host  with  64 cores and  504 GB memory under Sun Grid Engine control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     10 GB    1 CPU   (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   10 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    10 GB    1 CPU   (overlap detection)
-- Grid:  utgovl    10 GB    1 CPU   (overlap detection)
-- Grid:  ovb        1 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        1 GB    1 CPU   (overlap store sorting)
-- Grid:  red       32 GB    4 CPUs  (read error detection)
-- Grid:  oea       32 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      200 GB    4 CPUs  (contig construction with bogart)
-- Grid:  gfa      200 GB    4 CPUs  (GFA alignment and processing)
--
-- In 'cryo_canu.seqStore', found Nanopore reads:
--   Raw:        108282
--   Corrected:  86818
--   Trimmed:    78088
--
-- Generating assembly 'cryo_canu' in '/ibers/ernie/home/elf18/cryo_canu'
--
-- Parameters:
--
--  genomeSize        4500000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1050 ( 10.50%)
--    utgOvlErrorRate 0.1050 ( 10.50%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1050 ( 10.50%)
--    utgErrorRate    0.1050 ( 10.50%)
--    cnsErrorRate    0.1050 ( 10.50%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Read error detection jobs failed, tried 2 times, giving up.
--   job 00001.red FAILED.
--

ABORT:
ABORT: Canu 1.8
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.

Here's my job script... it's long, but I tried smaller, simpler scripts and they didn't work for me. The script is from my friend and I just altered it a little.

cat newjob.sh
#!/bin/bash

./canu-1.8/Linux-amd64/bin/canu -p cryo_canu -d cryo_canu \
        genomeSize=4.5m stopOnLowCoverage=0 \
        -nanopore-raw cryo.pore.filt.fastq \
        corMinCoverage=0 corOutCoverage=all corMhapSensitivity=high \
        'correctedErrorRate=0.105' corMaxEvidenceCoverageLocal=10 corMaxEvidenceCoverageGlobal=10 oeaMemory=50g \
        'redMemory=50g' batMemory=200 \
        java="/cm/shared/apps/java/jdk1.8.0_40/bin/java" \
        gridEngineThreadsOption="-pe multithread THREADS" gridEngineMemoryOption="-l h_vmem=MEMORY" \
        gridOptionsExecutive="-l h_vmem=30g" gridEngineArrayMaxJobs=5 gridOptions="-S /bin/bash" \
        merylMemory=10 merylThreads=1 \
        cormhapMemory=10 cormhapThreads=1 gridOptionscormhap="-l h_vmem=30g" \
        mhapThreads=8 \
        gridOptionscns="-l h_vmem=30g" \
        gridOptionscor="-l h_vmem=30g" \
        gridOptionscorovl="-l h_vmem=0.5g" corovlThreads=1 corovlMemory=0.5 \
        gridOptionsobtovl="-l h_vmem=10g" obtovlThreads=1 obtovlMemory=10 \
        gridOptionsutgovl="-l h_vmem=10g" utgovlThreads=1 utgovlMemory=10 \
        gridOptionsobtmhap="-l h_vmem=30g" \
        gridOptionsobtmmap="-l h_vmem=30g" \
        gridOptionsovb="-l h_vmem=1g" ovbThreads=1 ovbMemory=1 \
        gridOptionsovs="-l h_vmem=1g" ovsThreads=1 ovsMemory=1 \
        gridOptionsgfa="-l h_vmem=200g" gfaMemory=200g

Then I submit it to my uni HPC with: qsub -V -b n -cwd -N cryo_canu newjob.sh

Any ideas? Cheers, Ellie.

skoren commented 5 years ago

I suspect it's running out of memory, based on that error (std::bad_alloc). The easy workaround is to add gridOptionsred="-l h_vmem=50g"
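Something like this, for example (an untested sketch; Canu resumes from an existing -d directory, so re-running with the extra option should pick up at the failed step):

# Re-run the same command from the same directory, adding an explicit
# grid memory request for the read error detection (red) step; keep all
# of your other options unchanged (trimmed here for brevity).
./canu-1.8/Linux-amd64/bin/canu -p cryo_canu -d cryo_canu \
        genomeSize=4.5m \
        -nanopore-raw cryo.pore.filt.fastq \
        redMemory=50g gridOptionsred="-l h_vmem=50g"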

However, I'd rather get rid of all those options to simplify the run. I see the failing step was one of the few for which you don't have an explicit grid request defined, and that you're using h_vmem but requesting the full memory rather than per-core memory, which is how SGE typically expects memory to be requested. Can you run:

qconf -sc | grep MEMORY

I suspect h_vmem might not be what you want to use.
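On a typical SGE install the output looks something like the following (hypothetical values, yours will differ). The consumable column is the one to check: for a consumable complex, SGE multiplies an -l request by the slot count, so a 4-CPU job asking for -l h_vmem=50g would actually reserve 200 GB.

#name      shortcut  type    relop  requestable  consumable  default  urgency
h_vmem     h_vmem    MEMORY  <=     YES          YES         0        0
mem_free   mf        MEMORY  <=     YES          NO          0        0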

Even without the fix, a bunch of the options are definitely not needed as they relate to things you're not running:

        gridOptionsobtmhap="-l h_vmem=30g" \
        gridOptionsobtmmap="-l h_vmem=30g" \
        gridOptionscorovl="-l h_vmem=0.5g" corovlThreads=1 corovlMemory=0.5 \

I also suspect gridEngineArrayMaxJobs=5 isn't going to do what you expect; it won't limit your run to 5 jobs at a time.

These are also probably not needed; unless you have low coverage, you're just going to waste extra compute time on reads you don't need:

        corMinCoverage=0 corOutCoverage=all corMhapSensitivity=high \
        'correctedErrorRate=0.105' corMaxEvidenceCoverageLocal=10 corMaxEvidenceCoverageGlobal=10
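Putting all of that together, a stripped-down command might look something like this (an untested sketch that keeps only the grid plumbing Canu can't autodetect, plus the red workaround from above if you stay with h_vmem; swap h_vmem for whatever memory complex qconf reports as appropriate):

./canu-1.8/Linux-amd64/bin/canu -p cryo_canu -d cryo_canu \
        genomeSize=4.5m \
        -nanopore-raw cryo.pore.filt.fastq \
        java="/cm/shared/apps/java/jdk1.8.0_40/bin/java" \
        gridEngineThreadsOption="-pe multithread THREADS" \
        gridEngineMemoryOption="-l h_vmem=MEMORY" \
        gridOptions="-S /bin/bash" \
        gridOptionsred="-l h_vmem=50g"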

eleanorfurness commented 5 years ago

Thank you very much; I'm trying all of these suggestions. I'll let you know how I get on.

skoren commented 5 years ago

Any updates on this?

eleanorfurness commented 5 years ago

Apologies for the late reply; I'm currently away with family on holiday and haven't been on my emails. Thank you for checking in on me and for all your help and advice. You're very kind!

I tried a few times before going but was still struggling. My colleague (who's much better at bioinformatics) was busy, so I didn't get to ask him for help while trying your suggestions. I'm going to pick back up where I left off as soon as I'm back, hopefully with a little help from colleagues.

Thanks again.

eleanorfurness commented 5 years ago

Hi Skoren,

Thank you so much for all your advice. I did it! My first ever genome assembly. After I had my break and came back at it with fresh eyes, it made much more sense, and today I finally did it. I incorporated all your suggestions. It was great to see Canu work so well :) I have a 4,079,695 bp contig and one repeat contig of 40,269 bp.

The GFA looks awesome in Bandage too; here's the big contig...

bandage graph.docx

and with the repeat in there too...

bandage both nodes graph.docx

I have run it through Prokka, which worked well; now I'm just trying to figure out Artemis and DNAPlotter.

Many thanks, Ellie

skoren commented 5 years ago

Glad to hear it worked; I will close this issue.

I would also suggest polishing the assembly (with Nanopolish or Medaka), as nanopore assemblies straight from Canu will have a lot of indels. You will also likely need Illumina polishing to get the best quality. You can also check whether the two contigs are circular (Canu will mark them as such, but you can also look for an overlap at the ends).
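For reference, a minimal follow-up might look something like the following (a sketch; it assumes medaka is installed, and the contig file path and recommended medaka model depend on your setup):

# Polish the Canu draft contigs with the raw nanopore reads.
medaka_consensus -i cryo.pore.filt.fastq \
                 -d cryo_canu/cryo_canu.contigs.fasta \
                 -o medaka_polished -t 8

# Canu marks putative circular contigs in the FASTA headers:
grep -o 'suggestCircular=[a-z]*' cryo_canu/cryo_canu.contigs.fasta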