marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/
658 stars 179 forks source link

Stop after cormhap with No log file supplied #1273

Closed ogrecio closed 5 years ago

ogrecio commented 5 years ago

Hi there, my canu process is stopping after cormhap. I restart the job, but the same error appeared.

Can't figure out what the problem is. Any help, please?

canu -p ONTassembly -d $WDR -fast -minReadLength=1500 -stopOnLowCoverage=0.010 -redMemory=100 -oeaMemory=100 -batMemory=100 -merylMemory=100 -mhapMemory=100 -ovbMemory=100 -ovsMemory=100 -maxMemory=100 -mhapThreads=10 -maxThreads=20 -genomeSize=100m -gridEngineArrayMaxJobs=20 gridOptions="--time=8:00:00 --partition=shared --qos=shared --mem=100GB" -nanopore-raw METALGEN_cleaned_withshorter.fastq

This is the canu.out file:


--
-- Detected 24 CPUs and 126 gigabytes of memory.
-- Limited to 100 gigabytes from maxMemory option.
-- Limited to 20 CPUs from maxThreads option.
-- Detected Slurm with 'sinfo' binary in /bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
-- 
-- Found  52 hosts with  20 cores and   60 GB memory under Slurm control.
-- Found   7 hosts with  24 cores and   60 GB memory under Slurm control.
-- Found   1 host  with 128 cores and 3906 GB memory under Slurm control.
-- Found  10 hosts with   4 cores and   30 GB memory under Slurm control.
-- Found 322 hosts with  24 cores and  120 GB memory under Slurm control.
-- Found   8 hosts with  20 cores and  120 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl    100 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       12 GB    8 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap  100 GB   10 CPUs  (overlap detection with mhap)
-- Grid:  obtmhap  100 GB   10 CPUs  (overlap detection with mhap)
-- Grid:  utgmhap  100 GB   10 CPUs  (overlap detection with mhap)
-- Grid:  ovb      100 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs      100 GB    1 CPU   (overlap store sorting)
-- Grid:  red      100 GB    6 CPUs  (read error detection)
-- Grid:  oea      100 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      100 GB    8 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'ONTassembly.seqStore', found Nanopore reads:
--   Raw:        8172601
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ONTassembly' in '/mnt/lustre/scratch/home/otras/ini/ogr/Metagenomics/HybridAssembly'
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
--
-- 'mhap.jobSubmit-01.sh' -> job 2360531 task 59.
-- 'mhap.jobSubmit-02.sh' -> job cormhap_ONTassembly task 61.
-- 'mhap.jobSubmit-03.sh' -> job 2360532 task 63.
-- 'mhap.jobSubmit-04.sh' -> job 2360533 tasks 65-73.
--
----------------------------------------
-- Starting command on Mon Mar  4 23:18:23 2019 with 229917.594 GB free disk space

    cd /mnt/lustre/scratch/home/otras/ini/ogr/Metagenomics/HybridAssembly
    sbatch \
      --depend=afterany:2360531:cormhap_ONTassembly:2360532:2360533 \
      --mem-per-cpu=4g \
      --cpus-per-task=1 \
      --time=8:00:00 \
      --partition=shared \
      --qos=shared \
      --mem=100GB  \
      -D `pwd` \
      -J 'canu_ONTassembly' \
      -o canu-scripts/canu.08.out  canu-scripts/canu.08.sh
sbatch: error: Batch job submission failed: Job dependency problem

-- Finished on Mon Mar  4 23:18:23 2019 (furiously fast) with 229917.594 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

CRASH:
CRASH: Canu 1.8
CRASH: Please panic, this is abnormal.
ABORT:
CRASH:   Failed to submit script.
CRASH:
CRASH: Failed at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/Execution.pm line 780.
CRASH:  canu::Execution::submitScript("ONTassembly", "--depend=afterany:2360531:cormhap_ONTassembly:2360532:2360533") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/Execution.pm line 1327
CRASH:  canu::Execution::submitOrRunParallelJob("ONTassembly", "cormhap", "correction/1-overlapper", "mhap", 59, 61, 63, 65, ...) called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/OverlapMhap.pm line 774
CRASH:  canu::OverlapMhap::mhapCheck("ONTassembly", "cor", "partial") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 596
CRASH:  main::overlap("ONTassembly", "cor") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 783
CRASH: 
CRASH: No log file supplied.
CRASH:
ogrecio commented 5 years ago

I tried with different option parameters, assigning less memory (although thats shouldn't be a problem because I can allocate up to 120Gb per job). It stopped at the same point. Then I tried to restart it and got this error message:

 canu -p ONTassembly -d $WDR -fast -minReadLength=1500 -stopOnLowCoverage=0.010 -redMemory=64 -oeaMemory=64 -batMemory=64 -merylMemory=64 -mhapMemory=64 -maxMemory=64 -mhapThreads=20 -maxThreads=20 -genomeSize=100m -gridEngineArrayMaxJobs=20 gridOptions="--time=8:00:00 --partition=shared --qos=shared --mem=80GB" -nanopore-raw METALGEN_cleaned_withshorter.fastq
-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
--
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
--
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_181' (from '/opt/cesga/easybuild-cesga/software/Core/jdk/8u181/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 126 gigabytes of memory.
-- Limited to 64 gigabytes from maxMemory option.
-- Limited to 20 CPUs from maxThreads option.
-- Detected Slurm with 'sinfo' binary in /bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
--
-- Found 322 hosts with  24 cores and  120 GB memory under Slurm control.
-- Found   7 hosts with  24 cores and   60 GB memory under Slurm control.
-- Found  10 hosts with   4 cores and   30 GB memory under Slurm control.
-- Found  52 hosts with  20 cores and   60 GB memory under Slurm control.
-- Found   8 hosts with  20 cores and  120 GB memory under Slurm control.
-- Found   1 host  with 128 cores and 3906 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     64 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       12 GB    8 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   64 GB   20 CPUs  (overlap detection with mhap)
-- Grid:  obtmhap   64 GB   20 CPUs  (overlap detection with mhap)
-- Grid:  utgmhap   64 GB   20 CPUs  (overlap detection with mhap)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red       64 GB    6 CPUs  (read error detection)
-- Grid:  oea       64 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64 GB    8 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'ONTassembly.seqStore', found Nanopore reads:
--   Raw:        8172601
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ONTassembly' in '/mnt/lustre/scratch/home/otras/ini/ogr/Metagenomics/HybridAssembly'
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
----------------------------------------
-- Starting command on Tue Mar  5 14:14:41 2019 with 231123.73 GB free disk space

    cd /mnt/lustre/scratch/home/otras/ini/ogr/Metagenomics/HybridAssembly
    sbatch \
      --mem-per-cpu=4g \
      --cpus-per-task=1 \
      --time=8:00:00 \
      --partition=shared \
      --qos=shared \
      --mem=80GB  \
      -D `pwd` \
      -J 'canu_ONTassembly' \
      -o canu-scripts/canu.07.out  canu-scripts/canu.07.sh
sbatch: error: Batch script is empty!

-- Finished on Tue Mar  5 14:14:41 2019 (furiously fast) with 231123.73 GB free disk space
----------------------------------------

ERROR:
ERROR:  Failed with exit code 1.  (rc=256)
ERROR:

CRASH:
CRASH: Canu 1.8
CRASH: Please panic, this is abnormal.
ABORT:
CRASH:   Failed to submit script.
CRASH:
CRASH: Failed at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/Execution.pm line 780.
CRASH:  canu::Execution::submitScript("ONTassembly", undef) called at /opt/cesga/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 771
CRASH:
CRASH: No log file supplied.
CRASH:
skoren commented 5 years ago

All these errors seem to be coming from your grid behaving strangely. In the first case, Canu got back a job name rather than a job id, which indicates an abnormal message got printed after submission. The grid then didn't like this job name as a hold identifier. In the second case, it seems like a file is empty. It was already up to canu.08.sh and then went back to canu.07.sh which shouldn't happen.

What's the contents of canu-scipts? What's the contents of correction/1-overlapper/results.

ogrecio commented 5 years ago

The reason for canu08.sh vs canu07.sh may be that in teh second case is a completely new submision with different parameters, that I tried. The file canu08 was created after attempting restarting the job in the first case (I am guessing).

Here is the content of the folders obtained from the new submission from scratch with new options (at canu07.sh):

 ls canu-scripts/
canu.01.out  canu.01.sh  canu.02.out  canu.02.sh  canu.03.out  canu.03.sh  canu.04.out  canu.04.sh  canu.05.out  canu.05.sh  canu.06.out  canu.06.sh  canu.07.sh
ls correction/1-overlapper/results/
000001.oc   000010.oc   000019.oc   000028.oc   000037.oc   000046.oc   000055.oc   000064.oc   000073.oc   000082.oc   000091.oc   000100.oc   000109.oc
000001.ovb  000010.ovb  000019.ovb  000028.ovb  000037.ovb  000046.ovb  000055.ovb  000064.ovb  000073.ovb  000082.ovb  000091.ovb  000100.ovb  000109.ovb
000002.oc   000011.oc   000020.oc   000029.oc   000038.oc   000047.oc   000056.oc   000065.oc   000074.oc   000083.oc   000092.oc   000101.oc   000110.oc
000002.ovb  000011.ovb  000020.ovb  000029.ovb  000038.ovb  000047.ovb  000056.ovb  000065.ovb  000074.ovb  000083.ovb  000092.ovb  000101.ovb  000110.ovb
000003.oc   000012.oc   000021.oc   000030.oc   000039.oc   000048.oc   000057.oc   000066.oc   000075.oc   000084.oc   000093.oc   000102.oc   000111.oc
000003.ovb  000012.ovb  000021.ovb  000030.ovb  000039.ovb  000048.ovb  000057.ovb  000066.ovb  000075.ovb  000084.ovb  000093.ovb  000102.ovb  000111.ovb
000004.oc   000013.oc   000022.oc   000031.oc   000040.oc   000049.oc   000058.oc   000067.oc   000076.oc   000085.oc   000094.oc   000103.oc   000112.oc
000004.ovb  000013.ovb  000022.ovb  000031.ovb  000040.ovb  000049.ovb  000058.ovb  000067.ovb  000076.ovb  000085.ovb  000094.ovb  000103.ovb  000112.ovb
000005.oc   000014.oc   000023.oc   000032.oc   000041.oc   000050.oc   000059.oc   000068.oc   000077.oc   000086.oc   000095.oc   000104.oc   000113.oc
000005.ovb  000014.ovb  000023.ovb  000032.ovb  000041.ovb  000050.ovb  000059.ovb  000068.ovb  000077.ovb  000086.ovb  000095.ovb  000104.ovb  000113.ovb
000006.oc   000015.oc   000024.oc   000033.oc   000042.oc   000051.oc   000060.oc   000069.oc   000078.oc   000087.oc   000096.oc   000105.oc   000114.oc
000006.ovb  000015.ovb  000024.ovb  000033.ovb  000042.ovb  000051.ovb  000060.ovb  000069.ovb  000078.ovb  000087.ovb  000096.ovb  000105.ovb  000114.ovb
000007.oc   000016.oc   000025.oc   000034.oc   000043.oc   000052.oc   000061.oc   000070.oc   000079.oc   000088.oc   000097.oc   000106.oc   000115.oc
000007.ovb  000016.ovb  000025.ovb  000034.ovb  000043.ovb  000052.ovb  000061.ovb  000070.ovb  000079.ovb  000088.ovb  000097.ovb  000106.ovb  000115.ovb
000008.oc   000017.oc   000026.oc   000035.oc   000044.oc   000053.oc   000062.oc   000071.oc   000080.oc   000089.oc   000098.oc   000107.oc   000116.oc
000008.ovb  000017.ovb  000026.ovb  000035.ovb  000044.ovb  000053.ovb  000062.ovb  000071.ovb  000080.ovb  000089.ovb  000098.ovb  000107.ovb  000116.ovb
000009.oc   000018.oc   000027.oc   000036.oc   000045.oc   000054.oc   000063.oc   000072.oc   000081.oc   000090.oc   000099.oc   000108.oc   000117.oc
000009.ovb  000018.ovb  000027.ovb  000036.ovb  000045.ovb  000054.ovb  000063.ovb  000072.ovb  000081.ovb  000090.ovb  000099.ovb  000108.ovb  000117.ovb
skoren commented 5 years ago

Ah, OK new run explains it. Did you remove the old run already? Are you out of quota/disk space? I don't see any other way Canu would generate an empty file. You can resume the run by removing the empty shell script and re-running Canu which will re-generate it.

ogrecio commented 5 years ago

I removed the empty shell script, and re run canu. Now the ovB jobs are running. I will inform of new issues on this topic or if the assembling finished succesfully. Thank you for now.

ogrecio commented 5 years ago

Hi again, I got the following error

ABORT:   ==> PLACE CONTAINED READS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 232322 tigs, with 8 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   placeContains()-- placing 435640 contained and 7570629 unplaced reads, with 8 threads.
ABORT:   placeContains()-- Placed 240571 contained reads and 2088 unplaced reads.
ABORT:   placeContains()-- Failed to place 195069 contained reads (too high error suspected) and 7568541 unplaced reads (lack of overlaps suspected).
ABORT:   optimizePositions()-- Optimizing read positions for 8172602 reads in 232322 tigs, with 8 threads.
ABORT:   optimizePositions()--   Allocating scratch space for 8172602 reads (510787 KB).
ABORT:   optimizePositions()--   Initializing positions with 8 threads.
ABORT:   optimizePositions()--   Recomputing positions, iteration 1, with 8 threads.
ABORT:   Failed to optimize read 1101569 in tig 12895
ABORT:   bogart: bogart/AS_BAT_OptimizePositions.C:279: void Unitig::optimize_recompute(uint32, optPos*, optPos*, bool): Assertion `cnt > 0' failed.
ABORT:   
ABORT:   Failed with 'Aborted'; backtrace (libbacktrace):
ABORT:   utility/system-stackTrace.C::89 in _Z17AS_UTL_catchCrashiP9siginfo_tPv()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   bogart/AS_BAT_OptimizePositions.C::279 in _ZN6Unitig18optimize_recomputeEjP6optPosS1_b()
ABORT:   bogart/AS_BAT_OptimizePositions.C::483 in _ZN9TigVector17optimizePositionsEPKcS1_._omp_fn.1()
ABORT:   ../../../libgomp/team.c::119 in gomp_thread_start()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()

Full .err file at: unitigger.err.txt

Any help please?

skoren commented 5 years ago

This looks the same as issue #1252, same request as there. Are you able to share the data?

ogrecio commented 5 years ago

Hi Sergey, thank you for your comment. I solve the isssue by increasing the -minOverlapLength parameter from 500 to 1000.

In case it helps: I finalize the assemble the metagenome with the same nanopore samples, but filtering by length>3000 and -minOverlapLength=500.
Then I included nanopore reads>1500 bp and QS>12. Then it failed as per previous message. Then I re run it with -minOverlapLength=1000, and finalized ok.

-minOverlapLength=1000

brianwalenz commented 5 years ago

That's consistent with what I found in #1252. There's some discussion there about what went wrong, and links to patches that get around the problem but don't really fix it.