marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

cormhap error #2268

Closed hidvegin closed 7 months ago

hidvegin commented 9 months ago

I am using the latest canu release on Red Hat Enterprise Linux release 8.6 with slurm 22.05.7. I ran this command:

/home/fk8jybr/program/canu-2.2/bin/canu -p 0.070 -d /project/denolen/output/canu-2.2/0.070/ genomeSize=4g gridOptions="--time=7-00:00:00 --partition=cpu --account denolen" batMemory=200 gridEngineArrayOption="-a ARRAY_JOBS%30" correctedErrorRate=0.070 -pacbio /project/denolen/input/LC001pacbio.fastq.gz

The run crashed somewhere, but I do not know exactly where. I re-ran the same command, but once canu had started jobs 1-495 of the cormhap phase, it failed with this error:

----------------------------------------
-- Starting command on
    cd /project/denolen/output/canu-2.2/0.070
    sbatch \
      --depend=afterany:cormhap_0.070:cormhap_0.070:cormhap_0.070:cormhap_0.070:cormhap_0.070:cormhap_0.070 \
      --cpus-per-task=1 \
      --mem-per-cpu=4g \
      --time=7-00:00:00 \
      --partition=cpu \
      --account denolen  \
      -D `pwd` \
      -J 'canu_0.070' \
      -o canu-scripts/canu.18.out  canu-scripts/canu.18.sh
sbatch: error: Batch script is empty!

-- Finished on ----------------------------------------

ERROR:
ERROR:  Failed with exit code ERROR:
-- Failed to submit Canu executive.  Giving up after two tries.

How can I resolve this error?

skoren commented 9 months ago

The depend string definitely looks incorrect, like a previous job submission failed and/or returned an invalid message. What is the contents of canu.out that is in your folder?
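
One way to see the problem described here: Slurm's `afterany` dependency takes job IDs, whereas the submit command above lists the job name `cormhap_0.070` six times. A minimal sketch of a sanity check, assuming a POSIX shell; `check_depend` is a hypothetical helper, not part of Canu or Slurm:

```sh
# Hypothetical helper: succeed only if every entry after the dependency
# type is numeric (digits, optionally with an array-task suffix like 123_4).
check_depend() {
    ids=${1#*:}                 # strip the leading type, e.g. "afterany:"
    [ -n "$ids" ] || return 1   # nothing listed after the type
    old_ifs=$IFS
    IFS=:
    set -f                      # no filename expansion while splitting
    for id in $ids; do
        case $id in
            '' | *[!0-9_]*) IFS=$old_ifs; set +f; return 1 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 0
}

check_depend "afterany:3129351:3129352" && echo "looks valid"
check_depend "afterany:cormhap_0.070:cormhap_0.070" || echo "not job IDs"
```

A string that fails this check suggests that an earlier `sbatch` call did not return a job ID, which fits the guess that a previous submission failed.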

hidvegin commented 9 months ago

Thanks @skoren for your answer. The canu.out is also empty. In the canu-scripts folder, canu.17.sh and canu.18.sh are also empty. This is canu.16.sh:

#!/bin/sh

#  Path to Canu.

bin="/project/home/fk8jybr/program/canu-2.2/bin"

#  Report paths.

echo ""
echo "Found perl:"
echo "  " `which perl`
echo "  " `perl --version | grep version`
echo ""
echo "Found java:"
echo "  " `which java`
echo "  " `java -showversion 2>&1 | head -n 1`
echo ""
echo "Found canu:"
echo "  " $bin/canu
echo "  " `$bin/canu -version`
echo ""

#  Environment for any object storage.

export CANU_OBJECT_STORE_CLIENT=
export CANU_OBJECT_STORE_CLIENT_UA=
export CANU_OBJECT_STORE_CLIENT_DA=
export CANU_OBJECT_STORE_NAMESPACE=
export CANU_OBJECT_STORE_PROJECT=

rm -f canu.out
ln -s canu-scripts/canu.16.out canu.out

/usr/bin/env perl \
$bin/canu -p '0.070' 'genomeSize=4g' 'gridOptions=--time=7-00:00:00 --partition=cpu --account denolen' 'batMemory=200' 'gridEngineArrayOption=-a ARRAY_JOBS%30' 'correctedErrorRate=0.070' -pacbio '/project/denolen/input/LC001pacbio.fastq.gz' canuIteration=1

In the canu-logs folder, the last two files are also empty. The last non-empty file contains this:

###
###  Reading options from '/project/home/fk8jybr/program/canu-2.2/bin/canu.defaults'
###

# Add site specific options (for setting up Grid or limiting memory/threads) here.

###
###  Reading options from the command line.
###

genomeSize=4g
gridOptions=--time=7-00:00:00 --partition=cpu --account denolen
batMemory=200
gridEngineArrayOption=-a ARRAY_JOBS%30
correctedErrorRate=0.070
canuIteration=1

I do not know what happened to the canu job, but when I re-run canu, it starts all of the cormhap_1-499 jobs and the dependent canu job fails.

skoren commented 9 months ago

What's the recursive file listing in your canu folder (ls -lhR /project/denolen/output/canu-2.2/0.070)? Can you upload all the canu.*.out files in the folder to see which is the last one that did any work? Also, any out/err files in the correction/1-overlapper as well as any *jobSubmit*.sh files there.

hidvegin commented 9 months ago

I uploaded the files here: masodik.zip files.log

When I re-run the canu script, canu restarts the failed jobs in the cormhap phase, but the dependent canu script fails, and once the cormhap phase finishes, canu stops working.

skoren commented 9 months ago

Is this a different run? The files don't seem to match the run you originally mentioned in the issue. I don't see any files that say "Failed to submit" nor are there any canu.*.out jobs past 03 when the original one you posted said canu.18.out. The file canu.out is also not empty. Everything here looks OK and a subset of mhap jobs are failing which get retried and fail again at which point canu stops. This is how it should work and I don't see any weird submit commands or dependencies.

The question from these logs is why did the subset of jobs fail to complete. I'd guess insufficient time on the grid but you should be able to check the job history to see why these jobs failed. For example, check the history of job 2912743 and how much time/memory it used vs requested.
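
A sketch of the job-history check suggested here, assuming Slurm accounting is enabled so that `sacct` can report on finished jobs. The `job_usage` wrapper and the `SACCT` override variable are hypothetical conveniences; the format fields are standard `sacct` field names:

```sh
# Hypothetical wrapper: compare what a job requested with what it used.
# Set SACCT to another command to dry-run on a machine without Slurm.
job_usage() {
    "${SACCT:-sacct}" -j "$1" \
        --format=JobID,State,ExitCode,Elapsed,Timelimit,ReqMem,MaxRSS
}

# On the cluster:  job_usage 2912743
```

A State of TIMEOUT or OUT_OF_MEMORY, or an Elapsed value close to Timelimit, would point at the grid limits rather than at Canu itself.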

hidvegin commented 9 months ago

Yes, it is a new run. I restarted canu with the same script that I mentioned earlier, and now I get the same error again. These are the files for the error: canu_error.zip

In the slurm file, you will see the empty canu.out file and "Failed to submit".

skoren commented 9 months ago

I wouldn't worry about the second error then, the question is why did the original jobs fail. None of the error logs show a failure from the job so I'd guess you either ran out of space or they were killed by your grid manager. Check the history of the job I mentioned above on your grid and see how much memory/time it requested/used and confirm you're not at your space quota.
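
To rule out the disk-space cause mentioned here, free space on the filesystem holding the assembly can be checked with `df`. A minimal sketch; `free_gb` is a hypothetical helper, and per-user or per-project quotas are site-specific, so the quota command itself is left to the HPC documentation:

```sh
# Hypothetical helper: print the free space, in whole GB, on the
# filesystem that holds the given directory (POSIX df output format).
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Example, using the assembly directory from this thread:
#   free_gb /project/denolen/output/canu-2.2/0.070
```

Note that `df` reports what is free on the filesystem as a whole, so a quota can still be exhausted while this number looks large.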

hidvegin commented 9 months ago

Thanks @skoren for your answer. You were right: the problem was that I ran out of space. I asked the HPC administrator for more space. I hope 40 TB will be enough. Are there any options for decreasing the space usage?

skoren commented 9 months ago

Yes, the FAQ has suggestions to reduce disk usage here: https://canu.readthedocs.io/en/latest/faq.html#my-assembly-is-running-out-of-space-is-too-slow. However, those options require a restart from scratch and I suspect 40 TB will be sufficient without them.

hidvegin commented 8 months ago

Thanks @skoren for your answer.

In the cns phase I got an error again. I see this in the slurm output file:

-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '19.0.1' (from 'java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
--
-- Detected 48 CPUs and 48000 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with task IDs up to 999 allowed.
--
-- Slurm support detected.  Resources available:
--      1 host  with 288 cores and 11903 GB memory.
--     58 hosts with  64 cores and  249 GB memory.
--    184 hosts with 128 cores and  249 GB memory.
--      4 hosts with 128 cores and  501 GB memory.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     24.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16.000 GB   32 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   31.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24.000 GB   16 CPUs  (overlap detection)
-- Grid:  utgovl    24.000 GB   16 CPUs  (overlap detection)
-- Grid:  cor        -.--- GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       33.000 GB    8 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      200.000 GB   32 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
--
-- Found PacBio CLR reads in '0.070.seqStore':
--   Libraries:
--     PacBio CLR:            1
--   Reads:
--     Raw:                   130133008970
--     Corrected:             95373203011
--     Corrected and Trimmed: 92179573151
--
--
-- Generating assembly '0.070' in '/project/denolen/output/canu-2.2/0.070':
--   genomeSize:
--     4000000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.2400 ( 24.00%)
--     obtOvlErrorRate 0.0700 (  7.00%)
--     utgOvlErrorRate 0.0700 (  7.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.2500 ( 25.00%)
--     obtErrorRate    0.0700 (  7.00%)
--     utgErrorRate    0.0700 (  7.00%)
--     cnsErrorRate    0.0700 (  7.00%)
--
--   Stages to run:
--     assemble corrected and trimmed reads.
--
--
-- Correction skipped; not enabled.
--
-- Trimming skipped; not enabled.
--
-- BEGIN ASSEMBLY
-- Using slow alignment for consensus (iteration '0').
-- Configured 280 consensus jobs.
--
-- Grid:  cns        1.250 GB    8 CPUs  (consensus)
--
--
-- Running jobs.  First attempt out of 2.
--
-- 'consensus.jobSubmit-01.sh' -> job 3129351 task 7.
-- 'consensus.jobSubmit-02.sh' -> job 3129352 task 189.
-- 'consensus.jobSubmit-03.sh' -> job 3129353 task 202.
-- 'consensus.jobSubmit-04.sh' -> job 3129354 task 204.
-- 'consensus.jobSubmit-05.sh' -> job 3129355 task 206.
-- 'consensus.jobSubmit-06.sh' -> job 3129356 tasks 208-209.
-- 'consensus.jobSubmit-07.sh' -> job 3129357 task 211.
-- 'consensus.jobSubmit-08.sh' -> job 3129358 tasks 213-215.
-- 'consensus.jobSubmit-09.sh' -> job 3129359 tasks 217-280.
--
----------------------------------------
-- Starting command on Sun Nov  5 10:18:09 2023 with 17905.432 GB free disk space

    cd /project/denolen/output/canu-2.2/0.070
    sbatch \
      --depend=afterany:3129351:3129352:3129353:3129354:3129355:3129356:3129357:3129358:3129359 \
      --cpus-per-task=1 \
      --mem-per-cpu=4g \
      --time=7-00:00:00 \
      --partition=cpu \
      --account denolen  \
      -D `pwd` \
      -J 'canu_0.070' \
      -o canu-scripts/canu.33.out  canu-scripts/canu.33.sh
Submitted batch job 3129360

-- Finished on Sun Nov  5 10:18:09 2023 (lickety-split) with 17905.432 GB free disk space
----------------------------------------

The canu.out file mentions that some of the cns jobs failed:

Found perl:
   /usr/bin/perl
   This is perl 5, version 26, subversion 3 (v5.26.3) built for x86_64-linux-thread-multi

Found java:
   /opt/software/packages/jdk/19.0.1/bin/java
   java version "19.0.1" 2022-10-18

Found canu:
   /project/home/fk8jybr/program/canu-2.2/bin/canu
   canu 2.2

-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '19.0.1' (from 'java') without -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
--
-- Detected 3 CPUs and 5121 gigabytes of memory on the local machine.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with task IDs up to 999 allowed.
--
-- Slurm support detected.  Resources available:
--      1 host  with 288 cores and 11903 GB memory.
--     58 hosts with  64 cores and  249 GB memory.
--      4 hosts with 128 cores and  501 GB memory.
--    184 hosts with 128 cores and  249 GB memory.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     24.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       16.000 GB   32 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   31.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    24.000 GB   16 CPUs  (overlap detection)
-- Grid:  utgovl    24.000 GB   16 CPUs  (overlap detection)
-- Grid:  cor        -.--- GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       32.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       33.000 GB    8 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      200.000 GB   32 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
--
-- Found PacBio CLR reads in '0.070.seqStore':
--   Libraries:
--     PacBio CLR:            1
--   Reads:
--     Raw:                   130133008970
--     Corrected:             95373203011
--     Corrected and Trimmed: 92179573151
--
--
-- Generating assembly '0.070' in '/project/denolen/output/canu-2.2/0.070':
--   genomeSize:
--     4000000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.2400 ( 24.00%)
--     obtOvlErrorRate 0.0700 (  7.00%)
--     utgOvlErrorRate 0.0700 (  7.00%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.2500 ( 25.00%)
--     obtErrorRate    0.0700 (  7.00%)
--     utgErrorRate    0.0700 (  7.00%)
--     cnsErrorRate    0.0700 (  7.00%)
--
--   Stages to run:
--     assemble corrected and trimmed reads.
--
--
-- Correction skipped; not enabled.
--
-- Trimming skipped; not enabled.
--
-- BEGIN ASSEMBLY
-- Using slow alignment for consensus (iteration '2').
-- Configured 280 consensus jobs.
--
-- Grid:  cns        1.250 GB    8 CPUs  (consensus)
--
--
-- Consensus jobs failed, tried 2 times, giving up.
--   job ctgcns/0007.cns FAILED.
--   job ctgcns/0189.cns FAILED.
--   job ctgcns/0202.cns FAILED.
--   job ctgcns/0204.cns FAILED.
--   job ctgcns/0206.cns FAILED.
--   job ctgcns/0208.cns FAILED.
--   job ctgcns/0209.cns FAILED.
--   job ctgcns/0211.cns FAILED.
--   job ctgcns/0213.cns FAILED.
--   job ctgcns/0214.cns FAILED.
--   job ctgcns/0215.cns FAILED.
--   job ctgcns/0217.cns FAILED.
--   job ctgcns/0218.cns FAILED.
--   job ctgcns/0219.cns FAILED.
--   job ctgcns/0220.cns FAILED.
--   job ctgcns/0221.cns FAILED.
--   job ctgcns/0222.cns FAILED.
--   job ctgcns/0223.cns FAILED.
--   job ctgcns/0224.cns FAILED.
--   job ctgcns/0225.cns FAILED.
--   job ctgcns/0226.cns FAILED.
--   job ctgcns/0227.cns FAILED.
--   job ctgcns/0228.cns FAILED.
--   job ctgcns/0229.cns FAILED.
--   job ctgcns/0230.cns FAILED.
--   job ctgcns/0231.cns FAILED.
--   job ctgcns/0232.cns FAILED.
--   job ctgcns/0233.cns FAILED.
--   job ctgcns/0234.cns FAILED.
--   job ctgcns/0235.cns FAILED.
--   job ctgcns/0236.cns FAILED.
--   job ctgcns/0237.cns FAILED.
--   job ctgcns/0238.cns FAILED.
--   job ctgcns/0239.cns FAILED.
--   job ctgcns/0240.cns FAILED.
--   job ctgcns/0241.cns FAILED.
--   job ctgcns/0242.cns FAILED.
--   job ctgcns/0243.cns FAILED.
--   job ctgcns/0244.cns FAILED.
--   job ctgcns/0245.cns FAILED.
--   job ctgcns/0246.cns FAILED.
--   job ctgcns/0247.cns FAILED.
--   job ctgcns/0248.cns FAILED.
--   job ctgcns/0249.cns FAILED.
--   job ctgcns/0250.cns FAILED.
--   job ctgcns/0251.cns FAILED.
--   job ctgcns/0252.cns FAILED.
--   job ctgcns/0253.cns FAILED.
--   job ctgcns/0254.cns FAILED.
--   job ctgcns/0255.cns FAILED.
--   job ctgcns/0256.cns FAILED.
--   job ctgcns/0257.cns FAILED.
--   job ctgcns/0258.cns FAILED.
--   job ctgcns/0259.cns FAILED.
--   job ctgcns/0260.cns FAILED.
--   job ctgcns/0261.cns FAILED.
--   job ctgcns/0262.cns FAILED.
--   job ctgcns/0263.cns FAILED.
--   job ctgcns/0264.cns FAILED.
--   job ctgcns/0265.cns FAILED.
--   job ctgcns/0266.cns FAILED.
--   job ctgcns/0267.cns FAILED.
--   job ctgcns/0268.cns FAILED.
--   job ctgcns/0269.cns FAILED.
--   job ctgcns/0270.cns FAILED.
--   job ctgcns/0271.cns FAILED.
--   job ctgcns/0272.cns FAILED.
--   job ctgcns/0273.cns FAILED.
--   job ctgcns/0274.cns FAILED.
--   job ctgcns/0275.cns FAILED.
--   job ctgcns/0276.cns FAILED.
--   job ctgcns/0277.cns FAILED.
--   job ctgcns/0278.cns FAILED.
--   job ctgcns/0279.cns FAILED.
--   job ctgcns/0280.cns FAILED.
--

ABORT:
ABORT: canu 2.2
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

When I re-ran canu, I got the same error as above. How can I resolve this error?

skoren commented 8 months ago

Given that a lot of sequential jobs are failing and they're near the end of the array of jobs, I'd suspect disk space issues again. What does one of the jobs report in its logs (something like unitigging/5-consensus/consensus.*.out)?
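
The pattern described here (mostly consecutive failures towards the end of the array) is easier to see once the FAILED lines are collapsed into ranges. A minimal sketch, assuming the `job ctgcns/NNNN.cns FAILED.` line format shown in canu.out above; `failed_ranges` is a hypothetical helper:

```sh
# Hypothetical helper: read canu.out on stdin and print the failed
# consensus job numbers, collapsing consecutive runs into "first-last".
failed_ranges() {
    sed -n 's|.*job ctgcns/0*\([0-9][0-9]*\)\.cns FAILED\..*|\1|p' |
    awk '
        function print_range() { print (start == prev ? start : start "-" prev) }
        NR == 1        { start = prev = $1; next }
        $1 == prev + 1 { prev = $1; next }
                       { print_range(); start = prev = $1 }
        END            { if (NR > 0) print_range() }
    '
}

# Example:  failed_ranges < canu.out
```

For the list above this gives 7, 189, 202, 204, 206, 208-209, 211, 213-215 and 217-280, the same task groups that Canu resubmitted in the slurm output.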

hidvegin commented 8 months ago

Thanks @skoren for your answer. This is the consensus.3129359_234.out file:

Found perl:
   /usr/bin/perl
   This is perl 5, version 26, subversion 3 (v5.26.3) built for x86_64-linux-thread-multi

Found java:
   /opt/software/packages/jdk/19.0.1/bin/java
   java version "19.0.1" 2022-10-18

Found canu:
   /project/home/fk8jybr/program/canu-2.2/bin/canu
   canu 2.2

Running job 234 based on SLURM_ARRAY_TASK_ID=234 and offset=0.
-- Using seqFile '../0.070.ctgStore/partition.0234'.
-- Opening tigStore '../0.070.ctgStore' version 1.
-- Opening output results file './ctgcns/0234.cns.WORKING'.
--
-- Computing consensus for b=0 to e=2179996 with errorRate 0.0700 (max 0.4000) and minimum overlap 500
--
Loading corrected-trimmed reads from seqFile '../0.070.ctgStore/partition.0234'
                           ----------CONTAINED READS----------  -DOVETAIL  READS-
  tigID    length   reads      used coverage  ignored coverage      used coverage
------- --------- -------  -------- -------- -------- --------  -------- --------
    890     27066      46        37   11.59x        0    0.00x         9    3.93x
   1512     18478      67        60   20.85x        0    0.00x         7    3.45x
   1576     18918      66        55   17.93x        0    0.00x        11    4.81x
   1635     16677      74        62   30.39x        0    0.00x        12    7.25x
   1670     21221      58        44   15.13x        0    0.00x        14    5.55x
   2734     21021      59        47   15.71x        0    0.00x        12    4.77x
   2822     22973      54        42   14.07x        0    0.00x        12    4.49x
   2923     14932      83        56   37.48x       23   12.32x         4    2.89x
   3902     18617      67        56   20.68x        0    0.00x        11    4.23x
   4934     30219      41        33    7.13x        0    0.00x         8    2.39x
   5364     18412      67        56   19.90x        0    0.00x        11    5.65x
   5369     17106      72        65   31.81x        0    0.00x         7    4.29x
   5557     24718      51        36   11.44x        0    0.00x        15    5.17x
   5795     18946      65        54   25.11x        0    0.00x        11    6.58x
   5839     19357      65        56   18.10x        0    0.00x         9    3.67x
   5869     27654      45        36    8.43x        0    0.00x         9    3.01x
   7159     28311      44        29    6.50x        0    0.00x        15    3.92x
   7453     21214      58        40   13.68x        0    0.00x        18    7.60x
   7858     17587      71        64   21.98x        0    0.00x         7    3.62x
   7925     21783      58        49   14.83x        0    0.00x         9    3.16x
   8092     25788      49        39    8.34x        0    0.00x        10    3.33x
   8097     15750      80        77   30.45x        0    0.00x         3    1.91x
   8791     22871      54        45   16.01x        0    0.00x         9    4.79x
   8891     19511      63        54   21.72x        0    0.00x         9    5.11x
   9463     10809     114        59   37.28x       51   20.25x         4    3.21x
   9551     27131      46        31    6.92x        0    0.00x        15    4.02x
   9593     19210      65        52   18.15x        0    0.00x        13    6.15x
   9766     19389      64        55   19.32x        0    0.00x         9    3.82x
   9854     15200      81        72   34.57x        1    0.17x         8    5.45x
   9883     25159      49        36   12.63x        0    0.00x        13    4.99x
  10039     16995      74        65   28.06x        0    0.00x         9    5.35x
  10242     20958      60        47   18.34x        0    0.00x        13    5.87x
  11182     24841      50        37   10.84x        0    0.00x        13    3.99x
  11460     18479      67        55   22.47x        0    0.00x        12    5.75x
  12501     17731      70        62   27.50x        0    0.00x         8    4.56x
  12632     24878      50        37   10.34x        0    0.00x        13    4.39x
  13146     21807      58        51   15.77x        0    0.00x         7    3.28x
  13168     18366      68        56   18.98x        0    0.00x        12    4.90x
  13342     22204      56        48   14.11x        0    0.00x         8    3.66x
  13359     27442      45        34    7.77x        0    0.00x        11    3.15x
  14068     32635      38        28    5.71x        0    0.00x        10    3.41x
  14359     27475      45        33    8.20x        0    0.00x        12    3.88x
  14428     21324      58        45   15.46x        0    0.00x        13    5.09x
  14467     14991      82        52   36.51x       25   13.85x         5    4.04x
  14803     20995      60        51   17.63x        0    0.00x         9    3.81x
  14907     18826      67        52   24.26x        0    0.00x        15    6.30x
  15409     26960      46        36   10.06x        0    0.00x        10    4.50x
  15728     24197      51        37   10.68x        0    0.00x        14    5.08x
  15761     23690      52        44   16.14x        0    0.00x         8    3.96x
  15762     28046      44        30    8.88x        0    0.00x        14    5.83x
  15907     19022      65        50   20.16x        0    0.00x        15    6.97x
  16088     20356      61        48   17.06x        0    0.00x        13    5.67x
  16181     25006      50        40   11.35x        0    0.00x        10    3.35x
  16818     15683      79        73   31.78x        0    0.00x         6    3.52x
  16892     25123      50        40   13.46x        0    0.00x        10    6.40x
  16946     22581      55        50   15.81x        0    0.00x         5    2.48x
  17033     24986      50        41   15.72x        0    0.00x         9    3.51x
  17398     14749      84        63   36.63x       16    5.20x         5    3.62x
  17687     14589      85        74   36.18x        3    0.72x         8    4.04x
  17860     25479      49        35    7.29x        0    0.00x        14    3.90x
  17993     19083      65        61   25.76x        0    0.00x         4    2.57x
  18124     26470      47        37    9.75x        0    0.00x        10    3.00x
  18289     21714      57        46   15.49x        0    0.00x        11    5.01x
  18425     16226      77        71   32.04x        0    0.00x         6    3.33x
  18447     25862      48        42   10.44x        0    0.00x         6    2.44x
  18529     25248      50        44   17.24x        0    0.00x         6    2.98x
  19417     13868      90        74   37.47x       12    3.99x         4    2.54x
  19526     27331      45        38   10.99x        0    0.00x         7    3.63x
  19557     23360      54        48   14.69x        0    0.00x         6    2.50x
  20041     19591      64        54   19.88x        0    0.00x        10    4.50x
  21091     21719      57        50   12.41x        0    0.00x         7    3.22x
  21399     22780      54        43   12.11x        0    0.00x        11    4.92x
  21965     28702      44        37    8.68x        0    0.00x         7    2.66x
  22117     25786      48        41    8.05x        0    0.00x         7    2.39x
  22312     22585      55        44   14.72x        0    0.00x        11    4.78x
  22969     15937      79        61   25.94x        0    0.00x        18    9.43x
  23146     25705      49        40   12.90x        0    0.00x         9    4.53x
  23275     18625      66        60   27.53x        0    0.00x         6    3.66x
  23583     19631      63        51   19.44x        0    0.00x        12    5.51x
  24238     28241      44        38    9.12x        0    0.00x         6    2.32x
  24307     16234      76        72   36.24x        0    0.00x         4    2.67x
  24376     13453      93        86   36.46x        1    0.08x         6    3.55x
  25454     29390      43        31    6.55x        0    0.00x        12    3.85x
  25472     29000      43        25    7.84x        0    0.00x        18    6.53x
  25509     15142      82        69   32.83x        1    0.23x        12    7.27x
  25872     15986      79        74   30.38x        0    0.00x         5    3.07x
  25935     22520      55        48   13.77x        0    0.00x         7    2.68x
  26447     13882      91        88   24.60x        0    0.00x         3    2.86x
  26494     22332      56        47   15.28x        0    0.00x         9    3.35x
  26686     21330      59        50   18.22x        0    0.00x         9    4.09x
  27383     26922      46        37   15.36x        0    0.00x         9    4.60x
  27428     27396      45        31    8.26x        0    0.00x        14    5.36x
  27757     25750      48        35   11.18x        0    0.00x        13    5.94x
  27882     18952      66        54   20.63x        0    0.00x        12    6.08x
  27890     24749      51        40   10.91x        0    0.00x        11    4.07x
  27935     35895      35        26    8.69x        0    0.00x         9    3.26x
  28193     28879      43        29    5.72x        0    0.00x        14
/scratch/slurm/SlurmdSpoolDir/x1000c1s3b1n1/job3127620/slurm_script: line 93: 2270585 Killed                  $bin/utgcns -R ../0.070.ctgStore/partition.$jobid -T ../0.070.ctgStore 1 -P $jobid -O ./ctgcns/$jobid.cns.WORKING -maxcoverage 40 -e 0.070 -pbdagcon -edlib -threads 8
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=3127620.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

As far as I can see, this is a memory issue. Maybe there is not enough memory?
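
To check whether the other failed consensus tasks were killed for the same reason, their logs can be searched for the same message. A minimal sketch; `oom_logs` is a hypothetical helper, and the `unitigging/5-consensus` location is the one suggested earlier in this thread:

```sh
# Hypothetical helper: list consensus logs that mention an OOM kill.
oom_logs() {
    grep -l 'oom-kill' "$1"/consensus.*.out 2>/dev/null
}

# Example:  oom_logs unitigging/5-consensus
```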

brianwalenz commented 8 months ago

Correct, slurm killed it for exceeding the requested memory limit. Canu has computed that it requires only 1.25GB memory per job. This is very small, but plausible given the short contigs. You can increase the memory limit with Canu option cnsMemory=8g (or more). No need to clean up anything, just add the option and restart.
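
Concretely, the restart amounts to re-running the original command from the top of this thread with `cnsMemory=8g` added to the options. A sketch; the `restart_canu` wrapper and the `CANU` override variable are hypothetical, added only so the command can be dry-run:

```sh
# Hypothetical wrapper around the original command plus cnsMemory=8g.
# Set CANU=echo to print the arguments instead of launching Canu.
restart_canu() {
    "${CANU:-/home/fk8jybr/program/canu-2.2/bin/canu}" \
        -p 0.070 -d /project/denolen/output/canu-2.2/0.070/ \
        genomeSize=4g \
        gridOptions="--time=7-00:00:00 --partition=cpu --account denolen" \
        batMemory=200 \
        cnsMemory=8g \
        gridEngineArrayOption="-a ARRAY_JOBS%30" \
        correctedErrorRate=0.070 \
        -pacbio /project/denolen/input/LC001pacbio.fastq.gz
}

# Dry run:  CANU=echo restart_canu
```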

However, this assembly looks quite poor (unless I'm missing something). The contigs listed above are very short, yet there are 2.2 million contigs. Computing 2.2 million contigs * 20 Kbp average contig size = 4.4 Gbp assembled size. Is this expected?

hidvegin commented 8 months ago

Thanks @brianwalenz for your answer. I added cnsMemory=8g to my canu script and now it looks good.

The assembled genome size is 4.4 Gbp, so it is correct. Yes, the contigs are very short, but I do not know why. This is a diploid, very repetitive plant genome. Can I somehow optimise the canu options for a better assembly? I tried changing correctedErrorRate between 0.045 and 0.095. Now I am running 0.070.

skoren commented 8 months ago

If the issue is repeats, I don't think the correctedErrorRate would help, it would only potentially collapse more diverse haplotypes. What kind of data and coverage do you have? Post the report file canu outputs from its run.

hidvegin commented 8 months ago

Thanks @skoren for your answer. I have a diploid plant genome with about 30x coverage from PacBio. The expected genome size is 4.4 Gbp. I attached the report file from canu: 0.070.report.txt
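
As a rough cross-check of the stated coverage, the base total Canu printed earlier can be divided by the genome size. A minimal sketch, assuming the `Raw:` figure in the log above (130133008970) is a count of bases; `coverage` is a hypothetical helper:

```sh
# Hypothetical helper: fold coverage = total bases / genome size.
coverage() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1fx\n", bases / genome }'
}

coverage 130133008970 4400000000   # raw bases vs. the expected 4.4 Gbp
```

This prints 29.6x, consistent with the roughly 30x quoted here.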

I also have 30x coverage from 10x Genomics data, 150x coverage from Illumina PE150 data, and 100x coverage from BGI PE150 data. After canu, I would like to use these datasets for scaffolding.

skoren commented 8 months ago

I didn't realize your coverage is that low. From the report, the reads are also very short, almost all of them are 4-8kb. These are CLR reads and not HiFi? With that low of a coverage and short reads, I'm not sure you can do much to improve the assembly. You can use the FAQ settings for low coverage data but I'm assuming you already are. I would recommend getting HiFi for this genome if possible, you'd likely get a much better assembly than what you have currently.

skoren commented 7 months ago

Idle