marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Kmer counting (meryl-count) jobs failed #1921

Closed by bsierieb1 3 years ago

bsierieb1 commented 3 years ago

Hi, I've been running canu with the following parameters:

canu \
 -p myname \
 -d myname \
 genomeSize=315m \
 -pacbio-hifi input.fastq.gz

Here is canu.out:

Found perl:
   /usr/bin/perl
   This is perl 5, version 26, subversion 3 (v5.26.3) built for x86_64-linux-thread-multi

Found java:
   /share/apps/jdk/1.8.0_271/bin/java
   java version "1.8.0_271"

Found canu:
   /share/apps/canu/2.1.1/bin/canu
   canu 2.1.1

-- canu 2.1.1
--
-- CITATIONS
--
-- For assemblies of PacBio HiFi reads:
--   Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S.
--   HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
--   bioRxiv. 2020.
--   https://doi.org/10.1101/2020.03.14.992248
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošić M, Šikić M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_271' (from '/share/apps/jdk/1.8.0_271/bin/java') with -d64 support.
-- Detected gnuplot version '5.4 patchlevel 1   ' (from 'gnuplot') and image format 'svg'.
-- Detected 48 CPUs and 188 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /opt/slurm/bin/sinfo.
-- Detected Slurm with task IDs up to 9999 allowed.
-- 
-- Found 123 hosts with  48 cores and  368 GB memory under Slurm control.
-- Found 522 hosts with  48 cores and  179 GB memory under Slurm control.
-- Found   4 hosts with  96 cores and 3013 GB memory under Slurm control.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     24.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       12.000 GB   24 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   13.000 GB   16 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  cor       16.000 GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       16.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16.000 GB    6 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       64.000 GB    8 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
--
-- In 'doli_mssm.seqStore', found PacBio HiFi reads:
--   PacBio HiFi:              1
--
--   Corrected:                1
--   Corrected and Trimmed:    1
--
-- Generating assembly 'doli_mssm' in '/scratch/bs167/Work/tmp/Doli-canu/doli_mssm':
--    - assemble HiFi reads.
--
-- Parameters:
--
--  genomeSize        315000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0000 (  0.00%)
--    obtOvlErrorRate 0.0250 (  2.50%)
--    utgOvlErrorRate 0.0100 (  1.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0000 (  0.00%)
--    obtErrorRate    0.0250 (  2.50%)
--    utgErrorRate    0.0100 (  1.00%)
--    cnsErrorRate    0.0500 (  5.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Kmer counting (meryl-count) jobs failed, tried 2 times, giving up.
--   job doli_mssm.01.meryl FAILED.
--   job doli_mssm.02.meryl FAILED.
--

ABORT:
ABORT: canu 2.1.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:

And here is the end of meryl-count.4133579_2.out:

Used 7.341 GB out of 18.000 GB to store    999023342 kmers.
/opt/slurm/data/slurmd/job4133579/slurm_script: line 97: 1267546 Killed                  /share/apps/canu/2.1.1/bin/meryl k=22 threads=8 memory=18 count segment=$jobid/02 ../../doli_mssm.seqStore output ./doli_mssm.$jobid.meryl.WORKING
slurmstepd: error: Detected 1 oom-kill event(s) in step 4133579.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Could you please help with any tips? Thanks!

skoren commented 3 years ago

It seems that your grid is killing meryl for a memory error, but from the log it doesn't look like meryl exceeded its memory limit. We've seen cases where Slurm configurations don't manage memory appropriately, causing this type of issue. You can check the failed job's history on your grid to confirm how much memory it requested and how much it actually used.
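On Slurm, one way to do that check (a sketch using standard sacct fields; the job ID is taken from the log above) is:

```shell
# Compare requested vs. peak memory for the failed meryl job.
# ReqMem is what the batch script asked for; MaxRSS is the peak
# resident set Slurm recorded before the oom-kill.
sacct -j 4133579 --format=JobID,JobName,ReqMem,MaxRSS,State,ExitCode
```

If MaxRSS is well below ReqMem at the time of the kill, that points at the cgroup accounting rather than meryl itself.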

An option, given this is HiFi data and a relatively small genome, is to run Canu on a single node. Add useGrid=false to the canu command and submit it to the grid, reserving a full compute node (any of your 48-core nodes is fine).
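A minimal submission along those lines (a sketch; the partition-independent resource requests and walltime are placeholders, not values from this thread) might look like:

```shell
#!/bin/bash
#SBATCH --cpus-per-task=48      # reserve a full 48-core node
#SBATCH --mem=180G              # most of a 188 GB node's memory
#SBATCH --time=48:00:00         # placeholder walltime

# useGrid=false keeps all Canu stages on this one node instead of
# submitting sub-jobs back to Slurm.
canu \
 -p myname \
 -d myname \
 genomeSize=315m \
 useGrid=false \
 -pacbio-hifi input.fastq.gz
```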

bsierieb1 commented 3 years ago

Adding useGrid=false circumvented the issue. Thanks a lot, Sergey!