marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Canu fails due to missing time limit specification #2307

Closed: iwilkie closed this issue 7 months ago

iwilkie commented 7 months ago

Hi all,

I'm trying to run Canu (v2.2) on some old raw CLR reads. I am running the jobs on a Linux server with SLURM. I created an interactive SLURM session and then launched my Canu command:

$ salloc -J canu --time=20:00:00 --nodes=5 --mem=20G

$ canu -p 3656 -d 3656 --genomeSize=4m -pacbio ../00_raw_data_from_boran/3656.fastq
-- canu 2.2
--
-- CITATIONS
--
-- For 'standard' assemblies of PacBio or Nanopore reads:
--   Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
--   Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
--   Genome Res. 2017 May;27(5):722-736.
--   http://doi.org/10.1101/gr.215087.116
--
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
--
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
--
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
--
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
--
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
--
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '11.0.9.1-internal' (from '/home/iwilkie/miniconda3/envs/pacbio_assembly/bin/java') without -d64 support.
-- Detected gnuplot version '5.4 patchlevel 1   ' (from 'gnuplot') and image format 'png'.
--
-- Detected 1(x5) CPUs and 20480 gigabytes of memory on the local machine.
Argument "1(x5)" isn't numeric in numeric lt (<) at /home/iwilkie/miniconda3/envs/pacbio_assembly/bin/../lib/site_perl/canu/Grid_Local.pm line 143.
--
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with task IDs up to 5000 allowed.
--
-- Slurm support detected.  Resources available:
--      1 host  with  64 cores and 3022 GB memory.
--      2 hosts with  24 cores and  683 GB memory.
--      4 hosts with  40 cores and 2015 GB memory.
--      2 hosts with  40 cores and  683 GB memory.
--      6 hosts with 128 cores and 4019 GB memory.
--     16 hosts with  12 cores and  125 GB memory.
--      6 hosts with 128 cores and 1006 GB memory.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     12.000 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8.000 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap    6.000 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     4.000 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl     4.000 GB    4 CPUs  (overlap detection)
-- Grid:  cor        -.--- GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red       16.000 GB    4 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       16.000 GB    4 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    4 CPUs  (consensus)
--
-- Found untrimmed raw PacBio CLR reads in the input files.
--
-- Generating assembly '3656' in '/bioinf/home/iwilkie/active_projects/07_boran_isolate/01.assembly/3656':
--   genomeSize:
--     4000000
--
--   Overlap Generation Limits:
--     corOvlErrorRate 0.2400 ( 24.00%)
--     obtOvlErrorRate 0.0450 (  4.50%)
--     utgOvlErrorRate 0.0450 (  4.50%)
--
--   Overlap Processing Limits:
--     corErrorRate    0.2500 ( 25.00%)
--     obtErrorRate    0.0450 (  4.50%)
--     utgErrorRate    0.0450 (  4.50%)
--     cnsErrorRate    0.0750 (  7.50%)
--
--   Stages to run:
--     correct raw reads.
--     trim corrected reads.
--     assemble corrected and trimmed reads.
--
--
-- BEGIN CORRECTION
----------------------------------------
-- Starting command on Mon Apr 15 14:43:18 2024 with 1028.672 GB free disk space

    cd .
    ./3656.seqStore.sh \
    > ./3656.seqStore.err 2>&1

-- Finished on Mon Apr 15 14:46:15 2024 (177 seconds) with 1026.766 GB free disk space
----------------------------------------
--
-- In sequence store './3656.seqStore':
--   Found 99455 reads.
--   Found 800003237 bases (200 times coverage).
--    Histogram of raw reads:
--
--    G=800003237                        sum of  ||               length     num
--    NG         length     index       lengths  ||                range    seqs
--    ----- ------------ --------- ------------  ||  ------------------- -------
--    00010        18941      3457     80005844  ||       1000-2080        13227|---------------------------------------------------------------
--    00020        15528      8173    160014288  ||       2081-3161         9785|-----------------------------------------------
--    00030        13593     13700    240005696  ||       3162-4242         7971|--------------------------------------
--    00040        12232     19917    320007194  ||       4243-5323         7004|----------------------------------
--    00050        11178     26767    400007820  ||       5324-6404         6609|--------------------------------
--    00060        10213     34253    480001987  ||       6405-7485         5778|----------------------------
--    00070         8806     42615    560008435  ||       7486-8566         5272|--------------------------
--    00080         6749     52930    640003510  ||       8567-9647         5640|---------------------------
--    00090         4411     67357    720005405  ||       9648-10728        8041|---------------------------------------
--    00100         1000     99454    800003237  ||      10729-11809        7704|-------------------------------------
--    001.000x               99455    800003237  ||      11810-12890        5836|----------------------------
--                                               ||      12891-13971        4230|---------------------
--                                               ||      13972-15052        3119|---------------
--                                               ||      15053-16133        2292|-----------
--                                               ||      16134-17214        1649|--------
--                                               ||      17215-18295        1227|------
--                                               ||      18296-19376         966|-----
--                                               ||      19377-20457         730|----
--                                               ||      20458-21538         544|---
--                                               ||      21539-22619         433|---
--                                               ||      22620-23700         306|--
--                                               ||      23701-24781         241|--
--                                               ||      24782-25862         188|-
--                                               ||      25863-26943         135|-
--                                               ||      26944-28024         111|-
--                                               ||      28025-29105          79|-
--                                               ||      29106-30186          80|-
--                                               ||      30187-31267          71|-
--                                               ||      31268-32348          40|-
--                                               ||      32349-33429          35|-
--                                               ||      33430-34510          26|-
--                                               ||      34511-35591          14|-
--                                               ||      35592-36672          18|-
--                                               ||      36673-37753           7|-
--                                               ||      37754-38834          12|-
--                                               ||      38835-39915           6|-
--                                               ||      39916-40996           8|-
--                                               ||      40997-42077           2|-
--                                               ||      42078-43158           6|-
--                                               ||      43159-44239           4|-
--                                               ||      44240-45320           1|-
--                                               ||      45321-46401           0|
--                                               ||      46402-47482           3|-
--                                               ||      47483-48563           0|
--                                               ||      48564-49644           1|-
--                                               ||      49645-50725           2|-
--                                               ||      50726-51806           0|
--                                               ||      51807-52887           1|-
--                                               ||      52888-53968           0|
--                                               ||      53969-55049           1|-
--
----------------------------------------
-- Starting command on Mon Apr 15 14:46:16 2024 with 1026.766 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Mon Apr 15 14:46:17 2024 (one second) with 1026.766 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01  1.81 GB       2
--        02  1.03 GB       2
--        04  0.52 GB       2
--        06  0.35 GB       2
--        08  0.26 GB       2
--
--  For 99455 reads with 800003237 bases, limit to 8 batches.
--  Will count kmers using 01 jobs, each using 3 GB and 4 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
--
-- Failed to submit compute jobs.  Delay 10 seconds and try again.

CRASH:
CRASH: canu 2.2
CRASH: Please panic, this is abnormal.
CRASH:
CRASH:   Failed to submit compute jobs.
CRASH:
CRASH: Failed at /home/iwilkie/miniconda3/envs/pacbio_assembly/bin/../lib/site_perl/canu/Execution.pm line 1259.
CRASH:  canu::Execution::submitOrRunParallelJob(3656, "meryl", "correction/0-mercounts", "meryl-count", 1) called at /home/iwilkie/miniconda3/envs/pacbio_assembly/bin/../lib/site_perl/canu/Meryl.pm line 847
CRASH:  canu::Meryl::merylCountCheck(3656, "cor") called at /home/iwilkie/miniconda3/envs/pacbio_assembly/bin/canu line 1076
CRASH:
CRASH: Last 50 lines of the relevant log file (correction/0-mercounts/meryl-count.jobSubmit-01.out):
CRASH:
CRASH: sbatch: error: Batch job submission failed: Time limit specification required, but not provided
CRASH:

$ cat 3656/correction/0-mercounts/meryl-count.jobSubmit-01.out
sbatch: error: Batch job submission failed: Time limit specification required, but not provided

I don't really understand why the job fails. It says the time limit was not provided, but I'm running Canu from an interactive SLURM session that does have a time limit.

Does anybody have ideas on how to troubleshoot this? I dug through previous issues but couldn't find anything similar; maybe it's a server-side configuration problem?
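
In case it's relevant, this is how I would check whether the partition defines a default time limit (standard SLURM commands, nothing Canu-specific; the exact partition names and values will of course depend on this cluster):

$ scontrol show partition | grep -E 'PartitionName|DefaultTime|MaxTime'
$ sinfo -o '%P %l'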

Thanks in advance, Isa

skoren commented 7 months ago

When you run Canu on a grid, it submits jobs to the grid for you, so even though you launched it on a single node, it will fan out to other nodes as it runs. Canu handles the resource allocation (memory/threads) for these jobs, but it won't add a time limit or other options your grid may require (e.g. a project or account). You can pass any such extra parameters through the gridOptions parameter: https://canu.readthedocs.io/en/latest/faq.html#how-do-i-run-canu-on-my-slurm-sge-pbs-lsf-torque-system.
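
For example, a sketch of your original command with a time limit forwarded to every job Canu submits (the 20-hour value is only illustrative; gridOptions is passed through to sbatch, so use whatever your partition requires):

$ canu -p 3656 -d 3656 genomeSize=4m \
       gridOptions="--time=20:00:00" \
       -pacbio ../00_raw_data_from_boran/3656.fastq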

Since your genome is pretty small, you can also just restrict Canu to the single node you launched it on by adding useGrid=false.
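
A minimal sketch of that single-node variant, assuming your interactive allocation has enough memory and cores for the whole assembly:

$ canu -p 3656 -d 3656 genomeSize=4m useGrid=false \
       -pacbio ../00_raw_data_from_boran/3656.fastq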

iwilkie commented 7 months ago

Thanks for clarifying @skoren! I got it to work now :)