marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits) #1218

Closed ogrecio closed 5 years ago

ogrecio commented 5 years ago

Hi! I am using Canu 1.8 on a Red Hat Linux system with the following command:

canu -p ONTassembly -d $WDR -fast -minReadLength=1500 -stopOnLowCoverage=0.010 -batMemory=80 -merylMemory=80 -maxMemory=80 -maxThreads=8 -genomeSize=100m gridOptions="--time=20:00:00 --partition=thinnodes --mem=80GB" -nanopore-raw AllMinION_cleaned.fastq

I am getting the following error:

CRASH: Canu 1.8
CRASH: Please panic, this is abnormal.
ABORT:
CRASH:   Failed to submit batch jobs.
CRASH:
CRASH: Failed at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/Execution.pm line 1233.
CRASH:  canu::Execution::submitOrRunParallelJob("ONTassembly", "cormhap", "correction/1-overlapper", "mhap", 1, 2, 3, 4, ...) called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/OverlapMhap.pm line 774
CRASH:  canu::OverlapMhap::mhapCheck("ONTassembly", "cor", "partial") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 596
CRASH:  main::overlap("ONTassembly", "cor") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 783
CRASH: 
CRASH: Last 50 lines of the relevant log file (correction/1-overlapper/mhap.jobSubmit-01.out):
CRASH:
CRASH: sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
CRASH:

I would appreciate some help with this. Thank you.

This is the whole canu.out file:

Found perl:
   /opt/cesga/easybuild-cesga/software/Compiler/gcccore/6.4.0/perl/5.28.0/bin/perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
   This is perl 5, version 28, subversion 0 (v5.28.0) built for x86_64-linux-thread-multi

Found java:
   /opt/cesga/easybuild-cesga/software/Core/jdk/8u181/bin/java
   java version "1.8.0_181"

Found canu:
   /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
   Canu 1.8

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
-- Canu 1.8
--
-- CITATIONS
--
-- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM.
-- Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.
-- Genome Res. 2017 May;27(5):722-736.
-- http://doi.org/10.1101/gr.215087.116
-- 
-- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM.
-- De novo assembly of haplotype-resolved genomes with trio binning.
-- Nat Biotechnol. 2018
-- https//doi.org/10.1038/nbt.4277
-- 
-- Read and contig alignments during correction, consensus and GFA building use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
-- Corrected read consensus sequences are generated using an algorithm derived from FALCON-sense:
--   Chin CS, et al.
--   Phased diploid genome assembly with single-molecule real-time sequencing.
--   Nat Methods. 2016 Dec;13(12):1050-1054.
--   http://doi.org/10.1038/nmeth.4035
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_181' (from '/opt/cesga/easybuild-cesga/software/Core/jdk/8u181/bin/java') with -d64 support.
--
-- WARNING:
-- WARNING:  Failed to run gnuplot using command 'gnuplot'.
-- WARNING:  Plots will be disabled.
-- WARNING:
--
-- Detected 24 CPUs and 126 gigabytes of memory.
-- Limited to 80 gigabytes from maxMemory option.
-- Limited to 8 CPUs from maxThreads option.
-- Detected Slurm with 'sinfo' binary in /bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 10000 jobs.
-- 
-- Found   1 host  with 128 cores and 3906 GB memory under Slurm control.
-- Found 321 hosts with  24 cores and  120 GB memory under Slurm control.
-- Found   7 hosts with  24 cores and   60 GB memory under Slurm control.
-- Found  10 hosts with   4 cores and   30 GB memory under Slurm control.
-- Found   7 hosts with  20 cores and  120 GB memory under Slurm control.
-- Found  52 hosts with  20 cores and   60 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     80 GB    4 CPUs  (k-mer counting)
-- Grid:  hap        8 GB    4 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   10 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtmhap   10 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  utgmhap   10 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8 GB    1 CPU   (overlap store sorting)
-- Grid:  red       10 GB    4 CPUs  (read error detection)
-- Grid:  oea        4 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       80 GB    8 CPUs  (contig construction with bogart)
-- Grid:  gfa        8 GB    8 CPUs  (GFA alignment and processing)
--
-- In 'ONTassembly.seqStore', found Nanopore reads:
--   Raw:        1492579
--   Corrected:  0
--   Trimmed:    0
--
-- Generating assembly 'ONTassembly' in '/mnt/lustre/scratch/home/otras/ini/ogr/Metagenomics/HybridAssembly'
--
-- Parameters:
--
--  genomeSize        50000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1200 ( 12.00%)
--    utgOvlErrorRate 0.1200 ( 12.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1200 ( 12.00%)
--    utgErrorRate    0.1200 ( 12.00%)
--    cnsErrorRate    0.2000 ( 20.00%)
--
--
-- BEGIN CORRECTION
--
--
-- Running jobs.  First attempt out of 2.
--

CRASH:
CRASH: Canu 1.8
CRASH: Please panic, this is abnormal.
ABORT:
CRASH:   Failed to submit batch jobs.
CRASH:
CRASH: Failed at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/Execution.pm line 1233.
CRASH:  canu::Execution::submitOrRunParallelJob("ONTassembly", "cormhap", "correction/1-overlapper", "mhap", 1, 2, 3, 4, ...) called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/../lib/site_perl/canu/OverlapMhap.pm line 774
CRASH:  canu::OverlapMhap::mhapCheck("ONTassembly", "cor", "partial") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 596
CRASH:  main::overlap("ONTassembly", "cor") called at /mnt/netapp1/Optcesga_FT2_RHEL7/easybuild-cesga/software/Compiler/gcc/6.4.0/canu/1.8/bin/canu line 783
CRASH: 
CRASH: Last 50 lines of the relevant log file (correction/1-overlapper/mhap.jobSubmit-01.out):
CRASH:
CRASH: sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
CRASH:

skoren commented 5 years ago

It looks like your grid is refusing to accept the submitted jobs. Perhaps you need to specify some additional flags to allow array jobs? The correction/1-overlapper/mhap.jobSubmit-01.sh file contains the exact command that is failing; please post it.

Have you run any other jobs in this folder before? The canu output says the genome size is 50000000 (50m), but your command says 100m.

ogrecio commented 5 years ago

Thank you @skoren. It turned out that my grid doesn't allow more than 100 simultaneous jobs to be submitted. Is there any way to limit the jobs to 100?

The command that correction/1-overlapper/mhap.jobSubmit-01.sh was submitting was:

sbatch \
  --mem-per-cpu=2560m --cpus-per-task=4 --time=30:00:00 --partition=thinnodes --mem=32GB -o mhap.%A_%a.out \
  -D `pwd` -J "cormhap_ONTassembly" \
  -a 1-137 \
  ./mhap.sh 0 \
> ./mhap.jobSubmit-01.out 2>&1

So I submitted the tasks in two batches: one with the option -a 1-100 and another with the option -a 101-137.
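For illustration, the manual split into batches could be scripted. This is only a sketch: chunk_ranges is a hypothetical helper name, not part of Canu or Slurm, and it assumes the limit is a plain cap of 100 tasks per submission. It just prints the ranges that would be passed to sbatch's -a option.

```shell
# chunk_ranges TOTAL MAX
# Print array ranges "start-end", one per line, each covering at most MAX tasks.
# (Hypothetical helper; it only prints ranges and does not call sbatch.)
chunk_ranges() {
  total=$1
  max=$2
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + max - 1))
    # Clamp the last chunk so it does not run past TOTAL.
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    echo "${start}-${end}"
    start=$((end + 1))
  done
}

# Ranges for the 137 mhap tasks with a 100-task cap.
chunk_ranges 137 100
```

Each printed range would replace the "-a 1-137" argument in one sbatch call from the jobSubmit script.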

Then I ran the canu command again, but the same problem occurred with the command in correction/2-correction/correctReads.jobSubmit-01.sh:

sbatch \
  --mem-per-cpu=1g --cpus-per-task=4 --time=30:00:00 --partition=thinnodes --mem=32GB -o correctReads.%A_%a.out \
  -D `pwd` -J "cor_ONTassembly" \
  -a 1-150 \
  ./correctReads.sh 0 \
> ./correctReads.jobSubmit-01.out 2>&1

Is there any way to limit the jobs to 100?

Cheers!

brianwalenz commented 5 years ago

Setting the canu option gridEngineArrayMaxJobs=100 should limit each array job to 100 tasks.

Your grid claims it will allow up to 10,000 tasks per array job (in the canu log: "Detected Slurm with 'MaxArraySize' limited to 10000 jobs", which comes from scontrol show config). gridEngineArrayMaxJobs overrides this detected value.
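As a rough sketch, the command from the first post with that option added might look like the following. Everything except gridEngineArrayMaxJobs=100 is copied from the original command. The run_canu wrapper and the default value for WDR are assumptions made only so the command can be previewed without launching anything.

```shell
# Sketch only: the original command with gridEngineArrayMaxJobs=100 added.
# The default for WDR is a hypothetical placeholder; set WDR to your own directory.
WDR="${WDR:-./canu_out}"

# run_canu [PREFIX...]
# With no arguments this runs canu. With a prefix such as "echo" it runs
# "echo canu ...", which only prints the command line (a dry run).
run_canu() {
  "$@" canu -p ONTassembly -d "$WDR" -fast \
    -minReadLength=1500 -stopOnLowCoverage=0.010 \
    -batMemory=80 -merylMemory=80 -maxMemory=80 -maxThreads=8 \
    -genomeSize=100m \
    gridEngineArrayMaxJobs=100 \
    gridOptions="--time=20:00:00 --partition=thinnodes --mem=80GB" \
    -nanopore-raw AllMinION_cleaned.fastq
}

# Dry run: print the command. Call plain "run_canu" to launch the assembly.
run_canu echo
```

Note that the dry run prints the gridOptions value without its quotes, and that genomeSize is left as posted even though the log above reports 50000000 from an earlier run in the same directory.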