marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Failed to submit compute jobs. #2267

Closed beyondshodh closed 1 year ago

beyondshodh commented 1 year ago

Please help me resolve the error below, which I get when running Canu.

-- Starting command on Fri Oct  6 12:09:50 2023 with 75713.293 GB free disk space

    cd correction/0-mercounts
    ./meryl-configure.sh \
    > ./meryl-configure.err 2>&1

-- Finished on Fri Oct  6 12:09:50 2023 (lickety-split) with 75713.293 GB free disk space
----------------------------------------
--  segments   memory batches
--  -------- -------- -------
--        01  0.14 GB       2
--
--  For 5846 reads with 53765743 bases, limit to 1 batch.
--  Will count kmers using 01 jobs, each using 2 GB and 4 threads.
--
-- Finished stage 'merylConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
--
-- Failed to submit compute jobs.  Delay 10 seconds and try again.

CRASH:
CRASH: canu 2.2
CRASH: Please panic, this is abnormal.
CRASH:
CRASH:   Failed to submit compute jobs.
CRASH:
CRASH: Failed at /ibdc-hpc/apps1/canu-2.2/build/bin/../lib/site_perl/canu/Execution.pm line 1259.
CRASH:  canu::Execution::submitOrRunParallelJob("my-assembly", "meryl", "correction/0-mercounts", "meryl-count", 1) called at /ibdc-hpc/apps1/canu-2.2/build/bin/
CRASH:  canu::Meryl::merylCountCheck("my-assembly", "cor") called at /ibdc-hpc/apps1/canu-2.2/build/bin/canu line 1076
CRASH:
CRASH: Last 50 lines of the relevant log file (correction/0-mercounts/meryl-count.jobSubmit-01.out):
CRASH:
CRASH: qsub: Bad UID for job execution
CRASH:
(gnuplot) [soniabalyan@brahm-login canu-scripts]$

skoren commented 1 year ago

It sounds like your compute nodes aren't allowed to submit jobs, which is a requirement for running Canu on a grid (https://canu.readthedocs.io/en/latest/faq.html#my-run-stopped-with-the-error-failed-to-submit-batch-jobs). As discussed in previous issues (e.g. #1765), you have two options: turn off grid use (useGrid=false) and run on a single compute node, or use useGrid=remote, which stops execution so you can manually submit the jobs, wait for them to complete, and then resume the run.
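
For illustration, here is a rough sketch of what the two invocations might look like. Only the `my-assembly` prefix and the `useGrid` values come from this thread. The output directory, genome size, read type, and reads file are placeholders I made up, since the original command wasn't posted, so substitute your own. The helper only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Placeholders: only PREFIX is taken from the log above; the rest are assumed.
PREFIX="my-assembly"
OUTDIR="my-assembly-run"     # hypothetical output directory
GENOME_SIZE="5m"             # hypothetical; use your organism's estimated size
READ_TYPE="-nanopore"        # hypothetical; could be -pacbio or -pacbio-hifi
READS="reads.fastq.gz"       # hypothetical input file

# Print (not run) a canu command line for the given useGrid setting.
canu_cmd() {
  echo canu -p "$PREFIX" -d "$OUTDIR" genomeSize="$GENOME_SIZE" \
       useGrid="$1" "$READ_TYPE" "$READS"
}

# Option 1: no grid submission; everything runs on the node where canu starts.
canu_cmd false

# Option 2: canu stops where it would submit jobs; you submit them by hand,
# wait for them to finish, then re-run the same command to resume.
canu_cmd remote
```

With useGrid=false you would normally launch the command inside a single job or interactive session on a compute node, not on the login node. With useGrid=remote, re-running the identical command after the manually submitted jobs finish should let Canu pick up from where it stopped.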