Canu error: CRASH: Failed to submit compute jobs. CRASH: qsub: Failed to route job to queue interactive

ShiriAr commented 6 months ago

Hi, I'm getting an error in canu saying "failed to route job to queue interactive." I'm on a Mac and submitting the job to a cluster with an interactive queue. I'm asking for: nodes=1:ppn=8, mem=200gb. I also tried mem=50gb and mem=100gb, and ppn=20.

canu command:

canu -p neuroPAL -d my_directory useGrid=true  genomeSize=2m -nanopore output_file

canu error:

canu snapshot v2.3-development +13 changes (r10284 08b0347f8655ed977a2ac75d35be198601bd7b3d)

Detected Java(TM) Runtime Environment '9.0.1' (from 'java') with -d64 support.
Detected gnuplot version '4.6 patchlevel 4   ' (from 'gnuplot') and image format 'png'.

Detected 8 CPUs and 376 gigabytes of memory on the local machine.

Detected PBSPro '20.0.0' with 'pbsnodes' binary in /opt/pbs/bin/pbsnodes.
Detecting PBSPro resources.

PBSPro support detected.  Resources available:
19 hosts with  96 cores and  376 GB memory.
1 host  with 128 cores and  503 GB memory.
1 host  with  16 cores and  376 GB memory.
1 host  with 128 cores and   62 GB memory.
1 host  with  80 cores and  377 GB memory.
1 host  with  24 cores and  125 GB memory.
1 host  with  48 cores and  377 GB memory.
3 hosts with  36 cores and   92 GB memory.
2 hosts with 256 cores and 1007 GB memory.
10 hosts with  24 cores and  188 GB memory.
1 host  with 128 cores and 2003 GB memory.
2 hosts with  96 cores and 1007 GB memory.
1 host  with 128 cores and 1007 GB memory.
8 hosts with  96 cores and  754 GB memory.
1 host  with  40 cores and   62 GB memory.
1 host  with  16 cores and  251 GB memory.
1 host  with 256 cores and   62 GB memory.
4 hosts with  48 cores and  187 GB memory.
5 hosts with 128 cores and  376 GB memory.
10 hosts with  96 cores and  377 GB memory.
4 hosts with  96 cores and  251 GB memory.
5 hosts with  16 cores and   62 GB memory.
3 hosts with  64 cores and  251 GB memory.
2 hosts with  96 cores and  500 GB memory.
1 host  with  16 cores and  187 GB memory.
1 host  with 100 cores and 1448 GB memory.
1 host  with  12 cores and  188 GB memory.
1 host  with  64 cores and  376 GB memory.
1 host  with 256 cores and  376 GB memory.
1 host  with   8 cores and    7 GB memory.
1 host  with 256 cores and  754 GB memory.
1 host  with  48 cores and 1007 GB memory.
13 hosts with 128 cores and  500 GB memory.
1 host  with  36 cores and   62 GB memory.
2 hosts with  16 cores and   92 GB memory.
4 hosts with  36 cores and  376 GB memory.
1 host  with   8 cores and  187 GB memory.
1 host  with  16 cores and  500 GB memory.
1 host  with   6 cores and   31 GB memory.

                       (tag)Threads
              (tag)Memory         |
      (tag)             |         |  algorithm
       -------  ----------  --------  -----------------------------
Grid:  meryl      4.000 GB    4 CPUs  (k-mer counting)
Grid:  hap        4.000 GB    4 CPUs  (read-to-haplotype assignment)
Grid:  cormhap    4.000 GB   16 CPUs  (overlap detection with mhap)
Grid:  obtovl     4.000 GB    8 CPUs  (overlap detection)
Grid:  utgovl     4.000 GB    8 CPUs  (overlap detection)
Grid:  cor        -.--- GB    4 CPUs  (read correction)
Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
Grid:  ovs        8.000 GB    1 CPU   (overlap store sorting)
Grid:  red        8.000 GB    4 CPUs  (read error detection)
Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
Grid:  bat       16.000 GB    4 CPUs  (contig construction with bogart)
Grid:  cns        -.--- GB    4 CPUs  (consensus)

Found Nanopore reads in 'neuroPAL.seqStore':
Libraries:
Nanopore:              1
Reads:
Raw:                   524887178

Generating assembly 'neuroPAL' in 'directory':
genomeSize:
10000000

Overlap Generation Limits:
corOvlErrorRate 0.3200 ( 32.00%)
obtOvlErrorRate 0.1200 ( 12.00%)
utgOvlErrorRate 0.1200 ( 12.00%)

Overlap Processing Limits:
corErrorRate    0.3000 ( 30.00%)
obtErrorRate    0.1200 ( 12.00%)
utgErrorRate    0.1200 ( 12.00%)
cnsErrorRate    0.2000 ( 20.00%)

Stages to run:
correct raw reads.
trim corrected reads.
assemble corrected and trimmed reads.

BEGIN CORRECTION

Running jobs.  First attempt out of 2.

Failed to submit compute jobs.  Delay 10 seconds and try again.

CRASH:
CRASH: canu snapshot v2.3-development +13 changes (r10284 08b0347f8655ed977a2ac75d35be198601bd7b3d)
CRASH: Please panic, this is abnormal.
CRASH:
CRASH:   Failed to submit compute jobs.
CRASH:
CRASH: Failed at /powerapps/share/canu-2.2/canu/build/bin/../lib/site_perl/canu/Execution.pm line 1259.
CRASH:  canu::Execution::submitOrRunParallelJob('neuroPAL', 'meryl', 'correction/0-mercounts', 'meryl-count', 1) called at /powerapps/share/canu-2.2/canu/build/bin/../lib/site_perl/canu/Meryl.pm line 847
CRASH:  canu::Meryl::merylCountCheck('neuroPAL', 'cor') called at /powerapps/share/canu-2.2/canu/build/bin/canu line 1076
CRASH: 
CRASH: Last 50 lines of the relevant log file (correction/0-mercounts/meryl-count.jobSubmit-01.out):
CRASH:
CRASH: qsub: Failed to route job to queue interactive
CRASH:

skoren commented 6 months ago

The error is coming from your cluster, not canu. The nodes in the interactive queue may not be allowed to submit further jobs, which is common with PBS (https://canu.readthedocs.io/en/latest/faq.html#my-run-stopped-with-the-error-failed-to-submit-batch-jobs). The documented workaround is described there, but the easiest fix in this case is to run without the grid by specifying useGrid=false, which restricts canu to the node you initially submitted to.
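A minimal sketch of that re-run, assuming the same prefix, output directory, genome size and read file as the original command. The helper name canu_local_cmd is hypothetical and not part of canu; it only prints the command so it can be checked before it is run on the node.

```shell
#!/bin/sh
# Print the original canu command with grid submission disabled.
# canu_local_cmd is an illustrative helper, not part of canu.
canu_local_cmd() {
  prefix=$1   # assembly prefix, e.g. neuroPAL
  outdir=$2   # output directory, e.g. my_directory
  reads=$3    # Nanopore read file, e.g. output_file
  # Only useGrid differs from the command posted above.
  echo "canu -p $prefix -d $outdir useGrid=false genomeSize=2m -nanopore $reads"
}

canu_local_cmd neuroPAL my_directory output_file
```

Running the printed line inside the interactive session keeps all work on that node, so no further qsub calls are made.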

ShiriAr commented 6 months ago

Hi @skoren, thanks for the quick response! I ran canu again with useGrid=false and got a new error. Any thoughts on this one?

error:

BEGIN CORRECTION

Running jobs.  First attempt out of 2.

Starting 'meryl' concurrent execution on Fri May  3 18:08:20 2024 with 1117.697 GB free disk space (1 processes; 5 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

Finished on Fri May  3 18:08:21 2024 (one second) with 1117.697 GB free disk space

Kmer counting (meryl-count) jobs failed, retry.
job neuroPAL.01.meryl FAILED.

Running jobs.  Second attempt out of 2.

Starting 'meryl' concurrent execution on Fri May  3 18:08:21 2024 with 1117.697 GB free disk space (1 processes; 5 concurrently)

    cd correction/0-mercounts
    ./meryl-count.sh 1 > ./meryl-count.000001.out 2>&1

Finished on Fri May  3 18:08:21 2024 (lickety-split) with 1117.697 GB free disk space

Kmer counting (meryl-count) jobs failed, tried 2 times, giving up.
job neuroPAL.01.meryl FAILED.

ABORT:
ABORT: canu snapshot v2.3-development +13 changes (r10284 08b0347f8655ed977a2ac75d35be198601bd7b3d)
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.

skoren commented 6 months ago

I'd guess there's some issue with the installation. What are the contents of meryl-count.000001.out?

ShiriAr commented 6 months ago

I'm not sure what you mean. I have a folder called "neuroPAL.01.meryl.WORKING" and it's empty.

skoren commented 6 months ago

The log from the command should be there as well, in the 0-mercounts folder. You can post both the shell script and the output file: ./meryl-count.sh and ./meryl-count.000001.out
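One way to collect both files for posting, sketched under the assumption that the run used -d my_directory, so the stage directory is my_directory/correction/0-mercounts. The helper name show_stage_logs is hypothetical and not part of canu.

```shell
#!/bin/sh
# Print the generated script and every per-job log in a canu stage directory.
# show_stage_logs is an illustrative helper, not part of canu.
show_stage_logs() {
  stage_dir=$1   # e.g. my_directory/correction/0-mercounts
  for f in "$stage_dir"/meryl-count.sh "$stage_dir"/meryl-count.*.out; do
    [ -f "$f" ] || continue   # skip names or patterns that matched nothing
    echo "==> $f <=="
    cat "$f"
  done
}

show_stage_logs my_directory/correction/0-mercounts
```

The last lines of the .out file are usually the most useful part, since that is where the failing command reports its error.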

ShiriAr commented 6 months ago

Oh I see! Here are both:

./meryl-count.sh

#!/bin/sh

#  Path to Canu.

bin="/powerapps/share/canu-2.2/canu/build/bin"

#  Report paths.

echo ""
echo "Found perl:"
echo "  " `which perl`
echo "  " `perl --version | grep version`
echo ""
echo "Found java:"
echo "  " `which java`
echo "  " `java -showversion 2>&1 | head -n 1`
echo ""
echo "Found canu:"
echo "  " $bin/canu
echo "  " `$bin/canu -version`
echo ""

#  Environment for any object storage.

export CANU_OBJECT_STORE_CLIENT=
export CANU_OBJECT_STORE_CLIENT_UA=
export CANU_OBJECT_STORE_CLIENT_DA=
export CANU_OBJECT_STORE_NAMESPACE=
export CANU_OBJECT_STORE_PROJECT=

if [ z$PBS_O_WORKDIR != z ] ; then
  cd $PBS_O_WORKDIR
fi

#  Discover the job ID to run, from either a grid environment variable and a
#  command line offset, or directly from the command line.
#
if [ x$PBS_ARRAY_INDEX = x -o x$PBS_ARRAY_INDEX = xundefined -o x$PBS_ARRAY_INDEX = x0 ]; then
  baseid=$1
  offset=0
else
  baseid=$PBS_ARRAY_INDEX
  offset=$1
fi
if [ x$offset = x ]; then
  offset=0
fi
if [ x$baseid = x ]; then
  echo Error: I need PBS_ARRAY_INDEX set, or a job index on the command line.
  exit
fi
jobid=`expr -- $baseid + $offset`
if [ x$baseid = x0 ]; then
  echo Error: jobid 0 is invalid\; I need PBS_ARRAY_INDEX set, or a job index on the command line.
  exit
fi
if [ x$PBS_ARRAY_INDEX = x ]; then
  echo Running job $jobid based on command line options.
else
  echo Running job $jobid based on PBS_ARRAY_INDEX=$PBS_ARRAY_INDEX and offset=$offset.
fi

if [ $jobid -gt 01 ]; then
  echo Error: Only 01 jobs, you asked for $jobid.
  exit 1
fi

jobid=`printf %02d $jobid`

#  If the meryl database exists, we're done.

if [ -e ./neuroPAL.$jobid.meryl/merylIndex ] ; then
  echo Kmers for batch $jobid exist.
  exit 0
fi

#  If the meryl output exists in the object store, we're also done.

if [ -e neuroPAL.$jobid.meryl.tar.gz ]; then
  exist1=true
else
  exist1=false
fi
if [ $exist1 = true ] ; then
  echo Kmers for batch $jobid exist in the object store.
  exit 0
fi

#  Nope, not done.  Fetch the sequence store.

#  And compute.

/powerapps/share/canu-2.2/canu/build/bin/meryl k=16 threads=4 memory=3 \
  count \
    segment=$jobid/01 ../../neuroPAL.seqStore \
    output ./neuroPAL.$jobid.meryl.WORKING \
&& \
mv -f ./neuroPAL.$jobid.meryl.WORKING ./neuroPAL.$jobid.meryl

exit 0

./meryl-count.000001.out

Found perl:
   /usr/bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /usr/bin/java
   java version "9.0.1"

Found canu:
   /powerapps/share/canu-2.2/canu/build/bin/canu
   canu snapshot v2.3-development +13 changes (r10284 08b0347f8655ed977a2ac75d35be198601bd7b3d)

Running job 1 based on command line options.
usage: /powerapps/share/canu-2.2/canu/build/bin/meryl ...

  A meryl command line is formed as a series of commands and files, possibly
  grouped using square brackets.  Each command operates on the file(s) that
  are listed after it.

  COMMANDS:

    statistics           display total, unique, distnict, present number of the kmers on the screen.  accepts exactly one input.
    histogram            display kmer frequency on the screen as 'frequency<tab>count'.  accepts exactly one input.
    print                display kmers on the screen as 'kmer<tab>count'.  accepts exactly one input.

    count                Count the occurrences of canonical kmers in the input.  must have 'output' specified.
    count-forward        Count the occurrences of forward kmers in the input.  must have 'output' specified.
    count-reverse        Count the occurrences of reverse kmers in the input.  must have 'output' specified.
      k=<K>              create mers of size K bases (mandatory).
      n=<N>              expect N mers in the input (optional; for precise memory sizing).
      memory=M           use no more than (about) M GB memory.
      threads=T          use no more than T threads.
      compress           compress homopolymer runs to a single letter.

    less-than N          return kmers that occur fewer than N times in the input.  accepts exactly one input.
    greater-than N       return kmers that occur more than N times in the input.  accepts exactly one input.
    equal-to N           return kmers that occur exactly N times in the input.  accepts exactly one input.
    not-equal-to N       return kmers that do not occur exactly N times in the input.  accepts exactly one input.

    increase X           add X to the count of each kmer.
    decrease X           subtract X from the count of each kmer.
    multiply X           multiply the count of each kmer by X.
    divide X             divide the count of each kmer by X.
    divide-round X       divide the count of each kmer by X and round results. count < X will become 1.
    modulo X             set the count of each kmer to the remainder of the count divided by X.

    union                return kmers that occur in any input, set the count to the number of inputs with this kmer.
    union-min            return kmers that occur in any input, set the count to the minimum count
    union-max            return kmers that occur in any input, set the count to the maximum count
    union-sum            return kmers that occur in any input, set the count to the sum of the counts

    intersect            return kmers that occur in all inputs, set the count to the count in the first input.
    intersect-min        return kmers that occur in all inputs, set the count to the minimum count.
    intersect-max        return kmers that occur in all inputs, set the count to the maximum count.
    intersect-sum        return kmers that occur in all inputs, set the count to the sum of the counts.

    subtract             return kmers that occur in the first input, subtracting counts from the other inputs

    difference           return kmers that occur in the first input, but none of the other inputs
    symmetric-difference return kmers that occur in exactly one input

  MODIFIERS:

    output O             write kmers generated by the present command to an output  meryl database O
                         mandatory for count operations.

  EXAMPLES:

  Example:  Report 22-mers present in at least one of input1.fasta and input2.fasta.
            Kmers from each input are saved in meryl databases 'input1' and 'input2',
            but the kmers in the union are only reported to the screen.

            meryl print \
                    union \
                      [count k=22 input1.fasta output input1] \
                      [count k=22 input2.fasta output input2]

  Example:  Find the highest count of each kmer present in both files, save the kmers to
            database 'maxCount'.

            meryl intersect-max input1 input2 output maxCount

  Example:  Find unique kmers common to both files.  Brackets are necessary
            on the first 'equal-to' command to prevent the second 'equal-to' from
            being used as an input to the first 'equal-to'.

            meryl intersect [equal-to 1 input1] equal-to 1 input2

Can't interpret '../../neuroPAL.seqStore': not a meryl command, option, or recognized input file.

skoren commented 6 months ago

It looks like the initial store-building step also failed, or at least produced an invalid store, which is why this job is failing. Erase all output from the previous run (the entire my_directory), re-run, and post the full log from that re-run.
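A minimal sketch of that clean restart, assuming the output directory is my_directory as in the original command. The helper name reset_run_dir and the log file name canu.full.log are hypothetical; the guard against empty or root paths is an added precaution, not something canu requires.

```shell
#!/bin/sh
# Remove a previous canu output directory so the next run starts from scratch.
# reset_run_dir is an illustrative helper, not part of canu.
reset_run_dir() {
  outdir=$1
  # Refuse obviously dangerous targets before calling rm -rf.
  case "$outdir" in
    ""|"/"|"."|"..") echo "refusing to remove '$outdir'" >&2; return 1 ;;
  esac
  if [ -d "$outdir" ]; then
    rm -rf -- "$outdir"
    echo "removed $outdir"
  else
    echo "nothing to remove at $outdir"
  fi
}

reset_run_dir my_directory

# Then re-run and keep the complete log for posting, for example:
#   canu -p neuroPAL -d my_directory useGrid=false genomeSize=2m \
#        -nanopore output_file 2>&1 | tee canu.full.log
```

Deleting the whole directory, rather than only the failed stage, also discards the seqStore that meryl could not read, so canu rebuilds it on the next run.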

ShiriAr commented 6 months ago

I erased the previous output and it's been running for 19 minutes now, the longest it's run in a while. Maybe the output of previous runs was the problem. I'll let you know when it stops! Thanks so much!

ShiriAr commented 6 months ago

Just updating that the run finished successfully. Thanks @skoren for the help!!

marbl / canu

Canu error: CRASH: Failed to submit compute jobs. CRASH: qsub: Failed to route job to queue interactive #2312