marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

canu with slurm #1103

Closed: potant closed this issue 6 years ago

potant commented 6 years ago

Hello, in my Slurm configuration there is a partition named "batch" containing 11 nodes, each with 20 cores and 128 GB of memory.
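
For reference, a layout like this can be confirmed with sinfo; a minimal sketch, assuming standard Slurm format flags (hostnames and values below are placeholders):

    # One line per node in the batch partition: hostname, CPU count, memory in MB.
    sinfo -p batch -N -o "%n %c %m"
    #   node01 20 128000
    #   node02 20 128000
    #   ...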

I run the command:

    canu -p seabream useGrid=true gridOptions="-p batch" -d seabream_run1 genomeSize=800m -nanopore-raw data1.fa

The output of canu:

-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_171' (from 'java').
-- Detected gnuplot version '5.0 patchlevel 5' (from 'gnuplot') and image format 'png'.
-- Detected 12 CPUs and 126 gigabytes of memory.
-- Detected Slurm with 'sinfo' binary in /usr/bin/sinfo.
-- Detected Slurm with 'MaxArraySize' limited to 1000 jobs.
--
-- Found  11 hosts with  20 cores and  125 GB memory under Slurm control.
-- Found   1 host  with  12 cores and  125 GB memory under Slurm control.
-- Found   1 host  with  32 cores and  629 GB memory under Slurm control.
--
--                     (tag)Threads
--            (tag)Memory         |
--        (tag)         |         |  algorithm
--        -------  ------  --------  -----------------------------
-- Grid:  meryl     25 GB    4 CPUs  (k-mer counting)
-- Grid:  cormhap   25 GB    4 CPUs  (overlap detection with mhap)
-- Grid:  obtovl    12 GB    4 CPUs  (overlap detection)
-- Grid:  utgovl    12 GB    4 CPUs  (overlap detection)
-- Grid:  ovb        4 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs       16 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8 GB    4 CPUs  (read error detection)
-- Grid:  oea        2 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat       62 GB   10 CPUs  (contig construction)
-- Grid:  cns       25 GB    4 CPUs  (consensus)
-- Grid:  gfa       16 GB    4 CPUs  (GFA alignment and processing)
--
-- Found Nanopore uncorrected reads in 'correction/seabream.gkpStore'.
--
-- Generating assembly 'seabream' in '/home1/tereza/AquaExcel/seabream/MinION_Run1/seabream_run1/seabream_run1'
--
-- Parameters:
--
--  genomeSize        800000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.3200 ( 32.00%)
--    obtOvlErrorRate 0.1440 ( 14.40%)
--    utgOvlErrorRate 0.1440 ( 14.40%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.5000 ( 50.00%)
--    obtErrorRate    0.1440 ( 14.40%)
--    utgErrorRate    0.1440 ( 14.40%)
--    cnsErrorRate    0.1920 ( 19.20%)
----------------------------------------
-- Starting command on Fri Sep 28 12:04:00 2018 with 41184.539 GB free disk space

    cd /home1/tereza/AquaExcel/seabream/MinION_Run1/seabream_run1/seabream_run1
    sbatch \
      --mem-per-cpu=4g \
      --cpus-per-task=1 \
      -p batch  \
      -D `pwd` \
      -J 'canu_seabream' \
      -o canu-scripts/canu.02.out canu-scripts/canu.02.sh
Submitted batch job 11857
----------------------------------------------------------------------------------------------------------------

The job is allocated to one node (the first in the batch partition) and it seems to be working. But how can I use more of my cluster's resources? That is, how do I spread the work across more nodes (say, 5 of the 11 available) so the assembly finishes faster?

Thanks!

skoren commented 6 years ago

This is the correct behavior. Not all steps use multiple nodes/cores; Canu will submit jobs as each step requires.
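
Once canu reaches a parallel stage (the overlap computations, for example), it submits those jobs as Slurm array jobs, which the scheduler spreads over the partition on its own; the "MaxArraySize" line in the log above is canu checking that limit. A minimal sketch of how to watch this happen, assuming standard squeue format flags (the job names and hostnames shown are illustrative, not canu's guaranteed naming):

    # Show job ID, name, state, and the node each task is running on.
    squeue -u $USER -o "%i %j %T %N"
    # Array tasks show up as <jobid>_<index>, landing on different nodes, e.g.:
    #   11900_1  cormhap_seabream  RUNNING  node02
    #   11900_2  cormhap_seabream  RUNNING  node05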