snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster

Usefulness of this plugin #30

Closed: mlarjim closed this issue 7 months ago

mlarjim commented 8 months ago

Hi! I developed a Snakemake pipeline and I would like to execute it on a SLURM cluster (Snakemake version is v8.2.0, so I had to install the plugin). However, I fail to see the point of using the SLURM plugin... Is it not just the same as running an sbatch script similar to this one?

#!/bin/bash

#SBATCH -N 1            
#SBATCH -c 52
#SBATCH --mem=200G
#SBATCH -t 0
#SBATCH --export=ALL   # export the whole shell environment
#SBATCH --nodelist c27

source ~/.bashrc
module load libgl
conda activate snakemake
snakemake --cores 52 -p --snakefile pipeline.smk --use-conda --resources mem_mb=200000 

In addition, the plugin is not able to limit the number of cores or nodes, so if I run snakemake --executor slurm on a cluster whose resources are shared with other users and the pipeline uses a wildcard that is expanded 500 times, Snakemake will try to run 500 jobs. Limiting the number of jobs is not the best option, because some jobs require fewer resources than others and parallelization will not improve.

In which cases can the SLURM plugin improve the performance of a Snakemake pipeline? Thank you in advance.

cmeesters commented 8 months ago

Hi,

I developed a Snakemake pipeline and I would like to execute it on a SLURM cluster

Great!

Is it not just the same as running an sbatch script similar to this one?

It is not. The idea is to have a workflow and to start it on the login or head node of your cluster, e.g. like this:

$ snakemake --executor slurm \           # select the executor
>   -j unlimited \                       # allow a certain number of concurrent jobs (or 'unlimited', like here)
>   --configfile ./config/config.yml \   # use a specific workflow configuration
>   --workflow-profile ./profile/ \      # per-rule workflow configuration, which allows for heterogeneous resource usage
>   --directory <some path>              # on a distributed file system, Snakemake can be pointed to a different working directory

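To make the "heterogeneous resource usage" above concrete: each rule declares its own threads and resources, and the executor submits each of that rule's jobs with a matching resource request. A minimal sketch (rule, file and tool names are hypothetical; the same values can alternatively be set per rule in the workflow profile via set-threads / set-resources):

rule align:
    input:
        "reads/{sample}.fastq.gz"
    output:
        "aligned/{sample}.bam"
    threads: 16                  # CPUs requested for each job of this rule
    resources:
        mem_mb=60000,            # memory request for each job of this rule
        runtime=120              # walltime in minutes
    shell:
        "aligner -t {threads} {input} > {output}"

Every instance of such a rule becomes its own SLURM job requesting just these resources, while lighter rules request correspondingly less - the heterogeneity a single monolithic sbatch allocation cannot express.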
Now, a number of jobs gets launched, and the cluster allows many of them to run concurrently - just what you presumably want.

Note: have a look at the profile section, too. With a configuration like

executor: slurm
software-deployment-method:
  - conda
latency-wait: 60
default-storage-provider: fs
shared-fs-usage:
  - persistence
  - software-deployment
  - sources
  - source-cache
local-storage-prefix: /local/work/$USER/snakemake-scratch # or whatever your cluster local node storage prefix is

you can avoid setting all these things on the CLI over and over. With the storage plugin (snakemake-storage-plugin-fs, installable via conda), you should be able to avoid I/O issues if random I/O is a concern for you. Note that updates are planned, allowing for more flexibility in this regard.

Is your workflow already registered with the catalogue?

mlarjim commented 8 months ago

Thanks so much for your answer!

Is it not just the same as running an sbatch script similar to this one?

It is not. The idea is to have a workflow and to start it on the login- or head-node of your cluster, e.g. like

But the sbatch script can also be started on the login node:

sbatch -J name_job -o log -e log snakemake.sbatch

Now, a number of jobs gets launched. And the cluster offers to launch many concurrent jobs - just what you presumably want.

That is not exactly my purpose. I would like to launch the jobs on a single node of my SLURM cluster. For instance, if a node has 52 cores and a rule needs 16 threads, this rule can be executed for three samples/wildcards at the same time (16 * 3 = 48 cores are needed). However, if the next rule to be run needs 52 cores, the SLURM plugin will launch three jobs, each allocated on a different node. I should not use so many nodes.

I find the software-deployment-method argument very useful, thank you.

The pipeline is not ready yet. Sorry, I still don't understand everything about Snakemake and the plugin. Maybe I should not use the plugin if I don't have a partition.

cmeesters commented 8 months ago

But the sbatch script can also be started on the login node:

You can, but it does not make any sense: Snakemake will submit the jobs itself via the executor. Using Snakemake, you can orchestrate your jobs on the cluster.

If you want to pool jobs onto one node, that is what group jobs are for. Look for the group keyword.
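Taking your 16-threads-per-sample example for illustration (rule and group names below are hypothetical): a group directive on a rule, combined with --group-components on the command line, packs several wildcard instances into a single SLURM job:

rule align:
    input:
        "reads/{sample}.fastq.gz"
    output:
        "aligned/{sample}.bam"
    threads: 16
    group: "per_sample"          # jobs sharing a group are submitted together as one cluster job
    shell:
        "aligner -t {threads} {input} > {output}"

Submitting with something like snakemake --executor slurm --group-components per_sample=3 then bundles three 16-thread instances (48 cores) into one job, i.e. one allocation, instead of three separate jobs on potentially three different nodes.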

I should not use so many nodes.

Why not? That is a rather senseless restriction: if your admins want to compel you to limit the number of nodes, they can do so by configuring their partitions accordingly.

BTW 16 is not an even divisor of 52. 13 is.

The pipeline is not ready yet.

All the more reason to go public. Don't make the mistake of writing a workflow (not to be confused with a sequential pipeline) and then letting others re-invent what you did. You could team up instead.

mlarjim commented 8 months ago

Dear Christian Meesters,

I will make my pipeline public when I understand the usefulness of the snakemake-executor-plugin-slurm. So, running Snakemake with the SLURM plugin on a login node orchestrates my jobs on the cluster. And running the following sbatch script (which launches Snakemake without any plugin) from the login node also orchestrates my jobs on the cluster, with the advantages of limiting the number of cores and nodes and no need to create a new partition on the cluster.


#!/bin/bash

#SBATCH -N 1            
#SBATCH -c 52
#SBATCH --mem=200G
#SBATCH -t 0
#SBATCH --export=ALL  
#SBATCH --nodelist nodeName

source ~/.bashrc
conda activate snakemake
snakemake --cores 52 -p --snakefile pipeline.smk --software-deployment-method conda --resources mem_mb=200000 

Why not? That is a rather senseless restriction: If your admins would like to compel you to limit the number of nodes, they can do so, by configuring their partitions accordingly.

That's right, but it is not the point of this issue.

BTW 16 is not an even divisor of 52. 13 is.

Absolutely irrelevant to my example

cmeesters commented 7 months ago

Well, I hope the updated documentation gives a few hints as to the purpose. Slightly rephrased: the purpose of this plugin is to enable processing a vast amount of data on an HPC cluster, whilst still getting the output and hints from the Snakemake master instance. Basically, it is similar to launching Snakemake on a server and requesting jobs to run in the cloud with the cloud executors.