snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

setting ntasks for non-MPI jobs #40

Closed tatumdmortimer closed 4 months ago

tatumdmortimer commented 4 months ago

I am working on transitioning my snakemake workflows to Snakemake v8. However, when I use this plugin with a profile, I get the following error message:

WorkflowError: SLURM job submission failed. The error message was sbatch: error: You must specify a number of tasks.
sbatch: error: Batch job submission failed: Job size specification needs to be provided

Here are the versions of Snakemake, this plugin, and slurm that I am working with:

Snakemake 8.5.2
snakemake-executor-plugin-slurm 0.3.2
slurm 23.02.4

Here is my profile:

executor: "slurm"
software-deployment-method: "conda"
latency-wait: 60
jobs: 100
default-resources:
  runtime: 10
  slurm_partition: "batch"
  mem_mb: 4000
  tasks: 1
  nodes: 1

I cloned the repository, removed the condition if job.resources.get("mpi", False): on line 107 of __init__.py, and reinstalled the plugin. This seems to have fixed the issue for me. Is there a reason these settings were only included in the sbatch command for MPI jobs?

Thanks!

cmeesters commented 4 months ago

Can you please run a minimal example with the --verbose flag and paste or attach the output here? (It will probably be long.) Edit: Please also include the command line you used.

Suggestion for a minimal Snakefile:

rule all:
    input: "results/a.out"

rule test1:
    output: "results/a.out"
    shell: "touch {output}"

tatumdmortimer commented 4 months ago

I've attached the output using the minimal snakefile you suggested. This is the command line that I used: snakemake --cores 1 --workflow-profile slurm_profile --verbose

snakemake_test.txt

cmeesters commented 4 months ago

Thank you. This is really weird, because the SLURM docs state that:

The default is one task per node ...

Which means that no requirement is imposed to set -n/--ntasks explicitly.

It will not hurt to submit with a default of 1 task, so that users in a situation like yours do not need to patch the plugin. But I wonder: are you in contact with your admins, and can you tell us why this cluster deviates from the SLURM defaults?
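For readers following along, here is a minimal sketch of the difference between the behaviour described in the report (task flags only emitted for MPI jobs) and the proposed default of one task. It is not the plugin's actual code: the function name sbatch_task_flags, the mpi_only switch, and the plain dict standing in for job.resources are all assumptions made for illustration. Only the sbatch options --nodes and --ntasks and the resource names from the profile above are taken as given.

```python
def sbatch_task_flags(resources: dict, mpi_only: bool) -> list[str]:
    """Return task-related sbatch flags for a job's resources (illustrative only).

    mpi_only=True  mimics the reported behaviour: flags are emitted only
                   when the job declares an "mpi" resource.
    mpi_only=False always emits --ntasks, falling back to 1 task.
    """
    flags: list[str] = []
    if mpi_only and not resources.get("mpi", False):
        # Non-MPI job: nothing is passed, so sbatch relies on the cluster default.
        return flags
    if resources.get("nodes"):
        flags.append(f"--nodes={resources['nodes']}")
    # Hypothetical fix: always state the task count, defaulting to 1.
    flags.append(f"--ntasks={resources.get('tasks', 1)}")
    return flags


# default-resources from the profile in the report, as a plain dict
profile_defaults = {
    "runtime": 10,
    "slurm_partition": "batch",
    "mem_mb": 4000,
    "tasks": 1,
    "nodes": 1,
}

print(sbatch_task_flags(profile_defaults, mpi_only=True))
print(sbatch_task_flags(profile_defaults, mpi_only=False))
print(sbatch_task_flags({}, mpi_only=False))
```

With the MPI-only guard, the tasks and nodes entries in the profile never reach sbatch for a non-MPI job, which would explain the "You must specify a number of tasks" error on a cluster that does not apply the usual default. Without the guard, --ntasks is always present, even when the profile sets nothing.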

cmeesters commented 4 months ago

@tatumdmortimer please update to the latest release and give it a try.

tatumdmortimer commented 4 months ago

I updated to the latest release, and the minimal Snakefile completed successfully. Thanks for fixing this so quickly.

I did get in touch with the research computing group at my university, and while they confirmed that the cluster deviates from the defaults, they didn't give me a reason why.

cmeesters commented 4 months ago

Well, then I am just glad it's working for you (and hope for anybody else, too).

FYI: SchedMD's (the company behind SLURM) "ecosystem" shows some odd flowers, mainly due to service policies. I am merely trying to get information about the how and why of deviations from the standard to improve and stabilize these plugins. And sometimes the answer is "just because". Anyway, thanks for asking!