Open ArthurDondi opened 2 months ago
I'm not an author of this plugin but I might be able to help. I think `cores: 400` underneath `default-resources:` isn't doing what you think it's doing. `cores` isn't the name of one of Snakemake's standard resources, nor of any resource recognized by this plugin. Instead, you probably want to move `cores: 400` to the global scope of the profile, e.g. next to `jobs: 500`. There it will set a global maximum number of cores to request at any given time from SLURM (behaving like the Snakemake CLI flag `--cores`).
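For instance, the suggested layout would look roughly like this (a sketch; the 400/500 values are the ones mentioned above, the `default-resources` entries are just placeholders):

```yaml
# Sketch of the suggested profile layout: `cores` at the global scope,
# not under default-resources.
executor: slurm
jobs: 500          # max concurrent SLURM jobs
cores: 400         # global cap on cores requested at any one time (like --cores)
default-resources:
  mem_mb: 4000     # hypothetical per-job defaults
  runtime: 60
```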
However, this brings up the annoyance that led me to search this issues page in the first place. If `--cores` is left unspecified, it defaults to the number of available cores on the head node, even if `--jobs` is set to a very high number or `unlimited`. If you run `nproc` on your head node, I bet you will get 4; that's where this seemingly arbitrary number is coming from.
I suppose this behavior is in line with a very close reading of the Snakemake CLI documentation; however, it was not the behavior prior to Snakemake v8.0.0, when `--slurm` was spun off into a plugin. Previously you could specify `--jobs` without `--cores` and there would be no restriction on the total number of cores requested across the cluster. I'm not sure how to get this behavior back, or whether I should raise it as an issue here or on the main Snakemake repository; either way the maintainers seem overwhelmed at the moment.
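As a stopgap, one way to approximate the pre-v8 behavior is to set the global core budget to a number far larger than the cluster could ever satisfy, so the cap never binds (a sketch; the numbers are arbitrary):

```yaml
# Workaround sketch: make the global core cap effectively unlimited
# by setting it higher than the cluster can provide.
jobs: 500
cores: 100000   # arbitrary large number so the --cores cap never binds
```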
Spot on, thanks a lot! It was the number of available cores on the head node, and moving `cores` outside of `default-resources:` did the trick.
I also saw that the maintainers are overwhelmed; I'll leave this open in case other people stumble upon it too, but feel free to close it.
Are you sure that `cores` sets the global maximum number of cores at any given time? For instance, if I have this Snakefile
```python
CHRS = [ "chr{}".format(x) for x in list(range(1,23)) ]

rule all:
    input:
        expand("resources/test_{chr}.txt", chr = CHRS)

rule TestSLURM:
    output:
        "resources/test_{chr}.txt"
    threads:
        64
    shell:
        """
        echo ${{SLURM_CPUS_PER_TASK}} > {output}
        sleep 60
        """
```
with this profile
```yaml
default-resources:
  slurm_partition: "nodes"
  mem_mb: 4000
  runtime: 60
cores: 256
restart-times: 0
max-jobs-per-second: 1
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 5
jobs: 1000000
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
executor: slurm
```
it saturates all available nodes (using well over 256 cores in total). I would expect that at most four `TestSLURM` jobs can run at the same time.
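The expectation here is just division of the global core budget by the per-rule thread count; as a sanity check (plain Python, not the Snakemake API):

```python
# Sanity check of the expected concurrency: with a global budget of 256
# cores and 64 threads per TestSLURM job, only floor(256 / 64) = 4 jobs
# should be able to run at once.
def expected_concurrent_jobs(core_budget: int, threads_per_job: int) -> int:
    return core_budget // threads_per_job

print(expected_concurrent_jobs(256, 64))  # prints 4
```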
I actually do use `threads`, and it is only taken into account when set via `set-threads` (in my config.yaml):

```yaml
set-threads:
  salmon: 16
```

When I use `set-resources` as below:

```yaml
set-resources:
  salmon:
    threads: 16
    mem_mb: 20000  # This seems to work, but can we do with less?
    runtime: 600
```

`threads` is ignored and the default from the rule is used.
@freekvh this is intended behaviour according to the docs; scroll down a bit. It is not related to the issues mentioned in this thread. The reason behind this redundant definition is that the `threads` parameter can be picked up in the `set-resources` section to dynamically alter other settings.
Everyone, please open separate issues when dealing with separate issues. ;-)
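Putting the two comments above together, a working profile fragment would look roughly like this (a sketch; `salmon` and the values come from the comment above, and the `threads * 1250` scaling expression is a hypothetical illustration of deriving a resource from the thread count):

```yaml
# Sketch: set-threads actually changes the rule's thread count;
# set-resources then sets other resources, which may reference `threads`.
set-threads:
  salmon: 16
set-resources:
  salmon:
    mem_mb: threads * 1250   # hypothetical scaling with the thread count
    runtime: 600
```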
Software Versions

- snakemake 8.18.2
- snakemake-executor-plugin-slurm 0.10.0
- snakemake-executor-plugin-slurm-jobstep 0.2.1
- slurm 23.02.7
Describe the bug
The number of threads specified (64) for the rule `BaseCellCounter_scDNACalling` is not respected. Instead, 4 threads are provided, always, even if I request only 1 thread. No idea why 4 in particular. It is similar to #141, but I'm not submitting through bash but through the head node, so I opened a new issue. I tried with `threads`, `cpus-per-task`, both, inside the rule, in the profile... Nothing works. Here is my profile, and you can find the logs below:
Logs
Minimal example
will give:
Additional context
Not related, but any idea why the tmpdir is `<TBD>` although I specify it in the profile?
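For anyone landing on this thread: one more thing worth trying (a sketch, not a confirmed fix for this report) is the `cpus_per_task` resource that the SLURM executor plugin documents, set per rule alongside a `set-threads` override:

```yaml
# Sketch: pin the SLURM allocation explicitly via the plugin's
# cpus_per_task resource, in addition to overriding the thread count.
set-threads:
  BaseCellCounter_scDNACalling: 64
set-resources:
  BaseCellCounter_scDNACalling:
    cpus_per_task: 64
```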