ebete opened this issue 9 months ago
Looking into it. Can you provide a minimal example?
Just stumbled upon this too. Here's an MWE:
# Snakefile
rule all:
    input: "a.dat", "b.dat"

rule a:
    output: "{sample}.dat"
    resources: runtime=50
    group: "foo"
    shell: "touch {output} && sleep 200"
# default/config.yaml
# works on the NERSC supercomputer
executor: slurm
default-resources:
  - slurm_account=m2676
  - constraint=cpu
  - slurm_extra="--qos regular"
jobs: unlimited
cores: all
group-components:
  - foo=2
Execute with snakemake --profile default. Output (the missing partition error is innocuous):
Using profile default for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 9223372036854775807
Job stats:
job      count
-----  -------
a            2
all          1
total        3
Select jobs to execute...
Execute 1 jobs...
[Thu Feb 1 06:42:52 2024]
group job foo (jobs in lexicogr. order):
[Thu Feb 1 06:42:52 2024]
rule a:
output: a.dat
jobid: 1
reason: Code has changed since last execution
wildcards: sample=a
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=m2676, constraint=cpu, slurm_extra=--qos regular, runtime=50
[Thu Feb 1 06:42:52 2024]
rule a:
output: b.dat
jobid: 2
reason: Code has changed since last execution
wildcards: sample=b
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=m2676, constraint=cpu, slurm_extra=--qos regular, runtime=50
No partition was given for rule 'JobGroup(foo,frozenset({a, a}))', and unable to find a default partition. Trying to submit without partition information. You may want to invoke snakemake with --default-resources 'slurm_partition=<your default partition>'.
WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option
Fixed in release 8.4.3 of the Snakemake core. However, only the odd propagation of function values instead of string flags has just been fixed; a release is forthcoming. Stay tuned.
Unfortunately, I still get the same error using snakemake 8.4.4 + snakemake-executor-plugin-slurm 0.3.0.
The problem for me, at least, is that --apptainer-args is passed with mixed quotes. See the sbatch command below (I added some line breaks for clarity):
sbatch call:
sbatch
--job-name 851bba7b-9529-4d79-90e2-69cb693d8d46 --output $PROJ/.snakemake/slurm_logs/rule_qiime_import/%j.log --export=ALL --comment qiime_import -A slurm -p cpu -t 900 --mem 50000 --cpus-per-task=1 -D $PROJ
--wrap="
mambaforge/envs/snakemake/bin/python3.11 -m snakemake --snakefile '$PROJ/Snakefile' --target-jobs 'qiime_import:project=result' --allowed-rules 'qiime_import' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=50000' 'mem_mib=47684' 'disk_mb=2676' 'disk_mib=2553' 'runtime_max=86400' 'load=0' --wait-for-files-file '$PROJ/.snakemake/tmp.qhihjnxt/j339.qiime_import.snk.waitforfilesfile.txt' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete
--verbose
--rerun-triggers input params software-env mtime code
--deployment-method conda apptainer
--conda-frontend 'mamba'
--conda-prefix '.snakemake-envs'
--conda-base-path 'mambaforge/envs/snakemake'
--apptainer-prefix '.snakemake-envs'
--apptainer-args "--bind '$one_path:$one_path:rw' --bind '$other_path:$other_path:rw'"
--shared-fs-usage software-deployment input-output persistence sources source-cache storage-local-copies
--wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/'
--configfiles $PROJ/setup.yml
--printshellcmds
--latency-wait 60
--scheduler 'ilp'
--scheduler-solver-path 'mambaforge/envs/snakemake/bin'
--default-resources 'mem_mb=1000' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_account=slurm' 'slurm_partition=cpu' 'runtime=1440' 'runtime_max=86400' 'load=0'
--executor slurm-jobstep
--jobs 1
--mode 'remote'
"
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option
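The inner unescaped " closes --wrap="..." early, and everything after it becomes stray positional arguments, which sbatch refuses. A minimal standalone illustration, using shlex to mimic POSIX shell tokenization (the command here is a simplified stand-in, not the real call):

```python
import shlex

# Simplified stand-in for the failing call: the quote that is meant to open
# --apptainer-args "..." actually terminates --wrap="..." first.
call = 'sbatch --wrap="python -m snakemake --apptainer-args "--bind /a:/a""'
print(shlex.split(call))
# ['sbatch', '--wrap=python -m snakemake --apptainer-args --bind', '/a:/a']
# The stray '/a:/a' is what triggers:
#   sbatch: error: Script arguments not permitted with --wrap option
```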
For now I worked around this by removing the quotes around the --bind '...' arguments, but that'll only hold up as long as there are no spaces in those paths. I'm not sure what the best solution would be, as you basically need quotes both around the --apptainer-args value and around the --bind ones. Escaping the quotes gave me a different set of problems.
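One way to sidestep the nesting entirely would be to let Python do the outer quoting. A minimal sketch, assuming the submission line is executed through a shell (build_sbatch_call and its arguments are hypothetical names, not the plugin's actual code):

```python
import shlex

def build_sbatch_call(sbatch_prefix: str, exec_job: str) -> str:
    # shlex.quote wraps exec_job in single quotes and escapes any embedded
    # single quotes, so inner double quotes survive one level of shell parsing.
    return f"{sbatch_prefix} --wrap={shlex.quote(exec_job)}"

exec_job = 'python -m snakemake --apptainer-args "--bind \'/data:/data:rw\'"'
print(build_sbatch_call("sbatch -p cpu -t 900", exec_job))
```

If the call were instead passed to subprocess as an argument list without a shell, no quoting would be needed at all.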
If nothing else, it may be an idea to just put an error message here, like:
if '"' in exec_job:
    raise ValueError("SLURM submission command contains illegal character \"")
What a command line!
Currently, I am unable to test (no cluster; maintenance). Can you indicate the offending line of your Snakemake command line or your resource YAML, please?
The MWE I posted earlier keeps failing with Snakemake 8.4.3, for the record.
@cmeesters Luckily, a large part of that is generated by Snakemake itself.
The error stems from the --apptainer-args argument with its mixed quoting.
Anyway, a (minimal) reproducible example (requires singularity/apptainer + SLURM):
slurm_test.snk
# snakemake
# set up -------------------------------------------------------------------------------------------
from snakemake.utils import min_version
min_version("8.0.0")

localrules: all

container: "docker://condaforge/mambaforge:latest"

SAMPLE_NAMES = [f"file_{x:d}" for x in range(1, 5)]
OUTPUT_DIR = "results_test"

# start --------------------------------------------------------------------------------------------
rule all:
    message: "All target files have been generated/updated"
    default_target: True
    threads: 1
    input:
        expand("{project}/{sample}.txt", sample=SAMPLE_NAMES, project=OUTPUT_DIR)

rule gen_file:
    output:
        "{project}/{sample}.txt"
    threads: 1
    resources:
        load = 100
    shell:
        'printf "now: %s\n" "$(date)" > {output:q}'
run.sh
#!/usr/bin/env bash
set -o nounset
set -o errexit
set -o pipefail

# capture any extra CLI arguments to forward to snakemake
extra_params=("$@")

# set this script's directory as the working directory
cd -P -- "$(dirname "$0")"

echo "Snakemake $(snakemake --version) | $(python -VV)" >&2
echo "[RUN_TEST] Running pipeline with test dataset ..." >&2
snakemake \
    --snakefile "slurm_test.snk" \
    --executor slurm \
    --local-cores 1 \
    --jobs 4 \
    --rerun-incomplete \
    --software-deployment-method apptainer \
    --apptainer-prefix "$HOME/.snakemake-envs" \
    --apptainer-args "--bind '$HOME/.snakemake-envs:$HOME/.snakemake-envs:rw'" \
    --printshellcmds \
    --restart-times 0 \
    --show-failed-logs \
    --resources 'load=100' \
    --default-resources \
        'load=0' \
        'runtime=600' \
        'mem_mb=1024' \
        'slurm_account=slurm' \
        'slurm_partition=cpu' \
    --cores 1 \
    --forceall \
    "${extra_params[@]}"
Hi everyone,
It seems that the same issue occurs when a workflow-profile configuration file contains a config keyword to store some configuration variables. Please find hereafter an MWE and a workaround for this specific problem.
Snakefile:
rule all:
    output:
        touch("all.done")
    shell:
        "sleep 10"
profile/config.yaml that contains some configuration variables:
executor: "slurm"
default-resources:
slurm_account: "XXXXXXXXXXXX"
slurm_partition: "XXXX"
runtime: 1435
cpus_per_task: 1
nodes: 1
tasks: 1
mem_mb: 2000
jobs: 1
config:
importantlist:
- "item1"
- 'item2'
- item3
Calling snakemake with snakemake --workflow-profile profile --verbose, we get:
rule all:
output: all.done
jobid: 0
reason: Missing output files: all.done
resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, slurm_account=XXXXXXX, slurm_partition=fast, runtime=1435, cpus_per_task=1, nodes=1, tasks=1
sbatch call: sbatch --job-name fee9209a-17a3-46d0-bab5-0765fb728317 --output /redacted/path/ --export=ALL --comment all -A XXXXX -p XXXXX -t 1435 --mem 2000 --cpus-per-task=1 -D /redacted/path/ --wrap="/redacted/path/bin/python3.12 -m snakemake --snakefile '/redacted/path/Snakefile' --target-jobs 'all:' --allowed-rules 'all' --cores 'all' --attempt 1 --force-use-threads --resources 'mem_mb=2000' 'mem_mib=1908' 'disk_mb=1000' 'disk_mib=954' 'cpus_per_task=1' 'nodes=1' 'tasks=1' --wait-for-files '/redacted/path/.snakemake/tmp.6vkh9cws' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers code software-env params input mtime --conda-frontend 'mamba' --shared-fs-usage software-deployment sources input-output storage-local-copies persistence source-cache --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --config "importantlist=['item1', 'item2', 'item3']" --latency-wait 5 --scheduler 'ilp' --scheduler-solver-path '/redacted/path/envs/snakemake-8.0.1/bin' --default-resources 'mem_mb=2000' 'disk_mb=max(2*input.size_mb, 1000)' 'tmpdir=system_tmpdir' 'slurm_account=XXXX' 'slurm_partition=XXXX' 'runtime=1435' 'cpus_per_task=1' 'nodes=1' 'tasks=1' --executor slurm-jobstep --jobs 1 --mode 'remote'"
unlocking
removing lock
removing lock
removed all locks
[...]
WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option
Note that the config keyword was translated to --config "importantlist=['item1', 'item2', 'item3']"; the unescaped " is likely what causes the WorkflowError, since that character is already used to delimit --wrap="[...]".
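To illustrate, a hypothetical sketch of how a profile config: entry could end up on the command line with exactly the observed quoting (illustrative only; this is not Snakemake's actual serialization code):

```python
# A Python repr of the list reproduces the observed CLI fragment.
importantlist = ["item1", "item2", "item3"]
cli_arg = f'--config "importantlist={importantlist!r}"'
print(cli_arg)  # --config "importantlist=['item1', 'item2', 'item3']"
# Once embedded in --wrap="...", the first inner " terminates the --wrap
# value early, and the remainder becomes stray arguments to sbatch.
```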
As a workaround, I moved my config parameters to a config.yaml file:
importantlist:
- "item1"
- 'item2'
- item3
Running snakemake as snakemake --workflow-profile profile --configfile config.yaml --verbose works as expected.
Tested with:
HTH, best,
Please update to the current version of the executor and of Snakemake itself and try again. My own test is working now.
Note that you ought to write: slurm_extra="'--flag1=arg1 --flag2=arg2'"
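A small standalone sketch of why the doubled quoting is needed, mimicking ordinary POSIX shell tokenization with shlex (nothing Snakemake-specific):

```python
import shlex

# The outer double quotes are consumed by the login shell; the inner single
# quotes survive, so Snakemake receives a quoted string as the value.
cmd = 'snakemake --default-resources slurm_extra="\'--flag1=arg1 --flag2=arg2\'"'
print(shlex.split(cmd))
# ['snakemake', '--default-resources', "slurm_extra='--flag1=arg1 --flag2=arg2'"]
```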
@cmeesters Still the same exception.
WorkflowError:
SLURM job submission failed. The error message was sbatch: error: Script arguments not permitted with --wrap option
snakemake 8.4.11
snakemake-executor-plugin-slurm 0.3.2
python 3.11.8
Sorry, I could not get back to this issue any sooner.
@ebete With $ snakemake --workflow-profile ./profiles -j unlimited (the rest are my defaults) and
$ cat ./profiles/config.yaml
default-resources:
  slurm_partition: "<partition>"
  slurm_account: "<account>"
  runtime: "10h"
  apptainer_prefix: "$HOME/.snakemake-envs"
  apptainer_args: "--bind '$HOME/.snakemake-envs:$HOME/.snakemake-envs:rw'"
set-resources:
  gen_file:
    load: 100
I could get your testcase to work. Before that, I think we descended into Dante's fourth circle of quotation-nesting hell.
@ebete is it working? Or do these issues persist?
@cmeesters As of now, the issue still persists. I do have a small patch that should solve it though:
diff --git a/snakemake_executor_plugin_slurm/__init__.py b/snakemake_executor_plugin_slurm/__init__.py
index a97f4aa..a214983 100644
--- a/snakemake_executor_plugin_slurm/__init__.py
+++ b/snakemake_executor_plugin_slurm/__init__.py
@@ -6,6 +6,7 @@ __license__ = "MIT"
 import csv
 from io import StringIO
 import os
+import re
 import subprocess
 import time
 from datetime import datetime, timedelta
@@ -141,6 +142,8 @@ class Executor(RemoteExecutor):
         # (see https://github.com/snakemake/snakemake/issues/2014)
         call += f" -D {self.workflow.workdir_init}"
         # and finally the job to execute with all the snakemake parameters
+        # escape any unescaped double quotes to keep --wrap="..." from being closed prematurely
+        exec_job = re.sub(r'(?<!\\)"', r"\"", exec_job)
         call += f' --wrap="{exec_job}"'
         self.logger.debug(f"sbatch call: {call}")
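To see what the substitution does, a quick standalone check (not part of the plugin):

```python
import re

exec_job = 'python -m snakemake --config "key=[1]" --apptainer-args "--bind /a:/a"'
# Escape every double quote that is not already backslash-escaped; the
# negative lookbehind also makes the substitution idempotent.
escaped = re.sub(r'(?<!\\)"', r'\"', exec_job)
print(f'--wrap="{escaped}"')
# --wrap="python -m snakemake --config \"key=[1]\" --apptainer-args \"--bind /a:/a\""
```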
Looks nice! Out of curiosity: did it fail the same way as before when using the workflow profile?
Anyway, I do not want to leave you hanging. Yet I just got a new student, have a workshop to prepare for Wednesday, and will take a few days off because our school is closed. In other words: little time for testing. Will do so a.s.a.p.
I reckon that the profile would work. But in my case I'm generating the --bind arguments dynamically in a wrapper script, hence a static yml file would not suffice (see the sketch below).
And no worries, enjoy the break :)
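For reference, a sketch of how such dynamically gathered binds could be quoted safely, even with spaces in the paths (the paths and variable names here are made up):

```python
import shlex

# Hypothetical bind specs gathered at runtime; the second one contains a space.
binds = ["/data/in:/data/in:ro", "/scratch/tmp dir:/scratch/tmp dir:rw"]

# Quote each spec individually so embedded spaces survive one level of
# shell parsing inside --apptainer-args "...".
apptainer_args = " ".join(f"--bind {shlex.quote(b)}" for b in binds)
print(apptainer_args)
# --bind /data/in:/data/in:ro --bind '/scratch/tmp dir:/scratch/tmp dir:rw'
```

Of course, this still depends on the plugin not mangling the inner quotes, which is exactly the bug discussed here.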
The sbatch command constructed in snakemake_executor_plugin_slurm.Executor().run_job() does not seem to escape quotes in the exec_job variable, causing issues at the following line:
https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/cf0560cfac101af9600d7a1f7e3e21f5485d47c4/snakemake_executor_plugin_slurm/__init__.py#L133
When exec_job contains a double quote ("), it will generate a not-so-helpful error message.