Closed: imsarath closed this issue 3 months ago.
From the Snakemake docs:
> Immediately submit all jobs to the cluster instead of waiting for present input files. This will fail, unless you make the cluster aware of job dependencies, e.g. via: `$ snakemake --cluster 'sbatch --dependency {dependencies}'`. Assuming that your submit script (here `sbatch`) outputs the generated job id to the first stdout line, `{dependencies}` will be filled with space-separated job ids this job depends on. Does not work for workflows that contain checkpoint rules.
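The chaining the docs describe can be sketched with a stub standing in for the real `sbatch` (the stub and the job ids below are invented for illustration; real `sbatch --parsable` likewise prints just the new job id on stdout):

```shell
# Fake sbatch stub: prints a canned job id, standing in for real
# `sbatch --parsable` (which prints only the new job id on stdout).
sbatch() {
  for arg in "$@"; do script="$arg"; done   # last argument is the job script
  case "$script" in
    job_a.sh) echo 101 ;;
    job_b.sh) echo 102 ;;
  esac
}

# Capture job A's id, then gate job B on A finishing successfully.
job_a=$(sbatch --parsable job_a.sh)
job_b=$(sbatch --parsable --dependency=afterok:"$job_a" job_b.sh)
echo "A=$job_a B=$job_b (B starts only after A succeeds)"
```

This is the pattern `--immediate-submit` relies on: each submit command must print the job id so downstream submissions can reference it in `--dependency`.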
In other words, this won't work unless you manage the dependencies yourself via Slurm (example). Personally, that seems like a lot of extra work/complexity when Snakemake already handles this for you.
What is your motivation for using `--immediate-submit`?
In our research team, we developed our pipelines using the `--immediate-submit` feature in Snakemake 6.2.1. Now we are trying to update to Snakemake >=8.
> In our research team, we developed our pipelines using this feature `--immediate-submit` in snakemake (6.2.1)
Were you using this smk-simple-slurm profile for this previous `--immediate-submit` setup with Snakemake 6.2.1?
> Now we are trying to update the snakemake (>=8).
Have you made all the various required updates to migrate to Snakemake >=8? The latest Snakemake 8 version of this profile and documentation was merged only recently, on June 20th (b91a2284b1d2bbe8ec3f0bf2e157ab63c9024d13, #23).
And in general, a minimal reproducible example would be helpful.
No, we were using the snakemake slurm profiles. We used the Python script from this repo.
Yes, I have made all the required changes to migrate to Snakemake >=8.
Here, I used a simple shell script to add dependencies to the job submission.
`profile/slurm/config.yaml`:
```yaml
executor: cluster-generic
cluster-generic-submit-cmd:
  mkdir -p logs/cluster/ &&
  sbatch
  --partition={resources.partition}
  --cpus-per-task={threads}
  --job-name=smk-{rule}-{wildcards}
  --output=logs/cluster/smk.{rule}-{wildcards}-%j.out
  --parsable
  $(bash /path/to/parseJobID.sh {dependencies})
default-resources:
  - partition=core
  - mem_mb=4000  # mem_mb expects an integer number of megabytes; "4G" fails to parse
restart-times: 1
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 500
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-singularity: True
jobscript: slurm_jobscript.sh
use-conda: False
software-deployment-method: apptainer
```
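For one job of a hypothetical rule (the rule name, thread count, and wildcard values below are made up for illustration), snakemake would expand the placeholders in `cluster-generic-submit-cmd` into a submit line roughly like this:

```shell
# Assumed example values; snakemake substitutes {rule}, {threads},
# {resources.partition}, and {wildcards} per job before submission.
rule=align; threads=4; partition=core; wc="sample=A"
cmd="sbatch --partition=$partition --cpus-per-task=$threads --job-name=smk-$rule-$wc --output=logs/cluster/smk.$rule-$wc-%j.out --parsable"
echo "$cmd"
```

The trailing `$(bash /path/to/parseJobID.sh {dependencies})` in the profile then appends a `--dependency=afterok:...` flag when the job has upstream jobs.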
`/path/to/parseJobID.sh`:
```bash
#!/bin/bash
# Helper script that parses slurm output for the job ID,
# and feeds it back to snakemake/slurm for dependencies.
# This is required when you want to use the snakemake --immediate-submit option
if [[ "Submitted batch job" =~ "$@" ]]; then
    echo -n ""
else
    deplist=$(grep -Eo '[0-9]{1,10}' <<< "$@" | tr '\n' ',' | sed 's/.$//')
    echo -n "--dependency=afterok:$deplist"
fi
```
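To see what the script emits for typical inputs, its logic can be wrapped in a function and exercised directly (the function body mirrors the script above; this is only a demonstration harness, not part of the profile):

```shell
# Mirrors parseJobID.sh's logic as a bash function for demonstration.
parse_job_ids() {
  if [[ "Submitted batch job" =~ "$@" ]]; then
    # No real job ids passed in: emit nothing, so sbatch gets no flag.
    echo -n ""
  else
    # Extract every numeric job id and join them with commas.
    deplist=$(grep -Eo '[0-9]{1,10}' <<< "$@" | tr '\n' ',' | sed 's/.$//')
    echo -n "--dependency=afterok:$deplist"
  fi
}

parse_job_ids "123 456 789"   # space-separated ids from {dependencies}
parse_job_ids ""              # no upstream jobs: prints nothing
```

So a job depending on jobs 123, 456, and 789 is submitted with `--dependency=afterok:123,456,789`, while a job with no upstream jobs gets no extra flag.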
> Here, I used simple shell script to add dependencies to the jobs submission.
@imsarath Very cool! Thanks for sharing. I had never tried this approach before.
Unfortunately I no longer have convenient access to an HPC cluster with Slurm (I no longer consult for the client I developed this profile for initially), so I can't actively troubleshoot this. @JoshLoecker do you have the bandwidth to try `parseJobID.sh`?
Hi @imsarath, I've never used snakemake like this and, like @jdblischak said, trying to manage dependencies this way will almost certainly be more trouble than it's worth. You're better off relying on snakemake to handle the input and output for you by removing `$(bash /path/to/parseJobID.sh {dependencies})` from your `config.yaml` file. The mock Snakefile below lets snakemake handle the dependencies. Depending on how your workflow is set up, it may need to be changed to this format once the dependencies are removed from the configuration.
```python
rule all:
    input: "output_file.txt"

rule a:
    input: "sample_data.csv"
    output: "rule_a_output.txt"
    shell: "touch {output}"

rule b:
    input: rules.a.output
    output: "output_file.txt"
    shell: "touch {output}"
```
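As a sanity check of the ordering those rules encode, the same chain can be rehearsed in plain shell: the second step only runs once the first step's output file exists, which is exactly the dependency snakemake infers from the matching input/output declarations (no snakemake needed for this rehearsal):

```shell
# Rehearse the mock workflow's file-driven ordering without snakemake.
workdir=$(mktemp -d)
cd "$workdir"
touch sample_data.csv                 # starting input for rule a
touch rule_a_output.txt               # rule a: touch {output}
if [ -f rule_a_output.txt ]; then     # rule b waits for rule a's output
  touch output_file.txt               # rule b: touch {output}
fi
```

Because `rule b`'s input is `rules.a.output`, snakemake will never schedule `b` before `a` has finished, with no manual `--dependency` wiring required.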
I'll do my best to help no matter if you keep or remove the `--immediate-submit` option :)
Can you post the following items?

- The `Snakefile` for your workflow
- The `slurm_jobscript.sh` script. I'm not sure if this is actually used to submit jobs since `cluster-generic-submit-cmd` is also defined, but I want to match your environment as closely as possible

Hi @jdblischak @JoshLoecker, thanks for helping me. It looks like there's an issue with Snakemake, and I'm considering removing the `--immediate-submit` option from our pipeline.
I used your SLURM profile, and it works well. However, when I try to submit jobs to SLURM using the --immediate-submit option, Snakemake submits the initial batch of jobs but then immediately tries to check the output files, even though the jobs are still running in the SLURM queue. As a result, it throws errors about missing output files.
I don't understand why this is happening. Could you provide some guidance on how to resolve this issue?
snakemake: v8.12.0