snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Snakemake fails to parse sbatch output to get slurm_jobid #146

Open kennethabarr opened 1 week ago

kennethabarr commented 1 week ago

On my system, SLURM reports submission info even at the lowest level of verbosity, i.e.

> sbatch script.sh
sbatch: Verify job submission ...
sbatch: Using a shared partition ...
sbatch: Partition: X
sbatch: QOS-Flag: X
sbatch: Account: X
sbatch: Verification: ***PASSED***
Submitted batch job 23401073

This causes problems when parsing the job id. It cannot be resolved with --quiet, because then the job id is not returned either. The lines beginning with "sbatch: " are printed to stderr, while the job id is printed to stdout, but the code mixes the two streams in __init__.py:207

out = subprocess.check_output(
    call, shell=True, text=True, stderr=subprocess.STDOUT  # stderr is merged into stdout here
)

It is not an elegant solution, but I have modified my __init__.py by adding a line that keeps only the last line of out. This restores the proper behavior and my pipelines run normally.

out = out.splitlines()[-1].strip()  # keep only the last line, which contains the job id

The versions I am using are as follows: Snakemake 8.18.2, slurm executor plugin 0.10.0, SLURM 20.11.8.

cmeesters commented 1 week ago

This is the problem with providing software meant to ease execution on HPC clusters: the never-ending creativity of admins who tinker with the defaults.

Usually, nothing more than Submitted batch job 23401073 (and perhaps the cluster name in a multi-cluster setup) is reported. I confess: anticipating your cluster's level of verbosity was beyond my imagination.

Your patch is not production ready. Do you want to submit a pull request, or should I go forward? Please note: I am terribly busy these days, and it is unlikely that I will get to it before October.

Also: in-job submission of Snakemake jobs is currently not well supported. I am working on improvements.

nigiord commented 1 week ago

@kennethabarr Could you share the output of sbatch --parsable […] on your system? Since v0.10.0 the plugin uses this option to parse the sbatch output robustly (it should print only the job id, plus the cluster name after a semicolon on multi-cluster setups). Since you still see an error, I guess the extra verbosity remains even with this option? Or are you using an older version of the executor plugin?

I think admins should not change the behavior of this kind of thing (--parsable is designed precisely to be used in scripts). I'm not sure it is a good idea to try to accommodate custom tampering with the sbatch command. For instance, the solution proposed would break if some admin added the extra verbosity after (and not before) the normal output.

I think you can create a workaround without changing the executor code, by defining a wrapper function for the sbatch command in your .bashrc that keeps only the stdout. Would this work for you?

function sbatch() {
    # "command" bypasses this function and runs the real sbatch binary;
    # without it the wrapper would call itself recursively
    command sbatch "$@" 2> /dev/null
}
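
Note that this silences everything sbatch writes to stderr, including genuine error messages, so it is only a stop-gap.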

On the other hand, do we really need to merge stdout and stderr in the executor?
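
For what it's worth, here is a minimal sketch of what separate capture could look like, assuming the submission command already includes --parsable; the function name and error handling are illustrative assumptions, not the plugin's actual code:

import re
import subprocess

def submit_job(call: str) -> str:
    # run the sbatch command without merging the two streams
    proc = subprocess.run(call, shell=True, text=True, capture_output=True)
    if proc.returncode != 0:
        # stderr still carries the error message when submission fails
        raise RuntimeError(f"sbatch failed: {proc.stderr.strip()}")
    # with --parsable, stdout is "<jobid>" or "<jobid>;<cluster>", so
    # chatty "sbatch: ..." lines on stderr can no longer pollute it
    match = re.search(r"\d+", proc.stdout)
    if match is None:
        raise RuntimeError(f"could not parse job id from {proc.stdout!r}")
    return match.group(0)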

kennethabarr commented 1 week ago

I still receive the same "sbatch: ..." lines, but they are part of stderr, not stdout.

My understanding is that the reason to merge stdout and stderr is to propagate errors encountered in the sbatch script.

I think some version of your solution is a far better hack than what I have done, and I will implement it right away. I don't think I'm experienced enough with Python to implement a production-ready fix in the code that can handle our creative cluster admins.