snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Separately logging stdout and stderr #95

Open bentyeh opened 4 months ago

bentyeh commented 4 months ago

Currently, only stdout appears to be logged, via the `--output` option to `sbatch`:

https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/2e5d308e56cf400854b0602e5b51dea44efe5c47/snakemake_executor_plugin_slurm/__init__.py#L97

Is there a way to separately log stderr, ideally without having to specify slurm_extra="--error=<path_to_log>.err" manually as a resource for each rule?
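For reference, the manual per-rule workaround mentioned above would look roughly like this in a workflow profile (a sketch only; the rule names and paths are illustrative):

```yaml
# Illustrative only: passing --error per rule via the slurm_extra resource
set-resources:
  rule_1:
    slurm_extra: "'--error=log/rule_1.err'"
  rule_2:
    slurm_extra: "'--error=log/rule_2.err'"
```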

cmeesters commented 4 months ago

Currently, no. As snakefmt will remind you when developing a workflow, the log directive is a semi-requirement; hence, this was never implemented.

However, it would be easy to do so. Do you have an example where the separation makes sense?

bentyeh commented 4 months ago

I actually wasn't thinking of the log directive at all, or even the Snakefile, really.

I was hoping that the slurm executor could read output and error arguments from a workflow profile, just like it accepts different resource specifications from the workflow profile.

For example, part of my workflow profile might look something like this:

```yaml
set-resources:
  rule_1:
    mem_mb: 12000
  rule_2:
    runtime: 1200
```

I hope this could be extended to something like:

```yaml
set-resources:
  rule_1:
    mem_mb: 12000
    output: "log/rule_1.out"
    error: "log/rule_1.err"
  rule_2:
    runtime: 1200
    output: "log/rule_2.out"
    error: "log/rule_2.err"
```

or, even better, accept wildcards as follows:

```yaml
default-resources:
  output: "log/{rule}_{sample}_{jobid}.out"
  error: "log/{rule}_{sample}_{jobid}.err"

set-resources:
  rule_1:
    mem_mb: 12000
  rule_2:
    runtime: 1200
```

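As a minimal sketch of how such resources could be consumed when the executor builds the `sbatch` call: this is hypothetical — `build_sbatch_log_args` and the `error` resource are invented for illustration and do not exist in the plugin; only the fallback to the plugin's pre-defined log file reflects current behavior.

```python
# Hypothetical helper: how run_job() could honor per-rule `output`/`error`
# resources, falling back to the plugin's pre-defined slurm_logfile.
def build_sbatch_log_args(resources: dict, slurm_logfile: str) -> list[str]:
    args = [f"--output={resources.get('output', slurm_logfile)}"]
    if "error" in resources:
        # Split stderr only when the user explicitly asked for it;
        # otherwise keep SLURM's default of combining both streams.
        args.append(f"--error={resources['error']}")
    return args

# Default: matches the current behavior (single log file, streams combined).
print(build_sbatch_log_args({}, ".snakemake/slurm_logs/rule_1.log"))
# With `output`/`error` resources set in the profile, the streams are split.
print(build_sbatch_log_args(
    {"output": "log/rule_1.out", "error": "log/rule_1.err"},
    ".snakemake/slurm_logs/rule_1.log",
))
```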
claczny commented 3 months ago

I see the possible interest in splitting stdout and stderr, if only to keep the interface consistent with SLURM. However, an even bigger issue I see in this context is that `slurm_logfile` is automatically defined in `run_job()` (https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/2e5d308e56cf400854b0602e5b51dea44efe5c47/snakemake_executor_plugin_slurm/__init__.py#L60). I couldn't find a way to override this: something like `slurm_extra: "'--qos=normal --output=logs/slurm_logs/{wildcards.sid}/{rule}/{rule}-%j.out'"` does not work, although it worked well in Snakemake < 8.

cmeesters commented 3 months ago

Snakemake is not SLURM. Snakemake collects its own logs (far more informative than the SLURM output, as our jobscript is a Snakemake process itself).

Hence, we settled on the current implementation with a pre-defined path. In a future version, this will allow us to erase old log files automatically and thereby save precious metadata space on parallel file systems. But, yes, we definitely need to work on this.

SLURM, by default (you might know this; it's for the benefit of those reading along), combines stdout and stderr. Unless you are developing a workflow, where we inevitably break things, those logfiles are mostly empty and otherwise contain redundant Snakemake output.

Why would a split be beneficial?

claczny commented 3 months ago

I understand your point. Yet isn't it possibly confusing to users that Snakemake imposes specific behavior that cannot be adjusted individually? Defining a default is fine, but having the option to modify it, if needed, seems better to me. As for portability across scheduling systems: I am not familiar with other systems, but wouldn't they typically also offer a split stdout/stderr option, if only to stay consistent with default GNU/Linux behavior?

I also get the point about keeping metadata space small, but deleting SLURM logs is, in my opinion, bad and unexpected behavior. I, for one, like to retain the SLURM logs of successful jobs for provenance reasons. Importantly, the SLURM log produced by the SLURM executor plugin looks different from what SLURM typically returns (regardless of whether stdout and stderr are split or not). And if the logs collected by Snakemake are "far more informative", shouldn't they be treated separately? But maybe that is a separate issue that I should open(?)

Thanks for all your efforts!

cmeesters commented 3 months ago

If you write a SLURM jobscript (you can write entire workflows with SLURM alone) and give no -o/--output option, SLURM defines a default output file for you, and it combines stdout and stderr. This is a sensible thing to do, for otherwise information would get lost.

If only -o/--output is defined, stdout and stderr are again combined. That's fine: why would you want to split them in the first place? Usually when the program in question clutters your terminal with scientific information outside of a job: then you have to split off the error stream so that downstream processing works with pipes or other tools (errors or warnings in between would render such output unparsable). Snakemake lets you build workflows with group jobs and pipes, so you do not need to parse the job output. It offers to collect the logs of the programs and to write additional messages on stdout. After all, Snakemake is a feature-rich workflow system: no more need for `#SBATCH --dependency=afterok:<id>` and error handling programmed by the user.
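The combine-versus-split behavior described above can be mimicked locally with plain stream redirection (a sketch only; `sbatch`'s `--output`/`--error` act analogously on the job's streams):

```python
import subprocess

# A command that writes one line to each stream.
cmd = "echo out; echo err 1>&2"

# Like `--output` alone: stderr is merged into the same file.
with open("combined.log", "w") as f:
    subprocess.run(cmd, shell=True, stdout=f, stderr=f, check=True)

# Like `--output` plus `--error`: the streams are split.
with open("split.out", "w") as out, open("split.err", "w") as err:
    subprocess.run(cmd, shell=True, stdout=out, stderr=err, check=True)
```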

As Snakemake is the jobscript, splitting stdout and stderr would deprive users of the ability to interpret that potentially erroneous output in case of more complex errors. At the same time, Snakemake writes the same output to its own logs (the job logs are usually split per rule and sample); hence my mention of the redundancy. Snakemake also lets you collect metadata and "inject" workflow rules to archive results straight away (with a number of supported archiving systems or "remote" files), etc. Hence, I do not really understand why you want to keep the job logs for provenance reasons. Doing so for SLURM-only work is great! Here, though? Rest assured: when this feature comes, it will be a) configurable and b) optional.