Closed: luator closed this 7 months ago
`..._in_secs` is a good idea to make the format/unit clear. I'm generally not so happy with the name `timeout_signal_time`, as I think it's not very descriptive. Something like `seconds_before_timeout_at_which_signal_is_sent` would be much clearer but also annoyingly long... I would also not use the same time format as `request_time`, as the parsing done by Slurm for that is actually not so trivial (from the sbatch manpage: 'Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".'), so I'd rather not deal with that in our code.

How about `signal_seconds_before_timeout` for the parameter name?
`signal_seconds_to_timeout` would be a bit shorter even, but works for me with "before" as well.
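To make the unit question concrete, a rough sketch of how the option could look next to `request_time` (shown as a plain Python dict; the actual config format and the surrounding keys are assumptions, not the real schema):

```python
# Hypothetical illustration only: the config layout shown here is an
# assumption, not the project's actual schema.
cluster_requirements = {
    # Slurm-style time string; parsed by Slurm itself, not by our code.
    "request_time": "1-12:00:00",
    # Plain integer seconds, so no time-string parsing is needed on our side.
    "signal_seconds_before_timeout": 60,
}
```

With a plain integer, building `--signal=USR1@<time>` is simple string formatting, and none of Slurm's time-string parsing has to be reimplemented.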
2. From what I understood, it might not work directly if you have a bash script wrapping your main script, though (I didn't explicitly test that).
Maybe noting this in the docs would be nice then.
I'll actually do an explicit test first, so we know for sure.
Wrapper scripts are indeed a problem for the signal. I couldn't directly find a proper way of handling it in this case, so I just added a warning to the docs for now.
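For the record, forwarding the signal is conceptually simple if the wrapper happens to be a Python script; the bash case is what the docs warning is about. A rough sketch (the script name and invocation are made up, not part of this PR):

```python
#!/usr/bin/env python3
"""Hypothetical wrapper that forwards SIGUSR1 to the wrapped script."""
import signal
import subprocess
import sys

# "main_script.py" is a placeholder for whatever the wrapper actually runs.
child = subprocess.Popen([sys.executable, "main_script.py", *sys.argv[1:]])

def forward_usr1(signum, frame):
    # Slurm signals the wrapper (the process started via srun); pass it on.
    child.send_signal(signal.SIGUSR1)

signal.signal(signal.SIGUSR1, forward_usr1)
sys.exit(child.wait())
```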
TL;DR
Add option to send signal to jobs before they are killed by timeout.
Long Version
Add a new option `timeout_signal_time` in the `cluster_requirements` section. If set, it adds `--signal=USR1@<time>` to the `sbatch` call, causing Slurm to send a USR1 signal the specified number of seconds before it would kill the job due to timeout.

For the signal to actually reach the process running the main script, that process needs to be executed with `srun`, so some change in the job command creation is needed to inject that for Slurm jobs.

Also add an example and embed it in the documentation.
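On the job side, reacting to the signal could look roughly like this (a minimal sketch assuming a Python main script; the checkpointing is just a placeholder):

```python
import signal
import sys
import time

def on_timeout_warning(signum, frame):
    # Slurm delivers SIGUSR1 <time> seconds before the job's time limit.
    # A real job would save a checkpoint here and exit cleanly.
    print("SIGUSR1 received: timeout approaching, checkpointing and exiting.")
    sys.exit(0)

signal.signal(signal.SIGUSR1, on_timeout_warning)

# Placeholder for the job's actual long-running work.
while True:
    time.sleep(10)
```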
Resolves #64.
How I Tested
By running the example.