Closed by luator 5 months ago
Once !77 is merged, exit_for_resume() can be used on Slurm. With it, a job can keep track of how much of its time limit it has spent and exit for resume shortly before running out of time. This requires adding time-tracking logic to the job, but I think it is a more proper, explicit solution than just restarting any job that ran out of time. Thus I'll close this as "wontfix". Feel free to complain if you think this would be a needed feature :).
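For illustration, the time-tracking logic could look roughly like the sketch below. This is only a minimal example under assumptions, not how cluster_utils does it: `TimeBudget`, `run`, `do_step`, `save_checkpoint` and `request_resume` are hypothetical names, and the time limit and safety margin are values you would have to pick yourself. In a real job, `request_resume` is the place where exit_for_resume() would be called.

```python
import time


class TimeBudget:
    """Tracks elapsed wall-clock time against a job time limit (hypothetical helper)."""

    def __init__(self, limit_seconds, margin_seconds, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.limit_seconds = limit_seconds
        self.margin_seconds = margin_seconds

    def elapsed(self):
        return self._clock() - self._start

    def should_exit(self):
        # True once less than `margin_seconds` of the limit remains.
        return self.elapsed() >= self.limit_seconds - self.margin_seconds


def run(steps, budget, do_step, save_checkpoint, request_resume):
    """Run `do_step` for each step, stopping early if the time budget is nearly used up.

    Returns the step at which the job stopped, or None if all steps finished.
    """
    for step in steps:
        if budget.should_exit():
            save_checkpoint(step)
            # Assumption: in a real job, exit_for_resume() would be called here.
            request_resume()
            return step
        do_step(step)
    return None
```

The check happens once per step, so the margin needs to be larger than the longest single step plus the time needed to write the checkpoint.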
By Felix Widmaier on 2024-01-22T15:42:21 (imported from GitLab)
Reopened based on https://gitlab.tuebingen.mpg.de/mrolinek/cluster_utils/-/merge_requests/77#note_21729
By Felix Widmaier on 2024-01-29T13:51:05 (imported from GitLab)
unassigned @felixwidmaier
By Felix Widmaier on 2023-12-11T13:40:58 (imported from GitLab)
Not fully automatic, but by adding a bit of code to the job script, this can now be done using the timeout signal (#84).
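As a rough idea of what "a bit of code" in the job script might look like, here is a minimal sketch using Python's standard `signal` module. It assumes the timeout is announced via `SIGUSR1`; which signal actually arrives depends on how the job is submitted, so treat that as an assumption. `TimeoutFlag` is a hypothetical helper, not part of cluster_utils.

```python
import signal


class TimeoutFlag:
    """Remembers whether the timeout signal has been received (hypothetical helper)."""

    def __init__(self, signum=signal.SIGUSR1):  # assumption: SIGUSR1 is the timeout signal
        self.received = False
        # Must be called from the main thread; Python only allows handlers there.
        signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; the main loop decides when it is safe to stop.
        self.received = True
```

The job's main loop would then check `flag.received` between iterations, save its state and exit for resume once the flag is set.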
In Slurm, you have to specify the time required for your job. Jobs that exceed this time may be killed by the scheduler. If this happens, it would be nice if cluster_utils detected it and optionally restarted the affected jobs automatically.