Open bzizou opened 9 years ago
For example, in the JDL, we could have:
"action_on": { "timeout": "ignore|resubmit|blacklist", "walltime": "ignore|resubmit|blacklist" },
where ignore = fix the event, resubmit = fix the event and resubmit the job, blacklist = disable the cluster until it is fixed manually.
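
To make the proposal concrete, here is a rough sketch of how such an "action_on" section might be interpreted. Everything in it is an assumption for illustration: the function name `plan_actions`, the step names it returns, and the fallback to blacklist when an event type is not listed are hypothetical and not existing behaviour.

```python
from enum import Enum


class Action(Enum):
    """The three policies proposed for "action_on"."""
    IGNORE = "ignore"
    RESUBMIT = "resubmit"
    BLACKLIST = "blacklist"


def plan_actions(jdl, event_type, default=Action.BLACKLIST):
    """Return the steps to take for an event of type `event_type`.

    `jdl` is the campaign JDL parsed into a dict. The `default` used when
    the event type is absent from "action_on" is an assumption.
    Raises ValueError if the JDL names an unknown policy.
    """
    raw = jdl.get("action_on", {}).get(event_type, default.value)
    policy = Action(raw)  # ValueError on anything but the three policies

    if policy is Action.BLACKLIST:
        # Leave the event open and disable the cluster until a manual fix.
        return ["blacklist_cluster"]

    steps = ["fix_event"]  # both ignore and resubmit start by fixing the event
    if policy is Action.RESUBMIT:
        steps.append("resubmit_job")
    return steps


if __name__ == "__main__":
    jdl = {"action_on": {"timeout": "resubmit", "walltime": "ignore"}}
    for event in ("timeout", "walltime", "other"):
        print(event, plan_actions(jdl, event))
```

Returning a list of step names rather than acting directly keeps the sketch independent of how events are actually fixed or clusters blacklisted.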
For RUNNER_SUBMIT_TIMEOUT, we could also have a "retrieve" option: try to recover the submitted jobs by searching for jobs submitted within the time interval under the campaign's name, or by tagging jobs (how?).
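
A minimal sketch of the search-based variant of "retrieve", assuming the cluster's job list can be obtained as records with a name and a submission time. The `ClusterJob` type, its fields, and `retrieve_candidates` are hypothetical names; how the job list is fetched from the batch scheduler is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClusterJob:
    """Hypothetical view of a job as reported by the cluster."""
    job_id: int
    name: str
    submitted_at: datetime


def retrieve_candidates(jobs, campaign_name, window_start, window_end):
    """Return jobs named after the campaign and submitted within the window.

    The window is meant to cover the period during which the submission
    timed out. Matching on the name alone can be ambiguous if several
    submissions of the same campaign fall in the window, which is why
    tagging jobs is raised as an alternative.
    """
    return [
        job for job in jobs
        if job.name == campaign_name
        and window_start <= job.submitted_at <= window_end
    ]


if __name__ == "__main__":
    jobs = [
        ClusterJob(1, "campaign_42", datetime(2015, 1, 1, 12, 0, 30)),
        ClusterJob(2, "campaign_42", datetime(2015, 1, 1, 13, 0, 0)),
        ClusterJob(3, "other", datetime(2015, 1, 1, 12, 0, 40)),
    ]
    found = retrieve_candidates(
        jobs, "campaign_42",
        datetime(2015, 1, 1, 12, 0, 0), datetime(2015, 1, 1, 12, 1, 0),
    )
    print([job.job_id for job in found])
```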