Open GandalfTheWhite2 opened 5 years ago
Hmmm, we already have a --resubmit flag (although it's for warmups only at the moment). I suspect the associated logic [programs.py] could be transplanted over to production mode relatively straightforwardly.
That would IMHO be a huge improvement, and it would help reduce the sometimes considerable frustration caused by random failures.
Would it be possible to implement an option for resubmitting jobs with status FAILED? Sometimes (not very often ;-) ) jobs fail for reasons unrelated to the job scripts, such as a failed file transfer. In that case it would be nice to be able to resubmit only the jobs which failed, e.g. the 7 "subjobs" (in ganga-speak) of job N which failed. It could be an option: --resubmit_failed -j N
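
For illustration, here is a rough sketch of how the proposed `--resubmit_failed -j N` option might select what to resubmit. Everything in it is an assumption made for the example: the `build_parser` and `select_failed` helpers are hypothetical, the subjob statuses are represented as a plain dict rather than taken from any real backend, and none of this reflects the existing `--resubmit` logic in programs.py.

```python
import argparse


def build_parser():
    """Hypothetical CLI: --resubmit_failed together with -j N (the job id)."""
    parser = argparse.ArgumentParser(description="Resubmit failed subjobs (sketch)")
    parser.add_argument("--resubmit_failed", action="store_true",
                        help="resubmit only the subjobs whose status is FAILED")
    parser.add_argument("-j", "--job", type=int,
                        help="id of the job whose failed subjobs should be resubmitted")
    return parser


def select_failed(subjob_status):
    """Return the sorted ids of subjobs whose status is 'failed' (case-insensitive).

    subjob_status is an assumed mapping {subjob_id: status_string}; a real
    implementation would query the submission backend for this information.
    """
    return sorted(sid for sid, status in subjob_status.items()
                  if status.lower() == "failed")


def main(argv, subjob_status):
    """Parse the arguments and return the subjob ids that would be resubmitted."""
    args = build_parser().parse_args(argv)
    if not args.resubmit_failed:
        return []
    if args.job is None:
        raise SystemExit("--resubmit_failed requires -j N")
    # The actual resubmission call is backend-specific and deliberately omitted.
    return select_failed(subjob_status)
```

Calling `main(["--resubmit_failed", "-j", "12"], statuses)` would then return only the ids of the failed subjobs of that job, leaving completed or still-running ones untouched.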