rwth-i6 / sisyphus

A Workflow Manager in Python
Mozilla Public License 2.0
45 stars 24 forks source link

Task's resume function is ignored #200

Open dthulke opened 3 months ago

dthulke commented 3 months ago

When defining a Task you can set a resume function name which is called instead of the actual Task function when a task is resumed, e.g.:

yield Task('run', resume='resume_run')

For this to work, the worker need to set the resume_job flag when calling Task.run. But this is currently hardcoded to False: https://github.com/rwth-i6/sisyphus/blob/a22e9236ef2a0dcb62fc322bd012f9d0f4e95063/sisyphus/worker.py#L244-L253

The bug was introduced in commit c4f825b043bfd81c75d40f7fd9719e7ffb375f42 which removed the logic that detected whether a job is resumed.

To fix this I see two options:

  1. Remove the logic that allows to set a different function on resume. No one noticed that this is broken for six years, no one asked for this features since then and I don't know of any jobs that use this. This might also allow to cleanup the interface and replace the resume: str by a resumable: bool (keeping and deprecating resume for compatibility).
  2. Implement the feature again, but this might require handling of a few edge cases to reliably detect resumed jobs across different engines.

Any opinions? I'd vote for Option 1.

michelwi commented 3 months ago

Do you need/would like to have a separate function to resume a job?

dthulke commented 3 months ago

No, this is why I suggest Option 1 and propose to remove the existing broken code that tries to do this.

JackTemaki commented 3 months ago

I would prefer also option 1, I used this feature once in all the years (before the bug was even introduced), and then figured out I can solve it in a different way.

critias commented 3 months ago

Agreed, Option 1 is the way to go here!

The separation into two functions is a left over from the beginning of Sisyphus. I thought we need two function, one for a clean start and one which can resume an interrupted job. In practice I realized that the resumeable function need to handle all possible cases, including a clean start, anyway, so there is really no need to have two functions here.

dthulke commented 3 months ago

Great! Will prepare a PR.