kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
https://kestra.io
Apache License 2.0

Bulk replay from any task run in selected failed Executions (related to Test Mode/Playground/Dryrun) #1082

Open brian-mulier-p opened 1 year ago

brian-mulier-p commented 1 year ago

Feature description

This could be really useful at run time, for example to skip a task that fails because of an external integration that isn't idempotent.

What I have in mind: the front-end would go through each selected failed task and call the back-end to identify which tasks we can restart from when replaying. Only restart points that are either in the current flowable task or are non-nested tasks would be displayed, since starting inside a flowable task other than the current one would require extra complexity to handle the "value", e.g. in "each*" tasks.
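
To make the filtering rule concrete, here is a minimal sketch in Python. It is not Kestra code: the `TaskRun` shape, the `parent_id` field, and the `restart_points` function are hypothetical names used only for illustration, and it assumes the failed task is the innermost failed task run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskRun:
    """Hypothetical, simplified view of a task run (not Kestra's model)."""
    task_id: str
    parent_id: Optional[str]  # enclosing flowable task, None if top-level
    state: str                # e.g. "SUCCESS" or "FAILED"


def restart_points(task_runs: List[TaskRun]) -> List[str]:
    """Return the task ids a replay could start from.

    A task run qualifies if it is non-nested (top-level) or if it sits in
    the same flowable task as the innermost failed task run.
    """
    # A failed flowable task is usually failed because one of its children
    # failed, so skip failed runs that are parents of other failed runs.
    failed_parents = {t.parent_id for t in task_runs if t.state == "FAILED"}
    failed = next(
        (t for t in task_runs
         if t.state == "FAILED" and t.task_id not in failed_parents),
        None,
    )
    if failed is None:
        return []

    points = []
    for t in task_runs:
        top_level = t.parent_id is None
        same_flowable = (
            failed.parent_id is not None and t.parent_id == failed.parent_id
        )
        if top_level or same_flowable:
            points.append(t.task_id)
    return points


# Illustrative execution: "parse" is nested in another flowable ("each_file"),
# so it is not offered as a restart point.
example_runs = [
    TaskRun("extract", None, "SUCCESS"),
    TaskRun("each_file", None, "SUCCESS"),
    TaskRun("parse", "each_file", "SUCCESS"),
    TaskRun("sequential", None, "FAILED"),
    TaskRun("transform", "sequential", "SUCCESS"),
    TaskRun("load", "sequential", "FAILED"),
]
example_points = restart_points(example_runs)
```

Tasks nested in a different flowable are simply left out here, which avoids having to rebuild the iteration "value" for them.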

The starting points should then be grouped by the flow types identified among the selected executions, so that the user can specify, for each type, where to start from, and possibly add dynamic variables (based on current inputs/outputs, for example) before the replay.
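
A rough sketch of that grouping step, again in Python and again with hypothetical names: it assumes a "flow type" means a distinct flow, keyed here by namespace and flow id, and it ignores flow revisions. For each flow, only the restart points shared by all of its selected executions are kept, so that a single choice applies to the whole group.

```python
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str]  # (namespace, flow_id), an assumed grouping key


def group_restart_points(executions: List[dict]) -> Dict[FlowKey, List[str]]:
    """Group selected executions by flow and keep the common restart points.

    Each execution is a dict with the illustrative keys "namespace",
    "flow_id" and "restart_points" (a list of task ids).
    """
    grouped: Dict[FlowKey, List[str]] = {}
    for execution in executions:
        key = (execution["namespace"], execution["flow_id"])
        points = execution["restart_points"]
        if key not in grouped:
            grouped[key] = list(points)
        else:
            # Intersect while preserving the order of the first execution.
            allowed = set(points)
            grouped[key] = [p for p in grouped[key] if p in allowed]
    return grouped


# Illustrative selection of three failed executions across two flows.
selected = [
    {"namespace": "prod", "flow_id": "etl",
     "restart_points": ["extract", "transform", "load"]},
    {"namespace": "prod", "flow_id": "etl",
     "restart_points": ["extract", "transform"]},
    {"namespace": "prod", "flow_id": "report",
     "restart_points": ["render"]},
]
grouped = group_restart_points(selected)
```

The UI could then show one restart-point selector per key in `grouped`.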

Tell me if this is unclear, but I really think it would be a great improvement for Kestra in terms of production oversight and edge-case handling.

brian-mulier-p commented 1 year ago

After some testing, it seems that the "change status" feature can cover most use cases (except if you want to replay starting from a previous task). However, maybe this feature should be available in bulk. It could also be helpful to be able to modify the context dynamically (for example, adding a variable based on another context variable) for each selected flow type.

anna-geller commented 3 weeks ago

There is some overlap with the planned Test Mode (https://www.notion.so/kestra-io/Test-Mode-Playground-Dryrun-58cea4ba311f4af5818459617450f801?pvs=4), although I understand that Replay is still different: it would make it easier to rerun failed Executions in production, rather than rerun during testing.