tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0
8.37k stars 1.76k forks source link

Design: Partial Pipeline execution #50

Open bobcatfish opened 5 years ago

bobcatfish commented 5 years ago

The work for this task is to design this feature and present one or more proposals (before implementing).

Expected Behavior

If a pipeline has many tasks and takes a long time to run (e.g. tens of minutes, or even hours), and one Task fails, it might be desirable to be able to pick up execution where the Task failed, with different PipelineParams (e.g. from a different git commit), so you can resume the Pipeline without having to rerun the whole thing.

Some ideas for how to implement this:

  1. Fields in a PipelineRun which override which Tasks to run from / refer to a previous PipelineRun from which results should be taken
  2. A tool which makes it easy to create a new Pipeline from an existing one which only runs a subset of the Tasks

It is also worth considering what this could be like via a UI: if one is viewing a Pipeline in a UI, and wants to re-run only a portion of the Pipeline, they probably want the user experience to be as if they were still running the same Pipeline, even if underneath a new Pipeline is created.

Actual Behavior

At the moment, if any Task in a Pipeline fails, your options to rerun the rest of the Pipeline would be:

  1. Run the entire Pipeline again
  2. Create a new Pipeline from the previous one which contains only the Tasks you wish to run

Additional Info

This originally came up in discussion about #39, in the context of whether or not we'd want to always use the same git commit from a source for all Tasks in a Pipeline, or if we wanted sometimes for a Task to always use HEAD. This would allow a user to change a repo, by updating HEAD, between Task executions.

The feature of partial pipeline execution could be an alternative to this.

bobcatfish commented 5 years ago

@BenTheElder, @cjwagner and some other Prow folks indicated that this would be a very desirable feature for them - particularly in a case where your pipeline has 2 phases, one that builds a bunch of stuff and then subsequent phases that use that built stuff, it'd be handy to be able to resume after the point where the stuff is built

gsaslis commented 4 years ago

Just to add (or rather try to help clarify) a use case here.

This is a very useful feature for long-running pipelines that probably fall outside the strict CI scope. Most pipelines I have in mind are essentially workflow automation pipelines and have external dependencies such as 3rd party systems that need to be up / reachable.

When such a pipeline fails at step 7/11, you really don't want to rerun the whole thing. The Jenkins Restart from stage feature is ideal for the pipeline to essentially pick up where it left off.

The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.

Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines, significantly widening the scope of problems tekton pipelines can be used to solve. (Plus anyone who relies on this on Jenkins will find it easier to migrate to Tekton).

As a final point, I would like to clarify that in my use case, support for restarting with "different PipelineParams" (as mentioned in the description) is not a necessary feature. I am sure people have use cases for that too, but I personally like the approach Jenkins takes here: you can either restart the whole pipeline with different params (new pipeline run), or restart from stage, when it failed, always with the same params (retry failed pipeline run, starting from failed task).

Hope this helps.

tekton-robot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

@tekton-robot: Closing this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/50#issuecomment-673775880): >Rotten issues close after 30d of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh with `/remove-lifecycle rotten`. > >/close > >Send feedback to [tektoncd/plumbing](https://github.com/tektoncd/plumbing). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
vdemeester commented 3 years ago

/remove-lifecycle rotten /remove-lifecycle stale /reopen

tekton-robot commented 3 years ago

@vdemeester: Reopened this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/50#issuecomment-674776965): >/remove-lifecycle rotten >/remove-lifecycle stale >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
bobcatfish commented 3 years ago

This one is on our roadmap: https://github.com/tektoncd/pipeline/blob/master/roadmap.md

/lifecycle frozen

coryrc commented 3 years ago

The problem with most Jenkins pipelines is that they are not written in such a way that restarting from any particular stage would be possible, as inputs / outputs of each stage (task) are not always well defined.

Coming to Tekton and finding inputs/outputs so explicitly declared, I almost see an opportunity whereby, once this feature is implemented, it will work on "all" Tekton pipelines

I think where this falls apart is lacking copy-on-write workspaces. Tasks can easily introduce changes to workspaces which makes execution non-idempotent. If instead workspaces were always inputs xor outputs the input to any given stage would still exist when the retry is attempted.

coryrc commented 3 years ago

input/output workspace layering cannot be done efficiently without copy-on-write workspaces. It could be done with an NFS server and overlay filesystems, or there appear to be some COW volumes but I do not know enough about k8s to say if it's usable for this.

bobcatfish commented 3 years ago

Tasks can easily introduce changes to workspaces which makes execution non-idempotent

I think there will be use cases where folks want to make these kinds of non-idempotent changes to workspaces, so even with COW I'm not sure we could fully solve this problem? If I'm wrong it would probably help if you could explain with an example.

Also: it sounds like COW workspaces would be an interesting feature in general - if you feel motivated it'd be great to have a separate issue to dive into this in detail

bobcatfish commented 3 years ago

Quick update here: @jerop created a design for #1797 which has some interesting ideas that could be applied to a design for partial execution (design doc).

pritidesai commented 1 year ago

TEP-0123 Specifying on-demand-retry in a pipelineTask does not offer solution for this feature. But proposes a feature to allow specifying on-demand-retry at the authoring time.

jwx0925 commented 6 months ago

Can Tekton Pipeline provide the functionality to rerun failed tasks in a pipeline? This would be very useful for our scenario, where a complex pipeline fails on the last task, requiring manual intervention to manually rerun the failed task. GitHub Actions has a similar feature.