wlynch commented 4 years ago

This was an idea that @k floated to me a while back, and I've finally gotten around to opening an issue to discuss it. What I'm curious about:

Is this a use case we want to focus on?
Is it worth making this a built-in feature? (as opposed to a Catalog feature)
Are there any other features / alternatives to consider?

Details intentionally vague - this is a "should we do this?" issue, not a "how we'll do this" issue.

Idea

I may want to control how Pipelines run in relation to others and ensure only 1 pipeline for a given selector can run at a time (hence a "mutex"). I may want to reject new Pipelines if a similar one is already running, or queue it up and just make sure it does not run in parallel. This might be because:

I have a presubmit pipeline that I want to only run one instance of at a time per pull request to reduce costs (e.g. if someone pushes multiple commits, I only need to run the most recent one and can cancel the currently running pipelines).
My pipeline mutates some external state, and I want to make sure only one thing operates on it at a time.

Possible solution

Have a mechanism to select conditions to allow Pipeline execution, as well as a strategy for what to do in response.

Examples

If a new Pipeline is created that was labelled as a pull request, cancel existing runs.

selector: repo=foo, type=pullrequest
strategy: cancel

Only run 1 pipeline at a time that was labelled as being started by a push to master. (does not guarantee ordering)

selector: repo=foo, type=push, ref=master
strategy: queue

Deny new pipeline create requests if they match a pipeline currently running.

selector: repo=foo, type=push, ref=master
strategy: deny
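
To make the three strategies above concrete, here is a minimal sketch in Python of how an admission decision might behave. Nothing here is a Tekton API: `Run`, `Mutex`, `admit` and the state names are hypothetical, and the sketch only models the decision, not how queued runs are later released.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical stand-in for a PipelineRun: a name, labels and a state.
    name: str
    labels: Dict[str, str]
    state: str = "pending"  # pending | running | queued | cancelled | denied


@dataclass
class Mutex:
    # Hypothetical mutex definition: a label selector plus a strategy.
    selector: Dict[str, str]
    strategy: str  # "cancel", "queue" or "deny"

    def matches(self, run: Run) -> bool:
        # A run matches when it carries every label in the selector.
        return all(run.labels.get(k) == v for k, v in self.selector.items())


def admit(mutex: Mutex, new_run: Run, existing: List[Run]) -> None:
    """Set new_run.state, cancelling matching running runs if the strategy says so."""
    if not mutex.matches(new_run):
        new_run.state = "running"  # the mutex does not apply to this run
        return
    active = [r for r in existing if r.state == "running" and mutex.matches(r)]
    if not active:
        new_run.state = "running"
    elif mutex.strategy == "cancel":
        for r in active:
            r.state = "cancelled"  # newest run wins
        new_run.state = "running"
    elif mutex.strategy == "queue":
        new_run.state = "queued"  # wait; ordering is not guaranteed
    elif mutex.strategy == "deny":
        new_run.state = "denied"  # reject the create request
    else:
        raise ValueError(f"unknown strategy: {mutex.strategy}")
```

A real implementation would also need to decide which queued run starts next once the running one finishes; that is deliberately left out here.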

Alternatives

Implement as a task

Cancellation could be handled by having the first step of every pipeline include something along the lines of
```
kubectl delete pipelinerun -l foo=bar
```
This would clobber any other PipelineRuns with that particular label.
Queueing could be handled by having a Condition that runs kubectl get to look for running pods, and only proceeds if the check passes. This is difficult since you'd have to get creative in inspecting runtime information of other runs (e.g. are they also in a wait state, or are they running). This also creates container waste since the pipelines would all be running.
Deny could not be implemented this way.

vdemeester commented 4 years ago

/kind feature

tragiclifestories commented 4 years ago

We've built a queueing system to manage our way around this problem, so a +1 for it being a useful thing to tackle. I don't know whether it should be a core primitive or in the catalog, but in our use case it was necessary at a very early stage, and for deployments specifically it seems to me that having one deployment per app per environment at a time, and ideally in a sensible order, is going to be a very common requirement. So I'm leaning towards 'core'.

holly-cummins commented 4 years ago

I'd really appreciate this as well, perhaps with Task granularity rather than pipeline. My use case for this is to do with cross-talk between concurrent runs of integration tests/db management. In an integration test scenario, for example, the tasks depend on an external resource. If that resource is stateful (like a database), some tasks are rebuilding the database while others might be executing tests which use the database. I'd love to be able to single-thread pipeline runs through the integration test phase.

resamaraschi commented 4 years ago

I also think this is a very common use case for a CI/CD Pipeline. Our scenario is that we have one test cluster for all the created PRs. For instance, when 2 developers open 2 PRs, the pipeline should test one PR against the test cluster first and set the status for the corresponding PR. Meanwhile, all other PipelineRuns for other PRs should be queued until the cluster is free for the next run.

holly-cummins commented 4 years ago

I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

tragiclifestories commented 4 years ago

Interesting!

We took a different approach by storing the queue data in configmaps and defining all the queue operations as scripts that run in task steps. So no explicit modelling through CRDs but it works well enough for our use case.

Hopefully we'll get around to the blog-post stage of the project soon.

afrittoli commented 4 years ago

> I got inspired by @tragiclifestories 's suggestion of a queueing system as a workaround, so I made one too. I documented the steps - hopefully it's useful to someone else while this is pending: https://medium.com/@holly.k.cummins/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297

Nice :) @pritidesai finally in action

PoliM commented 4 years ago

Here is the use case we currently have. Imagine this simplified CD pipeline:

--> DeployToDev --> TestOnDev --> DeployToPreProd --> TestOnPreProd --> DeployToProd --> SmoketestOnProd

There are roughly three sections: Dev, PreProd and Prod. The pipeline is started for a commit in a Git repository that holds the configuration of an application (GitOps). Now here are some requirements:

if a PipelineRun is in TestOnPreProd we don't want another PipelineRun to start DeployToPreProd because the test should be the result of the configuration that the first PipelineRun was started with.
but it is ok to have another PipelineRun starting to DeployToDev.
a PipelineRun must not overtake another PipelineRun because the CD pipeline should deliver the changes in the order in which the configurations were committed to the Git repository.
to make things worse - we use the same pipeline to deploy different applications. So the exclusivity should be per section and application.

Currently we use a task in front of the section that polls a REST service to "ask to enter the section". The implementation of the REST service is specific to our pipeline and uses the Tekton API to analyse the state of all the PipelineRuns. It's ugly :blush: but it works so far.
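
The "ask to enter the section" idea can be sketched as a small in-memory gate in Python. This is only an illustration under assumptions: `SectionGate` and its methods are hypothetical names, the state lives in memory rather than behind a REST service, and runs are assumed to register in the same order as their commits.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

Key = Tuple[str, str]  # (section, application)


class SectionGate:
    """Hypothetical gate: one run per (section, application), FIFO order."""

    def __init__(self) -> None:
        self._waiting: Dict[Key, Deque[str]] = defaultdict(deque)
        self._holder: Dict[Key, Optional[str]] = {}

    def request(self, section: str, app: str, run: str) -> None:
        # Register the run in arrival order (assumed to be commit order).
        self._waiting[(section, app)].append(run)

    def try_enter(self, section: str, app: str, run: str) -> bool:
        # Called repeatedly by the polling task; True means "you may proceed".
        key = (section, app)
        if self._holder.get(key) == run:
            return True  # already inside the section
        queue = self._waiting[key]
        if self._holder.get(key) is None and queue and queue[0] == run:
            queue.popleft()
            self._holder[key] = run
            return True
        return False  # section busy, or an earlier run is still waiting

    def leave(self, section: str, app: str, run: str) -> None:
        # Release the section so the next waiting run can enter.
        key = (section, app)
        if self._holder.get(key) == run:
            self._holder[key] = None
```

Keying on both section and application is what lets a second run enter Dev while the first is still in PreProd, and lets different applications share the same pipeline without blocking each other.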

tekton-robot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

Letty5411 commented 3 years ago

I'd prefer both task and pipeline granularity mutex.

Letty5411 commented 3 years ago

Hi @pritidesai , is there any update about this issue? Thanks :)

tekton-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

julweber commented 3 years ago

+1

julweber commented 3 years ago

Dear tekton Team,

is there any update about this?

It would be great if there was an option to set pipeline runs to a serial mode. In the scenario that a new pipeline run is added while a pipeline run for the same pipeline within the same namespace is already running, the new run could wait for the existing one to finish before being executed.

For example: Concourse CI allows this via a serial job - https://concourse-ci.org/serial-job-example.html

Kind regards

jerop commented 3 years ago

> is there any update about this?

@julweber this is being explored in tekton experimental repo: https://github.com/tektoncd/experimental/issues/699 - @ImJasonH shared an idea in that issue

julweber commented 3 years ago

Hey @jerop ,

Sorry for the late reply. Thanks a lot for the link, I will have a look.

Cheers, Julian

tekton-robot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 3 years ago

@tekton-robot: Closing this issue.


afrittoli commented 2 years ago

/reopen

tekton-robot commented 2 years ago

@afrittoli: Reopened this issue.


afrittoli commented 2 years ago

/remove-lifecycle rotten

bobcatfish commented 2 years ago

Feels like this could be part of a possible solution to the discussion we've been having over at: https://github.com/tektoncd/plumbing/issues/888#issuecomment-943472664

If we want to take this forward I think what will really help is fleshing out the use cases that this feature would solve; @afrittoli this might not be quite the behavior you'd want for some of our common dogfooding use cases (tho it would be better than having a race!):

> I may want to reject new Pipelines if a similar one is already running, or queue it up and just make sure it does not run in parallel.

For PR triggered PipelineRuns/TaskRuns I think what you often want is to run the newest one and cancel the others (e.g. imagining a PR being updated after kicking off PipelineRuns/TaskRuns)

tekton-robot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten with a justification. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

tekton-robot commented 2 years ago

@tekton-robot: Closing this issue.


lbernick commented 2 years ago

/lifecycle frozen

vdemeester commented 2 years ago

tektoncd / pipeline

Idea: Pipeline Mutexes #2828
