Concurrency limiter controller

imjasonh commented 3 years ago

Opening this issue to collect ideas, discussion, interest, etc., for a supplemental PipelineRun controller (and possibly a TaskRun controller?) that manages Pending PipelineRuns and updates them to a Running state to limit execution concurrency.

We've heard a few use cases for limiting execution concurrency, but so far it's been hard to generalize the various needs into one single unified "concurrency" concept that we can apply across all of Tekton Pipelines. Some users might want only one "deployment" pipeline running at a time across the whole cluster. Others might want one "deployment" pipeline per namespace, or per deployment target (only one pipeline can deploy to Prod at a time, but you can deploy to Prod and Staging at the same time), or per input source (only deploy my Git repo to one place at a time), or per authorizing user (Alice can only deploy to one place at a time).

Users might also want to limit TaskRun concurrency, either when run as part of a PipelineRun or when executed directly.

We can experiment with supporting these various models and provide a runnable example of limiting concurrency that users can adapt to their own needs.

As an initial idea, a concurrency controller could be configured with a ConfigMap describing a concurrency key format, and a concurrency limit:

apiVersion: v1
kind: ConfigMap
metadata:
  name: concurrency-controller
data:
  concurrency-key: $(metadata.namespace)-$(spec.pipelineRef.name)
  # ConfigMap data values must be strings, so the limit is quoted.
  concurrency-limit: "3"

In this example, the key would limit the execution of PipelineRuns referencing the same Pipeline, running in the same namespace, to a max of 3. The concurrency controller would watch for Pending PipelineRuns, derive their keys, count ongoing PipelineRuns with the matching key, and choose to start the new Pending PipelineRun if count < limit. When a PipelineRun finishes, the concurrency controller would reevaluate any Pending runs, and choose one to start if it's under the limit.
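The admission loop described above can be sketched in a few lines. This is a language-agnostic illustration in Python, not a Kubernetes controller: `PipelineRun`, `derive_key`, and `reconcile` are hypothetical stand-ins, only the three placeholder paths used in the example ConfigMap (plus `metadata.name`) are supported, and "starting" a run is modelled as flipping a status field. Pending runs in a bucket are admitted oldest first, which is an assumption; the proposal doesn't specify an order.

```python
import re
from dataclasses import dataclass


@dataclass
class PipelineRun:
    # Hypothetical minimal stand-in for a PipelineRun; only the fields this sketch reads.
    name: str
    namespace: str
    pipeline_ref: str
    status: str = "Pending"  # "Pending", "Running", or "Done"
    created: float = 0.0     # creation timestamp, used for FIFO ordering


def derive_key(fmt: str, run: PipelineRun) -> str:
    """Expand $(path) placeholders in a key format such as the ConfigMap's concurrency-key."""
    fields = {
        "metadata.namespace": run.namespace,
        "metadata.name": run.name,
        "spec.pipelineRef.name": run.pipeline_ref,
    }
    # Unknown paths raise KeyError rather than silently producing a wrong key.
    return re.sub(r"\$\(([^)]+)\)", lambda m: fields[m.group(1)], fmt)


def reconcile(runs, key_fmt: str, limit: int):
    """Start the oldest Pending runs whose key bucket is under the limit.

    Returns the names of the runs started in this pass.
    """
    running = {}
    for r in runs:
        if r.status == "Running":
            k = derive_key(key_fmt, r)
            running[k] = running.get(k, 0) + 1

    started = []
    for r in sorted((r for r in runs if r.status == "Pending"), key=lambda r: r.created):
        k = derive_key(key_fmt, r)
        if running.get(k, 0) < limit:
            # A real controller would instead clear spec.status (PipelineRunPending) via the API.
            r.status = "Running"
            running[k] = running.get(k, 0) + 1
            started.append(r.name)
    return started
```

Because the running count is updated as runs are admitted, a single pass never overshoots the limit for a key; any Pending runs left over are reconsidered on the next pass, for example when a PipelineRun finishes.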

(This is just one idea for describing this; if you have something else in mind, please contribute it below.)

ghost commented 3 years ago

Here are the issues / PRs / TEPs related to this that I have seen so far:

Big +1 from my pov on making this a component external to Pipelines.

bigkevmcd commented 3 years ago

Should there be some sort of load-shedding?

Can you queue PipelineRuns forever? Do they time out?

imjasonh commented 3 years ago

cc @jbarrick-mesosphere for his work on the Pending TEP

imjasonh commented 3 years ago

> Should there be some sort of load-shedding?
>
> Can you queue PipelineRuns forever? Do they time out?

Excellent question! This seems like another useful configuration option for the limiter: a maximum age after which a Pending PipelineRun is dropped on the ground.
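A max-age option could be applied as a filter before admission. The sketch below assumes a hypothetical `max_age_seconds` setting (it might be another ConfigMap key, say `pending-max-age`, which is not part of the proposal); what "dropping" means, whether cancelling the run or marking it failed, is left to the controller.

```python
from dataclasses import dataclass


@dataclass
class PendingRun:
    # Hypothetical stand-in for a Pending PipelineRun.
    name: str
    created: float  # creation time, in seconds since the epoch


def shed_expired(pending, now: float, max_age_seconds: float):
    """Split Pending runs into (kept, dropped) by how long they have waited.

    A run is dropped once it has been Pending for longer than max_age_seconds.
    """
    kept, dropped = [], []
    for run in pending:
        (dropped if now - run.created > max_age_seconds else kept).append(run)
    return kept, dropped
```

For example, with `now=1000` and `max_age_seconds=600`, a run created at time 0 is dropped, while one created at time 900 stays queued.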

Users might also want to be able to describe or derive a priority, which would weight a Pending PipelineRun ahead of others in the same concurrency bucket. Edit: along with priority comes preemption; for example, a new high-priority Pending PipelineRun should cancel an ongoing run to make room for it.
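Priority and preemption reduce to two small decisions within one bucket: which Pending run goes next, and which running run (if any) to cancel when the bucket is full. In this sketch `priority` is a hypothetical integer field (it might be derived from a label or annotation), higher means more important, ties go to the oldest run, and the victim choice (lowest priority, then newest, to lose the least work) is one assumed policy among several reasonable ones.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical stand-in for a PipelineRun in one concurrency bucket.
    name: str
    priority: int   # higher value = more important
    created: float  # creation timestamp


def pick_next(pending):
    """Return the Pending run to start next: highest priority, oldest on ties."""
    if not pending:
        return None
    return min(pending, key=lambda r: (-r.priority, r.created))


def preemption_victim(running, incoming, limit: int):
    """Return the running run to cancel for `incoming`, or None.

    Preempts only when the bucket is full and some running run has strictly
    lower priority than the incoming one.
    """
    if len(running) < limit:
        return None  # there is room; no need to preempt
    victims = [r for r in running if r.priority < incoming.priority]
    if not victims:
        return None
    # Lowest priority first; among equals, the newest run (least work lost).
    return min(victims, key=lambda r: (r.priority, -r.created))
```

Requiring strictly lower priority keeps equal-priority runs from preempting each other in a loop.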

Ultimately the deliverable here isn't a production-grade maximally configurable controller, just a minimally useful example that operators can potentially modify to their own needs.

mjgallag commented 3 years ago

I'm currently facing this issue trying to do "branch preview", i.e. building and deploying each branch on every push to separate URLs. Pushes to different branches can run in parallel, but multiple pushes to a single branch should be processed in order, one at a time. I believe this use case would require the concurrency key format to have access to PipelineRun fields so that the branch name can be included.
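One way to cover this is to let the key format read PipelineRun params. The sketch below works on a dict shaped like a PipelineRun manifest and assumes a hypothetical `$(params.<name>)` placeholder syntax in the key format; that syntax is not part of the proposal above. With a limit of 1, pushes to the same branch share a key and run one at a time, while different branches get different keys and can run in parallel.

```python
import re


def derive_key(fmt: str, run: dict) -> str:
    """Expand placeholders in a concurrency key format against a PipelineRun-like dict.

    $(metadata.namespace) and $(spec.pipelineRef.name) read fixed fields;
    $(params.<name>) (hypothetical syntax) reads a value from spec.params.
    """
    params = {p["name"]: p["value"] for p in run["spec"].get("params", [])}

    def expand(m):
        path = m.group(1)
        if path.startswith("params."):
            return params[path[len("params."):]]
        if path == "metadata.namespace":
            return run["metadata"]["namespace"]
        if path == "spec.pipelineRef.name":
            return run["spec"]["pipelineRef"]["name"]
        raise KeyError(f"unsupported key path: {path}")

    return re.sub(r"\$\(([^)]+)\)", expand, fmt)


def preview_run(branch: str) -> dict:
    # Hypothetical branch-preview PipelineRun with a "branch" param.
    return {
        "metadata": {"namespace": "preview"},
        "spec": {
            "pipelineRef": {"name": "deploy-preview"},
            "params": [{"name": "branch", "value": branch}],
        },
    }
```

For example, `derive_key("$(metadata.namespace)-$(spec.pipelineRef.name)-$(params.branch)", preview_run("feature-x"))` yields a key that differs from the one for `preview_run("main")`.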

julweber commented 3 years ago

+1

eccox commented 3 years ago

+1

dbazhal commented 3 years ago

+1 for a limit on simultaneous PipelineRuns.

I would like something as simple as:

kind: Pipeline
...
spec:
  runPolicy:
    type: Parallel
    parallel:
      maxLimit: 3

with Sequential and LatestOnly as alternatives: Sequential executes run requests in their natural order, starting the next one when the previous one finishes or is cancelled, and LatestOnly cancels any previous runs as soon as a new run is created.
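The three proposed policies can be expressed as one function over the runs of a single Pipeline. Sequential, Parallel, and LatestOnly are the names from the proposal above, not an existing Tekton API; the `Run` type and `apply_run_policy` are hypothetical, and cancelling is modelled as a status change.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical stand-in for a PipelineRun of one Pipeline.
    name: str
    created: float
    status: str = "Pending"  # "Pending", "Running", "Cancelled", or "Done"


def apply_run_policy(runs, policy: str, max_limit: int = 1):
    """Apply a proposed run policy; mutates statuses and returns names of runs started."""
    runs = sorted(runs, key=lambda r: r.created)  # natural (creation) order
    if policy == "LatestOnly":
        active = [r for r in runs if r.status in ("Pending", "Running")]
        # Cancel every active run except the newest, even one already running.
        for r in active[:-1]:
            r.status = "Cancelled"
        limit = 1
    elif policy == "Sequential":
        limit = 1
    elif policy == "Parallel":
        limit = max_limit
    else:
        raise ValueError(f"unknown policy: {policy}")

    running = sum(r.status == "Running" for r in runs)
    started = []
    for r in runs:
        if r.status == "Pending" and running < limit:
            r.status = "Running"
            running += 1
            started.append(r.name)
    return started
```

In this formulation Sequential is simply Parallel with a limit of 1; LatestOnly additionally cancels older runs, including one that is already running.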

I expect this functionality belongs in the Tekton operator's domain, because it would be strange for some external component to decide whether the pipeline operator should start processing the next run or wait.

I assume the Pending state exists for situations like this.

As a direct analogy and a good example of how this should be done, I'd point to how the OpenShift operators handle parallelism for BuildConfigs and Builds:

https://docs.okd.io/4.7/cicd/builds/advanced-build-operations.html#builds-build-run-policy_advanced-build-operations

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/build/build_controller.go#L714

https://github.com/openshift/openshift-controller-manager/blob/461fe64e30847a5ae9c361500d7434d2f1756de2/pkg/build/controller/policy/serial.go

dbazhal commented 3 years ago

I suppose run policy is also somehow connected with https://github.com/tektoncd/operator/issues/209

dbazhal commented 3 years ago

As an alternative to operator functionality, I could make PipelineRuns acquire a lock on something in the first task of the pipeline and release the lock in a finally task. But that would break any PipelineRun timing metrics, because runs would appear to take much longer (including the time spent waiting for the lock to be released). I'd like pipeline duration numbers to contain only "useful" information, showing how long task execution took, not how long the pipeline was waiting for another run to complete.

juliaaano commented 3 years ago

This is an important feature I have found and used in most CI systems.

It is usually not acceptable to have two pipeline runs executing at the same time if they modify a shared resource, for example if both make API calls to a single instance of a system.

An approach like the one used in GitHub Actions seems like an elegant way of implementing this feature: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#concurrency

dmikalova commented 3 years ago

This would be useful for serializing Terraform runs:

- If I have several triggers around the same time, Terraform should run only one at a time, in the order they came in.
- An option to cancel intermediate runs, so that any pending runs are cancelled by newer pending runs, but running runs are not cancelled.
- The queue should be keyed: it's not so much the Terraform pipeline that needs to be serialized, but the Terraform runs for a specific key.
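The three requirements in the list above fit a keyed FIFO queue that runs at most one item per key. This is a minimal in-memory sketch; `KeyedQueue` and its methods are hypothetical, and a real controller would persist this state in the PipelineRuns themselves. With `cancel_intermediate=True`, a newly submitted run replaces anything still waiting for the same key, while the run already executing is left alone.

```python
from collections import defaultdict, deque


class KeyedQueue:
    """Per-key FIFO that runs at most one item per key at a time."""

    def __init__(self, cancel_intermediate: bool = False):
        self.cancel_intermediate = cancel_intermediate
        self.waiting = defaultdict(deque)  # key -> runs waiting, oldest first
        self.running = {}                  # key -> the run currently executing
        self.cancelled = []                # runs superseded before they started

    def submit(self, key, run):
        if self.cancel_intermediate:
            # Newer pending runs supersede older pending ones; running runs are untouched.
            self.cancelled.extend(self.waiting[key])
            self.waiting[key].clear()
        self.waiting[key].append(run)
        self._start_next(key)

    def finish(self, key):
        self.running.pop(key, None)
        self._start_next(key)

    def _start_next(self, key):
        if key not in self.running and self.waiting[key]:
            self.running[key] = self.waiting[key].popleft()
```

Keys are independent, so a run keyed `staging` starts immediately even while the `prod` key is busy.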

I was able to implement this in Jenkins but the syntax for it was torturous.

tekton-robot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

dbazhal commented 2 years ago

/remove-lifecycle stale
/lifecycle frozen

david972 commented 2 years ago

+1

shaharb-hs commented 1 year ago

👍

AshwinSridharan0410 commented 1 year ago

Hi. I would like to run against my databases in parallel: when I issue the flyway command, it should run on all the databases at once, not sequentially. Any ideas would be helpful.

emirot commented 1 year ago

Any updates on this? I found this workaround: https://holly-k-cummins.medium.com/using-lease-resources-to-manage-concurrency-in-tekton-builds-344ba84df297 but it is not native and does not preserve ordering.

jimmyjones2 commented 1 year ago

With TEP-0135's coscheduling mode, PVCs are deleted when PipelineRuns finish. Could adding a ResourceQuota on the number of PVCs therefore cap the number of concurrent PipelineRuns at that limit?

tektoncd/experimental: Concurrency limiter controller #699