tektoncd / pipeline

A cloud-native Pipeline resource.
https://tekton.dev
Apache License 2.0

K8s ResourceQuota does not work correctly with Step Limits #4976

Open skaegi opened 2 years ago

skaegi commented 2 years ago

We are trying to use LimitRange and ResourceQuota to restrict Tekton resource usage by customers. In particular, we want to "limit" their memory and CPU use to prevent Tekton workloads from hogging resources that others need.

The Tekton controller tries to be clever here by spreading resource requests evenly across steps, but for limits this approach doesn't work, because container runtimes use the limit as the hard upper bound on resource usage in each container. The net result is that each step retains its full limit, and the sum of all step limits is what gets counted for the pod, which can cause problems when a ResourceQuota is in play.
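To make the arithmetic concrete, here is a toy model of the accounting described above. It is a simplified sketch under stated assumptions, not Tekton or Kubernetes code: the helper names (`step_resources`, `pod_totals`, `fits_quota`) and the numbers are hypothetical. It assumes a LimitRange default of 1Gi request and 1Gi limit per container, that the default request is divided evenly across steps while each step keeps the full default limit, and that the quota charges a pod the plain sum over its step containers (init containers and pod overhead are ignored).

```python
# Toy model of Task pod memory accounting (hypothetical helpers and numbers).
# Assumptions: the LimitRange default request is split evenly across steps,
# every step keeps the full default limit, and the quota charges the pod the
# sum over its step containers (init containers and pod overhead ignored).

GI = 1024 ** 3  # bytes per GiB


def step_resources(num_steps, default_request, default_limit):
    """Return one (request, limit) pair per step container."""
    per_step_request = default_request // num_steps  # request spread evenly
    return [(per_step_request, default_limit) for _ in range(num_steps)]


def pod_totals(steps):
    """Sum requests and limits over all step containers in the pod."""
    total_request = sum(req for req, _ in steps)
    total_limit = sum(lim for _, lim in steps)
    return total_request, total_limit


def fits_quota(total_request, total_limit, quota_requests, quota_limits):
    """Check pod totals against requests.memory / limits.memory quota values."""
    return total_request <= quota_requests and total_limit <= quota_limits


steps = step_resources(num_steps=4, default_request=1 * GI, default_limit=1 * GI)
total_request, total_limit = pod_totals(steps)
print(f"requests: {total_request / GI:.2f}Gi, limits: {total_limit / GI:.2f}Gi")
# prints "requests: 1.00Gi, limits: 4.00Gi"
print(fits_quota(total_request, total_limit, quota_requests=2 * GI, quota_limits=2 * GI))
# prints False: requests fit, but the 4Gi limit sum exceeds the 2Gi limit quota
```

With four steps, the request total stays at the original 1Gi while the limit total grows to 4Gi, so a quota that also constrains `limits.memory` rejects a pod whose requests would have fit.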

With steps there is a well-known impedance mismatch with how Kubernetes wants to run containers. Tekton works around some of the issues by still running the containers in parallel but then using the Tekton entrypoint to serialize execution. I've been hoping we might get first-class "Sequences" in Kubernetes, but unfortunately it does not look like we're ever going to get them. An alternative runtime approach in Tekton that might be worth considering is to implement the Steps in a single steps container, as this would let us put all step resource usage in one place and might resolve this. I suspect that ship sailed some time ago, though, and of course it would likely introduce a new set of problems.
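The following toy model illustrates the trade-off in the paragraph above. It is a hypothetical sketch, not Tekton's entrypoint: threads stand in for step containers that all start at once but each wait on the previous step's completion signal, so the work still runs in order. `reserved_limit` contrasts the separate-container layout, where the pod carries every step's limit at once, with the single-container idea, under the assumption (not stated in the issue) that such a container would only need to be sized for its largest step.

```python
import threading

# Toy model of serialized steps (hypothetical; not Tekton's actual entrypoint).
# All step "containers" start concurrently, but each blocks on the previous
# step's completion signal before doing its work, so work happens in order.


def run_serialized(step_names):
    finished = [threading.Event() for _ in step_names]
    order = []  # records the order in which steps actually ran
    lock = threading.Lock()

    def container(i):
        if i > 0:
            finished[i - 1].wait()  # block until the previous step is done
        with lock:
            order.append(step_names[i])  # stand-in for the step's real work
        finished[i].set()  # allow the next step to proceed

    threads = [threading.Thread(target=container, args=(i,))
               for i in range(len(step_names))]
    for t in reversed(threads):  # start order does not affect execution order
        t.start()
    for t in threads:
        t.join()
    return order


def reserved_limit(step_limits_mib, single_container=False):
    # Separate containers: the pod spec carries every step's limit at once.
    # Hypothetical single "steps" container: sized for the largest step only.
    return max(step_limits_mib) if single_container else sum(step_limits_mib)


print(run_serialized(["clone", "build", "push"]))  # ['clone', 'build', 'push']
limits = [256, 1024, 512]
print(reserved_limit(limits))  # 1792
print(reserved_limit(limits, single_container=True))  # 1024
```

Even though only one step does work at a time, the separate-container layout reserves 1792MiB in this example, while the hypothetical single container would need 1024MiB.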

So... we might want to call this out as a documented limitation of Tekton: Steps will always request the sum of their Limits, and this has consequences. We currently use only "Requests" and avoid "Limits" altogether in our LimitRanges and ResourceQuotas when running workloads with Tekton. Unfortunately, that means we need additional out-of-band mechanisms to manage resource use.

skaegi commented 2 years ago

I take this all back. This is a limitation when running Tekton with Kata Containers. I'm going to work on the Kata side to see if I can remove this problem there, as it is more general and applies to any workload running in Kata.

skaegi commented 2 years ago

/reopen

Touched up title and description slightly. This issue is still very relevant when trying to use ResourceQuotas and Limits in a namespace where Tekton Pipelines and Tasks are running.

tekton-robot commented 2 years ago

@skaegi: Reopened this issue.

In response to [this](https://github.com/tektoncd/pipeline/issues/4976#issuecomment-1169096308):

>/reopen
>Touched up title and description slightly. This issue is still very relevant when trying to use ResourceQuotas and Limits

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

tekton-robot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale with a justification. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.