noBlubb opened 3 years ago
It would be nice to have more details on the status of this feature. I would guess it's a limitation in Kubernetes, or at a lower level like containerd or the kernel?
At this moment this is a Kubernetes limitation (the kernel and container runtimes already support resizing containers). Work is needed in the scheduler, the kubelet, and the core API, so it's a pretty cross-cutting problem. Also, a lot of systems have long assumed that pod sizes are immutable, so those assumptions need to be untangled as well.
There is ongoing work in Kubernetes to provide in-place pod resizes (Example: https://github.com/kubernetes/enhancements/pull/1883). Once that work completes VPA will be able to take advantage of that.
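For illustration only, here is a minimal sketch of what "taking advantage" of in-place resize could look like from a controller's point of view, assuming the upstream feature is available in the cluster; the pod name, namespace, container name, and resource values are placeholders, not VPA code:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// resizeInPlace patches only the container's resource requests. With in-place
// resize, the Pod is not recreated; the kubelet applies the change to the
// running container.
func resizeInPlace(ctx context.Context, cs kubernetes.Interface) error {
	patch := []byte(`{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m","memory":"512Mi"}}}]}}`)
	_, err := cs.CoreV1().Pods("default").Patch(
		ctx, "my-pod", types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```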
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Support for in-place updates didn't make it into K8s 1.25, but it is aiming for 1.26.
/remove-lifecycle stale
The feature didn't make it into 1.26, but is now targeted for 1.27 ;)
This issue seems to be a duplicate of https://github.com/kubernetes/autoscaler/issues/5046. Shouldn't we close one of those two issues?
https://github.com/kubernetes/kubernetes/pull/102884/ merged today. I will resume working on using that in VPA.
@wangchen615 @voelzmo FYI
/retitle "Support in-place Pod vertical scaling in VPA"
argh, don't take the quotes too literally, dear bot! 🙈
/retitle Support in-place Pod vertical scaling in VPA
/kind feature
(from https://github.com/kubernetes/autoscaler/issues/5046)
Describe the solution you'd like: In the VPA updater, whenever the existing logic decides to evict the pod, we should add a check on the pod spec to determine if the `NoRestart` policy is enabled. If so, a patch request should be sent via the updater directly to the pod without evicting it.
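To make the idea concrete, here is a rough sketch of that check, written against the `ResizePolicy` API as it eventually merged (where this policy is spelled `NotRequired`); the helper name is invented, and this is not the actual updater code:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// canResizeWithoutRestart returns true only if no container declares that a
// resize of some resource requires a restart. An absent policy defaults to
// NotRequired in the merged API.
func canResizeWithoutRestart(pod *corev1.Pod) bool {
	for _, c := range pod.Spec.Containers {
		for _, p := range c.ResizePolicy {
			if p.RestartPolicy != corev1.NotRequired {
				return false
			}
		}
	}
	return true
}
```

The updater would then patch the Pod when this returns true and fall back to eviction otherwise.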
We'll need to decide how that might play out if there's an app container set to `RestartNotRequired` for memory and a sidecar container set to `Restart` for memory, and no special config for CPU.
Imagine that a vertical pod autoscaler decides to assign less memory to the sidecar, and that triggers a restart for the app, which isn't what the developer intended. My imaginary developer was hoping that only the sidecar would get restarted when scaling down its memory request.
I think that any logic here needs to look at the container level, not just the pod.
I don't think the VPA should look at the `ResizePolicy` field in `PodSpec.containers` at all. If I understand the KEP correctly, it is only meant as a hint towards the kubelet about what to do with the containers when an update to a container's resources needs to be applied.
I think VPA only needs to understand whether the feature gate `InPlacePodVerticalScaling` is enabled and whether VPA should make use of it. If the feature is enabled and should be used, send a patch; otherwise evict as is currently done. So the updater probably needs a flag to turn in-place Pod vertical scaling on or off, defaulting to off.
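A minimal sketch of such an opt-in flag, assuming the updater's existing flag-based configuration; the flag name here is invented for illustration, not the one VPA eventually shipped:

```go
package sketch

import "flag"

// Invented flag name; the real updater flag may differ.
var inPlaceUpdates = flag.Bool("in-place-updates", false,
	"If true, and the cluster has the InPlacePodVerticalScaling feature gate "+
		"enabled, patch Pod resources in place instead of evicting the Pod.")
```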
As @jbartosik summarized, there is potentially also a need to configure this at the VPA level, as special workloads may opt for their own special treatment.
What happens then on a node is the kubelet's job: restart a container, don't restart it, defer the update, etc.
Does that make sense?
Adding this as context so we don't forget about it when implementing the feature: if we don't want to change existing behavior with injected sidecars, we need to find a way to achieve something similar to what the admission-controller currently does to ignore injected sidecars when using in-place updates.
There are some open issues related to the feature: https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue+is%3Aopen+%5BFG%3AInPlacePodVerticalScaling%5D
The most relevant seem to be:
I don't think the VPA should look at the `ResizePolicy` field in `PodSpec.containers` at all.
The API currently is limited and does not support the notion of "apply the changes if possible without a restart, and don't apply them otherwise", which may impact PDB. I don't know how the autoscaler deals with PDB today, but if there is higher-frequency autoscaling with in-place updates hoping for a non-disruptive change, this will not work. In other words, we either need a new API to resize ONLY without the restart, or treat a resize as a disruption affecting PDB.
@SergeyKanzhelev thanks for joining the discussion!
I don't know how [vertical pod] autoscaler deals with PDB today
Today, VPA uses the eviction API, which respects PDB.
we either need a new API to resize ONLY without the restart or treat a resize as a disruption affecting PDB.
I'm not sure which component the "we" in this sentence refers to, but in general I tend to agree with the need for an API that respects PDB. If the kubelet needs to restart the Pod to apply a resource change, this should count towards the PDB. However, I think this shouldn't be a concern that VPA has to deal with: similarly to eviction, VPA should just be using an API that respects PDB, if we consider this relevant for the restart case as well.
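To illustrate the difference, a sketch using client-go: eviction goes through the eviction subresource, which the API server checks against PodDisruptionBudgets, whereas a plain patch of container resources is not checked against PDBs at all. Names are placeholders:

```go
package sketch

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictRespectingPDB uses the eviction subresource; the API server rejects
// the request (HTTP 429) if it would violate a PodDisruptionBudget.
func evictRespectingPDB(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	return cs.PolicyV1().Evictions(ns).Evict(ctx, &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
	})
}

// A direct Patch to spec.containers[*].resources, in contrast, bypasses PDB
// checks entirely, even if the kubelet ends up restarting the container.
```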
Regarding my statement from above:
I don't think the VPA should look at the ResizePolicy field in PodSpec.containers at all.
This is no longer correct, as @jbartosik opted for a more informed approach in the enhancement proposal. Currently, VPA implements some constraints to ensure resource updates don't happen too frequently (for example, by requiring a minimum absolute/relative change for Pods which have been running for less than 12 hours). The proposal contains the idea to change these constraints if a Container has `ResizePolicy: NotRequired`.
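For reference, this is how that policy looks when expressed with the `k8s.io/api/core/v1` types; the container name, image, and the particular per-resource choices are illustrative:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// A container that can have its CPU resized in place, but must be restarted
// for a memory resize.
var container = corev1.Container{
	Name:  "app",
	Image: "example/app:1.0",
	ResizePolicy: []corev1.ContainerResizePolicy{
		{ResourceName: corev1.ResourceCPU, RestartPolicy: corev1.NotRequired},
		{ResourceName: corev1.ResourceMemory, RestartPolicy: corev1.RestartContainer},
	},
}
```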
/remove-lifecycle stale
/lifecycle frozen
Hi folks, could someone share a summary of what is blocking this feature, please? +1 that this would be really useful to reduce workload evictions. Thank you!
I think the summary is: the Kubernetes feature for in-place resource updates is in the alpha stage, and there are still many things to be done before it is promoted to beta. See https://github.com/kubernetes/enhancements/pull/4704 for a summary and the ongoing discussion. Many things will fundamentally change for beta, e.g. what the API for this feature is (they're talking about introducing a /resize subresource), so I don't think we can start working on this from the VPA side before the feature reaches beta in k/k.
Also note that a work-in-progress PR does exist: https://github.com/kubernetes/autoscaler/pull/6652
Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.
Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.
If someone did want to help, where can they go to get involved?
Help with the implementation (both for Pod-level resizing, and automatically managing that size) is very welcome.
@sftim I would also be interested in helping
To reiterate on https://github.com/kubernetes/autoscaler/issues/4016#issuecomment-1650654401: there is no API exposed that would mean "no restart resize". Checking `ResizePolicy` can only tell you when a container DEFINITELY WILL BE restarted; if the policy is not set to that, the container MAY still be restarted on resize.
If a "no restart resize" API is required, that would be great feedback for the KEP.
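That asymmetry can be captured in a small hypothetical helper: the API lets you prove a restart WILL happen, but never that it won't.

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// mustRestartOnResize is true only for an explicit RestartContainer policy.
// NotRequired (or no policy at all) merely means a restart is not required;
// the runtime may still restart the container on resize.
func mustRestartOnResize(c corev1.Container, res corev1.ResourceName) bool {
	for _, p := range c.ResizePolicy {
		if p.ResourceName == res {
			return p.RestartPolicy == corev1.RestartContainer
		}
	}
	return false
}
```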
We cannot just assume there will be no restart. VPA will need some sort of logic for respecting PDB and detecting restarts, as well as a way for users to block VPA for a specific Pod or Container.
Hey everyone,
as I gather, the VPA currently cannot update pods without recreating them:
and neither can the GKE vertical scaler:
Unfortunately, I was unable to learn the specific limitation from this (other than the mere absence of any such feature?), nor a timeline for this to appear in VPA, nor how to contribute to this if possible. Could you please outline what is missing in VPA for this to be implemented?
Best regards, Raffael