kodmaskinen opened this issue 7 months ago
Yeah, so deleting the VPACheckpoints is 100% related to what I described in #5598:

- `Rollout` Pods are created before the Selector is changed to match them. As pointed out before, a `Rollout` works by only updating the Selector to match the new Pods after promoting the new version.
- The VPA targets the `Rollout` object, therefore the new Pods are not determined to be under control of the VPA.
- There is a `VPACheckpoint` per VPA-Container pair, and it is only kept for Pods determined to be under the VPA's control. For a `Rollout`, this check is false, in contrast to e.g. a `Deployment`, where the selectors are updated when the new `ReplicaSet` is created.
- Once the `Rollout` Selector has been switched, VPACheckpoints are created, but only from the new Aggregations. They're never merged with the existing Aggregations.

Hope that explains it a bit.
In general, it seems that the way `Rollouts` are designed is pretty incompatible with how VPA currently works. I guess that's also one of the reasons why e.g. knative doesn't have VPA support: it is pretty hard to integrate with a process that rolls out new versions by first creating the Pods and only later switching and updating the selector.
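To make the checkpoint garbage collection concrete, here is a minimal Go sketch of the decision described above. It is not the actual VPA code: `shouldKeepCheckpoint`, `checkpoint`, and `livePodLabels` are made-up names for illustration, and only the selector-matching behavior reflects what is explained in this thread.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

// checkpoint stands in for a VerticalPodAutoscalerCheckpoint: recommendation
// history for one (VPA, container) pair. Names here are illustrative only.
type checkpoint struct {
	vpaName       string
	containerName string
}

// shouldKeepCheckpoint sketches the garbage-collection decision described
// above: a checkpoint survives only if the selector currently reported for
// the VPA's target still matches at least one live Pod's labels. During an
// Argo Rollouts promotion the Rollout's status selector still points at the
// old Pods, so the freshly created Pods don't count and the history is lost.
func shouldKeepCheckpoint(cp checkpoint, targetSelector labels.Selector, livePodLabels []labels.Set) bool {
	for _, podLabels := range livePodLabels {
		if targetSelector.Matches(podLabels) {
			return true // at least one controlled Pod still exists
		}
	}
	// No Pod matches the (stale) selector: from the VPA's point of view the
	// workload is gone, so the checkpoint for this container gets deleted.
	return false
}

func main() {
	// Selector as still reported by the Rollout during promotion.
	oldSelector, _ := labels.Parse("app=demo,rollouts-pod-template-hash=old")

	// The new Pods created by the Rollout carry the new template hash.
	newPods := []labels.Set{{"app": "demo", "rollouts-pod-template-hash": "new"}}

	cp := checkpoint{vpaName: "demo-vpa", containerName: "app"}
	fmt.Println("keep checkpoint:", shouldKeepCheckpoint(cp, oldSelector, newPods)) // false
}
```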
/remove-kind bug
/kind support
Thanks for the explanation!
It seems to me like it would work if VPA treated a `Rollout` more like a `Deployment` and used the `.spec.Selector` instead of the `.status.Selector`. It would, however, need to handle the case where a `Rollout` references a `Deployment` in `.spec.workloadRef`, and in that case get the selector from the `.spec.Selector` of the `Deployment`.
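For illustration, here is a rough Go sketch of that idea, assuming it would sit wherever VPA resolves a target's label selector. `selectorForRollout` and `getDeploymentSelector` are hypothetical names, matchExpressions are ignored for brevity, and this is not meant to mirror the actual fetcher code.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/labels"
)

// selectorForRollout sketches the behaviour suggested above: read the label
// selector from the Rollout's .spec.selector (as is already done for a
// Deployment), and when the Rollout points at a Deployment via
// .spec.workloadRef, use that Deployment's .spec.selector instead.
// getDeploymentSelector is a stand-in for whatever lookup the real target
// fetcher would perform.
func selectorForRollout(rollout *unstructured.Unstructured,
	getDeploymentSelector func(namespace, name string) (*metav1.LabelSelector, error)) (labels.Selector, error) {

	// Prefer the selector declared directly on the Rollout spec.
	if matchLabels, found, err := unstructured.NestedStringMap(rollout.Object, "spec", "selector", "matchLabels"); err == nil && found {
		return metav1.LabelSelectorAsSelector(&metav1.LabelSelector{MatchLabels: matchLabels})
	}

	// Otherwise follow .spec.workloadRef to a Deployment and use its selector.
	if kind, _, _ := unstructured.NestedString(rollout.Object, "spec", "workloadRef", "kind"); kind == "Deployment" {
		name, _, _ := unstructured.NestedString(rollout.Object, "spec", "workloadRef", "name")
		deploymentSelector, err := getDeploymentSelector(rollout.GetNamespace(), name)
		if err != nil {
			return nil, err
		}
		return metav1.LabelSelectorAsSelector(deploymentSelector)
	}

	return nil, fmt.Errorf("rollout %s/%s has neither .spec.selector nor a Deployment workloadRef",
		rollout.GetNamespace(), rollout.GetName())
}

func main() {
	// A Rollout that declares its own selector; no workloadRef lookup needed.
	rollout := &unstructured.Unstructured{Object: map[string]interface{}{
		"metadata": map[string]interface{}{"name": "demo", "namespace": "default"},
		"spec": map[string]interface{}{
			"selector": map[string]interface{}{
				"matchLabels": map[string]interface{}{"app": "demo"},
			},
		},
	}}
	sel, err := selectorForRollout(rollout, nil)
	fmt.Println(sel, err) // app=demo <nil>
}
```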
/area vertical-pod-autoscaler
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Thanks, @voelzmo, for the detailed explanation.
Since @voelzmo has explained the behavior, @kodmaskinen, can we close this issue if your concern is resolved?
The issue is still there and I believe that it could be fixed. However, my golang skills are lacking and I have only skimmed the source code, so there's a real possibility that I'm missing something.
As for our use case: to avoid OOMs, we are setting the minimum memory a lot higher than we would have to if the recommendations were not "forgotten" during deployments (using Argo Rollouts). It's not ideal, but it's the best we can do right now.
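For context, this is roughly what that workaround looks like when written out with the upstream VPA v1 Go API types; the names, the target Rollout, and the 768Mi floor are made up here, and in practice we just apply the equivalent YAML manifest.

```go
package main

import (
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	vpav1 "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A VPA targeting an Argo Rollout, with minAllowed memory pinned well above
	// what a "fresh" recommendation would pick, so a reset after promotion
	// cannot push the request low enough to cause OOMs. Names and values are
	// illustrative only.
	vpa := vpav1.VerticalPodAutoscaler{
		TypeMeta:   metav1.TypeMeta{APIVersion: "autoscaling.k8s.io/v1", Kind: "VerticalPodAutoscaler"},
		ObjectMeta: metav1.ObjectMeta{Name: "demo-vpa", Namespace: "default"},
		Spec: vpav1.VerticalPodAutoscalerSpec{
			TargetRef: &autoscalingv1.CrossVersionObjectReference{
				APIVersion: "argoproj.io/v1alpha1",
				Kind:       "Rollout",
				Name:       "demo",
			},
			ResourcePolicy: &vpav1.PodResourcePolicy{
				ContainerPolicies: []vpav1.ContainerResourcePolicy{{
					ContainerName: "app",
					MinAllowed: corev1.ResourceList{
						corev1.ResourceMemory: resource.MustParse("768Mi"),
					},
				}},
			},
		},
	}

	out, _ := yaml.Marshal(vpa)
	fmt.Print(string(out)) // prints a manifest equivalent to the stop-gap described above
}
```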
Thanks @kodmaskinen for the information.
Which component are you using?: vertical-pod-autoscaler
What version of the component are you using?: Component version: 1.0.0
What k8s version are you using (kubectl version)?: 1.29.1
What environment is this in?: EKS
What did you expect to happen?: I expect the VPA to retain the history from earlier versions of the same Rollout.
What happened instead?: VPA deletes the history from the VerticalPodAutoscalerCheckpoint during deployment of a new version using Argo Rollouts, which often means that the memory target is initially set too low, causing unnecessary OOM situations.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: This may be related to the issue mentioned in #5598.