Open kodmaskinen opened 2 months ago
Yeah, so deleting the VPACheckpoints is 100% related to what I described in #5598:
Rollout
are created before the Selector is changed to match them. As pointed out before, Rollout
works by only updating the Selector to match the new Pods after promoting the new versionRollout
object, therefore, the new Pods are not determined to be under control of the VPA
VPACheckpoint
per VPA-Container
pairRollout
, this check is false
, in contrast to e.g. a Deployment
, where the selectors are updated when the new ReplicaSet
is created.Rollout
Selector has been switched, VPACheckpoints are created, but only from the new Aggregations. They're never merged with the existing AggregationsHope that explains it a bit.
In general, it seems that the way how Rollouts
are designed, it is pretty incompatible how VPA works currently. I guess that's also one of the reasons, why e.g. knative doesn't have VPA support: it is pretty hard to integrate with the process of rolling out new versions by first creating the Pods and only later on switching and updating the selector.
/remove-kind bug /kind support
Thanks for the explanation!
It seems to me like it would work if VPA treated a Rollout
more like a Deployment
and used the .spec.Selector
instead of the .status.Selector
. It would, however, need to handle the case where a Rollout
references a Deployment
in .spec.workloadRef
, and in that case get the selector from the .spec.Selector
of the Deployment
.
/area vertical-pod-autoscaler
Which component are you using?: vertical-pod-autoscaler
What version of the component are you using?: Component version: 1.0.0
What k8s version are you using (
kubectl version
)?: 1.29.1kubectl version
OutputWhat environment is this in?: EKS
What did you expect to happen?: I expect the VPA to retain the history from earlier versions of the same Rollout.
What happened instead?: VPA deletes the history from the VerticalPodAutoscalerCheckpoint during deployment of a new version using Argo Rollouts, which often means that the memory target is initially set to low which causes unnecessary OOM situations.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: This may be related to the issue mentioned in #5598.