VPA: When targetRef is a Rollout, VerticalPodAutoscalerCheckpoint history is reset during deployment

kodmaskinen commented 7 months ago

Which component are you using?: vertical-pod-autoscaler

What version of the component are you using?: Component version: 1.0.0

What k8s version are you using (kubectl version)?: 1.29.1

kubectl version Output

$ kubectl version
Client Version: v1.29.4
Server Version: v1.29.1-eks-508b6b3

What environment is this in?: EKS

What did you expect to happen?: I expect the VPA to retain the history from earlier versions of the same Rollout.

What happened instead?: VPA deletes the history from the VerticalPodAutoscalerCheckpoint during deployment of a new version using Argo Rollouts, which often means that the memory target is initially set to low which causes unnecessary OOM situations.

How to reproduce it (as minimally and precisely as possible):

Deploy Argo Rollouts in cluster.
Create a Rollout object (using blue/green strategy).
Create a VPA object referencing the Rollout.
Wait a while until VerticalPodAutoscalerCheckpoint is populated.
Update the rollout and promote (if not using auto-promotion).
When rollout is finished, VerticalPodAutoscalerCheckpoint history is reset/deleted.

Anything else we need to know?: This may be related to the issue mentioned in #5598.

voelzmo commented 7 months ago

Yeah, so deleting the VPACheckpoints is 100% related to what I described in #5598:

The Pods for the new version in a Rollout are created before the Selector is changed to match them. As pointed out before, Rollout works by only updating the Selector to match the new Pods after promoting the new version
The VPA object points to the Rollout object, therefore, the new Pods are not determined to be under control of the VPA
- Note: as I described in #5598, this is the reason why Pods for the new version don't get the same request values assigned as the old version has. You would need to build some additional mechanism which gets the current Pod requests (or recommendations from the VPA Status) and puts them on the new version before rolling in the update. Most likely this is the cause of the OOMKills you're seeing?
Checkpoints are maintained for Pods under VPA control by creating an individual VPACheckpoint per VPA-Container pair
VPA recognizes those new Pods, doesn't find them to be under control of a VPA, does not create a checkpoint yet
Internally, the recommender gathers usage metric samples for the new Containers and creates a new Aggregation
When an Aggregation is created the first time, it is checked if it matches an existing VPA and if so, adds it to the VPA model
For a Rollout, this check is false, in contrast to e.g. a Deployment, where the selectors are updated when the new ReplicaSet is created.
Once the Rollout Selector has been switched, VPACheckpoints are created, but only from the new Aggregations. They're never merged with the existing Aggregations

Hope that explains it a bit.

In general, it seems that the way how Rollouts are designed, it is pretty incompatible how VPA works currently. I guess that's also one of the reasons, why e.g. knative doesn't have VPA support: it is pretty hard to integrate with the process of rolling out new versions by first creating the Pods and only later on switching and updating the selector.

voelzmo commented 7 months ago

/remove-kind bug /kind support

kodmaskinen commented 7 months ago

Thanks for the explanation!

It seems to me like it would work if VPA treated a Rollout more like a Deployment and used the .spec.Selector instead of the .status.Selector. It would, however, need to handle the case where a Rollout references a Deployment in .spec.workloadRef, and in that case get the selector from the .spec.Selector of the Deployment.

adrianmoisey commented 4 months ago

/area vertical-pod-autoscaler

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Shubham82 commented 1 week ago

/remove-lifecycle rotten

Shubham82 commented 1 week ago

Thanks, @voelzmo for the detailed explanation.

As @voelzmo explained the things, so @kodmaskinen if your concern is resolved can we close this issue?

kodmaskinen commented 1 week ago

The issue is still there and I believe that it could be fixed. However, my golang skills are lacking and I have only skimmed the source code, so there's a real possibility that I'm missing something.

As for our use-case, we are setting the minimum memory a lot higher than what we would have to if the recommendations were not "forgotten" during deployments (using Argo Rollouts) in order to avoid OOM. It's not ideal, but it's the best we can do right now.

Shubham82 commented 4 days ago

Thanks @kodmaskinen for the information.

kubernetes / autoscaler

VPA: When targetRef is a Rollout, VerticalPodAutoscalerCheckpoint history is reset during deployment #6730