kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA: When targetRef is a Rollout, VerticalPodAutoscalerCheckpoint history is reset during deployment #6730

Open kodmaskinen opened 7 months ago

kodmaskinen commented 7 months ago

Which component are you using?: vertical-pod-autoscaler

What version of the component are you using?: Component version: 1.0.0

What k8s version are you using (kubectl version)?: 1.29.1

kubectl version Output
$ kubectl version
Client Version: v1.29.4
Server Version: v1.29.1-eks-508b6b3

What environment is this in?: EKS

What did you expect to happen?: I expect the VPA to retain the history from earlier versions of the same Rollout.

What happened instead?: During deployment of a new version using Argo Rollouts, VPA deletes the history from the VerticalPodAutoscalerCheckpoint, which often means the memory target is initially set too low, causing unnecessary OOM situations.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: This may be related to the issue mentioned in #5598.

voelzmo commented 7 months ago

Yeah, so deleting the VPACheckpoints is 100% related to what I described in #5598:

Hope that explains it a bit.

In general, it seems that the way Rollouts are designed is pretty incompatible with how VPA currently works. I guess that's also one of the reasons why e.g. knative doesn't have VPA support: it is pretty hard to integrate with a rollout process that first creates the Pods and only later switches and updates the selector.

voelzmo commented 7 months ago

/remove-kind bug
/kind support

kodmaskinen commented 7 months ago

Thanks for the explanation!

It seems to me like it would work if VPA treated a Rollout more like a Deployment and used the .spec.selector instead of the .status.selector. It would, however, need to handle the case where a Rollout references a Deployment in .spec.workloadRef, and in that case get the selector from the .spec.selector of that Deployment.
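To make the suggestion concrete, here is a minimal, hypothetical sketch of that selector-resolution logic in Go. The struct and function names (`resolveSelector`, `WorkloadRef`, etc.) are invented for illustration and only mirror the fields relevant here; they are not the real Argo Rollouts or VPA types, and this is not a proposed patch to the actual codebase.

```go
package main

import "fmt"

// Minimal stand-ins for the relevant API fields (hypothetical, not the
// real Kubernetes or Argo Rollouts types).
type LabelSelector struct {
	MatchLabels map[string]string
}

type WorkloadRef struct {
	Kind string
	Name string
}

type Deployment struct {
	Selector *LabelSelector // .spec.selector
}

type Rollout struct {
	Selector    *LabelSelector // .spec.selector
	WorkloadRef *WorkloadRef   // .spec.workloadRef (optional)
}

// resolveSelector returns a stable spec-level selector for a Rollout:
// if the Rollout references a Deployment via .spec.workloadRef, use that
// Deployment's .spec.selector; otherwise fall back to the Rollout's own
// .spec.selector. Unlike .status.selector, neither changes mid-rollout.
func resolveSelector(r *Rollout, deployments map[string]*Deployment) (*LabelSelector, error) {
	if r.WorkloadRef != nil && r.WorkloadRef.Kind == "Deployment" {
		d, ok := deployments[r.WorkloadRef.Name]
		if !ok {
			return nil, fmt.Errorf("referenced Deployment %q not found", r.WorkloadRef.Name)
		}
		return d.Selector, nil
	}
	if r.Selector != nil {
		return r.Selector, nil
	}
	return nil, fmt.Errorf("Rollout has neither .spec.selector nor .spec.workloadRef")
}

func main() {
	deployments := map[string]*Deployment{
		"web": {Selector: &LabelSelector{MatchLabels: map[string]string{"app": "web"}}},
	}
	r := &Rollout{WorkloadRef: &WorkloadRef{Kind: "Deployment", Name: "web"}}
	sel, err := resolveSelector(r, deployments)
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.MatchLabels["app"]) // prints "web"
}
```

Because the spec-level selector stays constant across versions, keying the checkpoint on it would let the recommendation history survive a rollout.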

adrianmoisey commented 4 months ago

/area vertical-pod-autoscaler

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Shubham82 commented 1 week ago

/remove-lifecycle rotten

Shubham82 commented 1 week ago

Thanks, @voelzmo for the detailed explanation.

As @voelzmo has explained things, @kodmaskinen, if your concern is resolved, can we close this issue?

kodmaskinen commented 1 week ago

The issue is still there, and I believe it could be fixed. However, my Go skills are lacking and I have only skimmed the source code, so there's a real possibility that I'm missing something.

As for our use case: to avoid OOMs, we are setting the minimum memory a lot higher than we would have to if the recommendations were not "forgotten" during deployments (using Argo Rollouts). It's not ideal, but it's the best we can do right now.

Shubham82 commented 4 days ago

Thanks @kodmaskinen for the information.