kubernetes / autoscaler

Autoscaling components for Kubernetes

Vertical Pod Autoscaler is not recreating pods at runtime #6915

Open VivekPandeyDevOps opened 3 weeks ago

VivekPandeyDevOps commented 3 weeks ago

Which component are you using?: vertical-pod-autoscaler

What version of the component are you using?: v1.1.2

What k8s version are you using (kubectl version)?: Client Version: v1.29.3-eks-ae9a62a Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.4-eks-036c24b

What environment is this in?: AWS Elastic Kubernetes Service

What did you expect to happen?: Whenever the load on the existing pods increases beyond the limits specified in the deployment spec, VPA should recommend new values and recreate the pods.

What happened instead?: Whenever the load on the existing pods increases beyond the limits specified in the deployment spec, VPA recommends new values, but the pods are not recreated.

How to reproduce it (as minimally and precisely as possible): Configured VPA with the below spec (resourcePolicy / containerPolicies; see the sketch below):
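A minimal sketch of what such a manifest might look like (the VPA name and the containerPolicies values here are illustrative assumptions, since the full spec wasn't included; the namespace and Deployment name match the kubectl output below):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ecmweb-vpa            # assumed name
  namespace: newgen
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment          # must be capitalised exactly as the Kubernetes Kind
    name: ecmweb
  updatePolicy:
    updateMode: Recreate      # matches the MODE column in the kubectl get vpa output below
  resourcePolicy:
    containerPolicies:
      - containerName: ecmweb
        minAllowed:           # illustrative bounds only
          cpu: 200m
          memory: 512Mi
        maxAllowed:
          cpu: "1"
          memory: 3Gi
        controlledResources: ["cpu", "memory"]
```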

Applied the VPA.yml and the recommended values were as below:

Recommendation:
  Container Recommendations:
    Container Name:  ecmweb
    Lower Bound:
      Cpu:     200m
      Memory:  3Gi
    Target:
      Cpu:     763m
      Memory:  3Gi
    Uncapped Target:
      Cpu:     763m
      Memory:  12965919539
    Upper Bound:
      Cpu:     1
      Memory:  3Gi
Events:

Updated the deployment spec as below:

resources:
  requests:
    memory: 50Mi
    cpu: 1m
  limits:
    memory: 4096Mi
    cpu: 800m

Redeployed the containers, and describing the newly created pod showed the resources allocated by VPA:

kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep cpu
  vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
  cpu: 610400m
  cpu: 763m

kubectl describe po ecmweb-64574fc46d-hgmz7 -n newgen | grep memory
  vpaUpdates: Pod resources updated by ecmweb: container 0: cpu request, memory request, cpu limit, memory limit
  memory: 263882790666
  memory: 3Gi

After that I generated load using the JMeter tool and observed the current resource utilization; at one point it was as below:


NAME     MODE       CPU    MEM   PROVIDED   AGE
ecmweb   Recreate   813m   3Gi   True       148m

ecmweb-64574fc46d-hgmz7                      1/1     Running   0          12m
ecmweb-64574fc46d-q4vsr                      1/1     Running   0          13m

ecmweb-64574fc46d-hgmz7                      1286m        10559Mi
ecmweb-64574fc46d-q4vsr                      1346m        10600Mi

But still, the pods were not recreated.

**Anything else we need to know?**:

Akuku25 commented 3 weeks ago

Faced the same issue as well. I set the Deployment with a CPU of 500m. My expectation was that if I bombarded the workloads with requests to the point where they required more CPU than the maximum allowed, the VPA would apply the new recommended value and recreate the pod. In my case the recommended value for CPU was 564m.

What happened instead is that the pods kept running with CPU usage higher than the set limit, and the pod was never recreated to apply the new values.

adrianmoisey commented 3 weeks ago

/area vertical-pod-autoscaler

adrianmoisey commented 3 weeks ago

Have you looked at the logs for the updater to see if it mentions why it doesn't evict the pod?
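(Assuming the standard VPA installation, where the updater runs as a Deployment named vpa-updater in kube-system, something like the command below should pull them; adjust the name and namespace for your setup.)

```sh
kubectl -n kube-system logs deploy/vpa-updater --tail=200
```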

adrianmoisey commented 3 weeks ago

I set up a test, and after a short while the eviction did happen:

I0611 18:05:08.116565       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-mw7kq with priority 106.7
I0611 18:05:08.117329       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-67bvs with priority 106.7
I0611 18:05:08.117390       1 updater.go:220] evicting pod hamster-7b87ffb764-mw7kq
I0611 18:05:08.139078       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"hamster-7b87ffb764-mw7kq", UID:"c7192ca8-8d04-4fff-8733-4c5891452e18", APIVersion:"v1", ResourceVersion:"30413605", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.
okanIz commented 3 weeks ago

I have exactly the same problem and also use the versions mentioned above: k8s v1.29, VPA v1.1.2 (btw, it works for me with k8s v1.26 and VPA v1.1.2). In addition, I use the provided pod deployment "hamster" to check the functionality and have found that the admission-controller does not apply any requests. Strangely enough, the hamster pod restarts, but always with the same specs.

Akuku25 commented 3 weeks ago

> I set up a test, and after a short while the eviction did happen:
>
> I0611 18:05:08.116565       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-mw7kq with priority 106.7
> I0611 18:05:08.117329       1 update_priority_calculator.go:143] pod accepted for update default/hamster-7b87ffb764-67bvs with priority 106.7
> I0611 18:05:08.117390       1 updater.go:220] evicting pod hamster-7b87ffb764-mw7kq
> I0611 18:05:08.139078       1 event.go:298] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"hamster-7b87ffb764-mw7kq", UID:"c7192ca8-8d04-4fff-8733-4c5891452e18", APIVersion:"v1", ResourceVersion:"30413605", FieldPath:""}): type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

For the initial resource requests it is always updating. However, when I increase the load and the required CPU utilization goes beyond the limit set in the deployment manifest, the pods do not get updated with the new recommended values. When I check resource utilization using the 'top pods' command, I can see that CPU utilization is beyond my set limit, but there is no event for an update of any new resource requests on the pod.

adrianmoisey commented 3 weeks ago

> I have exactly the same problem and also use the versions mentioned above: k8s v1.29, VPA v1.1.2 (btw, it works for me with k8s v1.26 and VPA v1.1.2). In addition, I use the provided pod deployment "hamster" to check the functionality and have found that the admission-controller does not apply any requests. Strangely enough, the hamster pod restarts, but always with the same specs.

Can you provide an example VPA config that isn't working, specifically the targetRef part?

I know that there's a bug with the targetRef: if the kind isn't capitalised correctly, some parts of the VPA don't work.

For example, when I have kind: deployment, the admission-controller doesn't match the new Pod since it can't find the VPA:

I0612 17:44:00.005936       1 matcher.go:73] Let's choose from 1 configs for pod default/hamster-7b87ffb764-%
I0612 17:44:00.005981       1 handler.go:82] No matching VPA found for pod default/hamster-7b87ffb764-%
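For reference, a minimal sketch of the relevant part of the spec, using the hamster example above (the point being that kind must be the exact Kubernetes Kind, Deployment, not deployment):

```yaml
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment   # lower-case "deployment" can cause the admission-controller to miss the pod
    name: hamster
```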
adrianmoisey commented 3 weeks ago

> For the initial resource requests it is always updating. However, when I increase the load and the required CPU utilization goes beyond the limit set in the deployment manifest, the pods do not get updated with the new recommended values. When I check resource utilization using the 'top pods' command, I can see that CPU utilization is beyond my set limit, but there is no event for an update of any new resource requests on the pod.

Please provide logs from the admission-controller.
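(Assuming the standard install, something along these lines should fetch them; the Deployment name and namespace may differ in your cluster.)

```sh
kubectl -n kube-system logs deploy/vpa-admission-controller --tail=200
```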