kubernetes / autoscaler

Autoscaling components for Kubernetes

[VPA] Hamster pod is not getting evicted. #7274

Status: Closed (closed by NEO2756 2 weeks ago)

NEO2756 commented 2 weeks ago

Q1: hamster is deployed in Auto mode. Why is it not getting evicted? No new pod has been created since the original deployment.

    NAME                      CPU(cores)   MEMORY(bytes)
    hamster-c6967774f-nf9zf   516m         39Mi

Q2: The updater prints nothing after the logs below. Is this expected?

W0912 05:39:10.296051       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0912 05:39:10.296030       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0912 05:39:10.296067       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0912 05:39:10.296099       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 05:39:10.998390       1 api.go:94] Initial VPA synced successfully

Below is my setup:

  1. Deployed the VPA. I can see the following pods:

    vpa-admission-controller-7b957fd67c-tvrxg   1/1     Running   0          4m39s
    vpa-recommender-7879576855-7wbpn            1/1     Running   0          14m
    vpa-updater-7849f8fb48-7xvnv                1/1     Running   0          4m39s
  2. Deployed hamster in "Auto" mode; see the YAML below:

    apiVersion: "autoscaling.k8s.io/v1"
    kind: VerticalPodAutoscaler
    metadata:
      name: hamster-vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind: Deployment
        name: hamster
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            minAllowed:
              cpu: 100m
              memory: 50Mi
            maxAllowed:
              cpu: 1
              memory: 500Mi
            controlledResources: ["cpu", "memory"]
  3. I can see the recommendation:

    Recommendation:
      Container Recommendations:
        Container Name:  hamster
        Lower Bound:
          Cpu:     556m
          Memory:  131072k
        Target:
          Cpu:     587m
          Memory:  131072k
        Uncapped Target:
          Cpu:     587m
          Memory:  131072k
        Upper Bound:
          Cpu:     1
          Memory:  323756442
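
To relate the recommendation above to a pod's requests, here is a minimal Python sketch that parses the quantities and checks whether a request falls outside the lower and upper bounds. It is a simplification for illustration, not the updater's actual logic. The parser handles only a few common suffixes, and the requests used (100m CPU, 50Mi memory) are assumed values that are not stated in this issue.

```python
from decimal import Decimal

# Multipliers for a few common quantity suffixes.
# This is NOT a full Kubernetes quantity parser.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "Ki": Decimal(1024),
    "Mi": Decimal(1024) ** 2,
    "Gi": Decimal(1024) ** 3,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '556m', '131072k' or '50Mi' to a plain number."""
    text = text.strip()
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    # No known suffix: treat the text as a bare number (cores or bytes).
    return Decimal(text)


def outside_bounds(request: str, lower: str, upper: str) -> bool:
    """Return True when the request is below the lower or above the upper bound."""
    value = parse_quantity(request)
    return value < parse_quantity(lower) or value > parse_quantity(upper)


# Bounds are taken from the recommendation in this issue.
# The requests (100m CPU, 50Mi memory) are ASSUMED for illustration only.
cpu_out = outside_bounds("100m", lower="556m", upper="1")
mem_out = outside_bounds("50Mi", lower="131072k", upper="323756442")
print(cpu_out, mem_out)
```

If both checks come back true for the real requests, the pod is a natural candidate for an update, so the absence of evictions points at something else in the setup.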

NEO2756 commented 2 weeks ago

OK, I figured out that the issue is caused by the `--min-replica=1` parameter passed to vpa-updater, as suggested here. The moment I remove this option, I start seeing vpa-updater logs and pod evictions. However, the re-created pod does not have the recommended CPU/memory applied, and vpa-admission-controller has no logs related to the re-creation of pods. Below are the logs. Can someone help me figure out what I am missing?

I0912 07:48:39.996753       1 flags.go:57] FLAG: --webhook-port=""
I0912 07:48:39.996767       1 flags.go:57] FLAG: --webhook-service="vpa-webhook"
I0912 07:48:39.996787       1 flags.go:57] FLAG: --webhook-timeout-seconds="30"
I0912 07:48:39.996804       1 main.go:87] Vertical Pod Autoscaler 1.2.1 Admission Controller
I0912 07:48:39.997370       1 reflector.go:289] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I0912 07:48:39.997403       1 reflector.go:325] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I0912 07:48:40.097670       1 shared_informer.go:341] caches populated
I0912 07:48:40.097707       1 api.go:94] Initial VPA synced successfully
I0912 07:48:40.098890       1 reflector.go:289] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.098930       1 reflector.go:325] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.199497       1 shared_informer.go:341] caches populated
I0912 07:48:40.199546       1 fetcher.go:99] Initial sync of ReplicationController completed
I0912 07:48:40.199775       1 reflector.go:289] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.199804       1 reflector.go:325] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.300089       1 shared_informer.go:341] caches populated
I0912 07:48:40.300124       1 fetcher.go:99] Initial sync of Job completed
I0912 07:48:40.300349       1 reflector.go:289] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.300372       1 reflector.go:325] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.401157       1 shared_informer.go:341] caches populated
I0912 07:48:40.401221       1 fetcher.go:99] Initial sync of CronJob completed
I0912 07:48:40.401442       1 reflector.go:289] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.401473       1 reflector.go:325] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.501545       1 shared_informer.go:341] caches populated
I0912 07:48:40.501577       1 fetcher.go:99] Initial sync of DaemonSet completed
I0912 07:48:40.501883       1 reflector.go:289] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.501911       1 reflector.go:325] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.602382       1 shared_informer.go:341] caches populated
I0912 07:48:40.602416       1 fetcher.go:99] Initial sync of Deployment completed
I0912 07:48:40.602661       1 reflector.go:289] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.602688       1 reflector.go:325] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.803359       1 shared_informer.go:341] caches populated
I0912 07:48:40.803397       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0912 07:48:40.803615       1 reflector.go:289] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.803645       1 reflector.go:325] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.903789       1 shared_informer.go:341] caches populated
I0912 07:48:40.903820       1 fetcher.go:99] Initial sync of StatefulSet completed
I0912 07:48:40.904051       1 shared_informer.go:341] caches populated
I0912 07:48:40.904073       1 controller_fetcher.go:141] Initial sync of CronJob completed
I0912 07:48:40.904110       1 shared_informer.go:341] caches populated
I0912 07:48:40.904118       1 controller_fetcher.go:141] Initial sync of DaemonSet completed
I0912 07:48:40.904126       1 shared_informer.go:341] caches populated
I0912 07:48:40.904135       1 controller_fetcher.go:141] Initial sync of Deployment completed
I0912 07:48:40.904143       1 shared_informer.go:341] caches populated
I0912 07:48:40.904149       1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I0912 07:48:40.904156       1 shared_informer.go:341] caches populated
W0912 07:48:40.904172       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904185       1 controller_fetcher.go:141] Initial sync of StatefulSet completed
W0912 07:48:40.904227       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904230       1 shared_informer.go:341] caches populated
W0912 07:48:40.904235       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904241       1 controller_fetcher.go:141] Initial sync of ReplicationController completed
W0912 07:48:40.904254       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0912 07:48:40.904242       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904263       1 shared_informer.go:341] caches populated
W0912 07:48:40.904267       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904272       1 controller_fetcher.go:141] Initial sync of Job completed
W0912 07:48:40.904311       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904556       1 reflector.go:289] Starting reflector *v1.LimitRange (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I0912 07:48:40.904579       1 reflector.go:325] Listing and watching *v1.LimitRange from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I0912 07:48:41.005201       1 shared_informer.go:341] caches populated
I0912 07:48:41.005792       1 certs.go:41] Successfully read 1168 bytes from /etc/tls-certs/caCert.pem
I0912 07:48:51.036028       1 config.go:174] Self registration as MutatingWebhook succeeded.
I0912 07:54:14.808763       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 6 items received
I0912 07:54:41.008045       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90: Watch close - *v1.VerticalPodAutoscaler total 19 items received
I0912 07:54:47.416184       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 70 items received
I0912 07:55:03.305055       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 8 items received
I0912 07:55:52.597455       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 30 items received
I0912 07:56:14.105981       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 8 items received
I0912 07:57:10.908719       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60: Watch close - *v1.LimitRange total 10 items received
NEO2756 commented 2 weeks ago

It seems the Kubernetes API server is not able to reach the admission controller, as shown in the logs below. Update: I am running on EKS 1.29 with VPA 1.2.1.

2024-09-12T09:56:41.000Z
W0912 09:56:41.635673      10 dispatcher.go:210] Failed calling webhook, failing open vpa.k8s.io: failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook.kube-system.svc:443/?timeout=30s": Address is not allowed

Note: There is no service mesh (Istio) bound to the namespace. Any help on this?
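
The "failing open" wording in the log above would explain why evicted pods come back without the recommended resources: when a webhook call fails and its `failurePolicy` is `Ignore`, the API server admits the object unchanged. The sketch below is a toy Python model of that behavior, not API server code. The `Pod` class, the function names, and the original requests (100m CPU, 50Mi memory) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Pod:
    name: str
    requests: Dict[str, str] = field(default_factory=dict)


class WebhookUnreachable(Exception):
    """Stands in for the 'Address is not allowed' error in the log above."""


def admit(pod: Pod, webhook: Callable[[Pod], Pod], failure_policy: str = "Ignore") -> Pod:
    """Run one mutating webhook and honor its failurePolicy on call errors."""
    try:
        return webhook(pod)
    except WebhookUnreachable:
        if failure_policy == "Ignore":
            # Fail open: admit the pod exactly as it was submitted.
            return pod
        # failurePolicy "Fail": the request is rejected.
        raise


def vpa_webhook(pod: Pod) -> Pod:
    # Reachable webhook: applies the Target values from the recommendation.
    return Pod(pod.name, {"cpu": "587m", "memory": "131072k"})


def unreachable_webhook(pod: Pod) -> Pod:
    # Unreachable webhook: the call itself fails.
    raise WebhookUnreachable("Address is not allowed")


# ASSUMED original requests, for illustration only.
original = Pod("hamster", {"cpu": "100m", "memory": "50Mi"})
print(admit(original, vpa_webhook).requests)
print(admit(original, unreachable_webhook).requests)
```

In the second call the pod keeps its original requests, which matches the symptom here: evictions happen, but the new pod is not resized.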

NEO2756 commented 2 weeks ago

As I was running on EKS with a custom CNI (Calico), I had to set `hostNetwork: true` in the vpa-admission-controller pod. https://release-next--cert-manager.netlify.app/docs/installation/compatibility/#aws-eks led me to this approach.
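
For anyone applying the same fix, the following is a minimal Python sketch that builds a JSON merge patch placing `hostNetwork: true` in the pod template and prints a matching `kubectl patch` command. The Deployment name `vpa-admission-controller` and the `kube-system` namespace are assumptions inferred from the pod name and webhook URL earlier in this issue; adjust them to your install.

```python
import json
import shlex

# hostNetwork belongs to the pod spec, i.e. spec.template.spec of the Deployment.
patch = {"spec": {"template": {"spec": {"hostNetwork": True}}}}
patch_json = json.dumps(patch)

# ASSUMED names: Deployment "vpa-admission-controller" in namespace "kube-system".
command = " ".join([
    "kubectl", "patch", "deployment", "vpa-admission-controller",
    "-n", "kube-system",
    "--type", "merge",
    "-p", shlex.quote(patch_json),
])
print(command)
```

Note that with host networking the pod binds its webhook port on the node itself, so that port must be free on the node.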