NEO2756 (issue closed 2 weeks ago)
OK, I figured out that the issue is caused by the `--min-replica=1` parameter passed to vpa-updater, as suggested here. The moment I remove this option, I start seeing vpa-updater logs and pod evictions. But I cannot see the recommended CPU/memory applied to the re-created pod, and vpa-admission-controller has no logs related to the re-creation of pods. Below are the logs. Can someone help me figure out what I am missing?
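As a side note, the supported vpa-updater flag is `--min-replicas` (plural); a misspelled flag such as `--min-replica` would fail flag parsing at startup, which would also explain seeing no updater logs at all. A sketch of how the flag would appear on the updater Deployment (the container name and image tag here are illustrative assumptions, not taken from this cluster):

```yaml
# Illustrative fragment of a vpa-updater Deployment spec.
# The flag name must be --min-replicas (plural); with its default of 2,
# single-replica workloads are not evicted unless overridden per VPA.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-updater
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: updater                     # assumed container name
          image: registry.k8s.io/autoscaler/vpa-updater:1.2.1
          args:
            - --min-replicas=1              # correct spelling of the flag
```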
I0912 07:48:39.996753 1 flags.go:57] FLAG: --webhook-port=""
I0912 07:48:39.996767 1 flags.go:57] FLAG: --webhook-service="vpa-webhook"
I0912 07:48:39.996787 1 flags.go:57] FLAG: --webhook-timeout-seconds="30"
I0912 07:48:39.996804 1 main.go:87] Vertical Pod Autoscaler 1.2.1 Admission Controller
I0912 07:48:39.997370 1 reflector.go:289] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I0912 07:48:39.997403 1 reflector.go:325] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90
I0912 07:48:40.097670 1 shared_informer.go:341] caches populated
I0912 07:48:40.097707 1 api.go:94] Initial VPA synced successfully
I0912 07:48:40.098890 1 reflector.go:289] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.098930 1 reflector.go:325] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.199497 1 shared_informer.go:341] caches populated
I0912 07:48:40.199546 1 fetcher.go:99] Initial sync of ReplicationController completed
I0912 07:48:40.199775 1 reflector.go:289] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.199804 1 reflector.go:325] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.300089 1 shared_informer.go:341] caches populated
I0912 07:48:40.300124 1 fetcher.go:99] Initial sync of Job completed
I0912 07:48:40.300349 1 reflector.go:289] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.300372 1 reflector.go:325] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.401157 1 shared_informer.go:341] caches populated
I0912 07:48:40.401221 1 fetcher.go:99] Initial sync of CronJob completed
I0912 07:48:40.401442 1 reflector.go:289] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.401473 1 reflector.go:325] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.501545 1 shared_informer.go:341] caches populated
I0912 07:48:40.501577 1 fetcher.go:99] Initial sync of DaemonSet completed
I0912 07:48:40.501883 1 reflector.go:289] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.501911 1 reflector.go:325] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.602382 1 shared_informer.go:341] caches populated
I0912 07:48:40.602416 1 fetcher.go:99] Initial sync of Deployment completed
I0912 07:48:40.602661 1 reflector.go:289] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.602688 1 reflector.go:325] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.803359 1 shared_informer.go:341] caches populated
I0912 07:48:40.803397 1 fetcher.go:99] Initial sync of ReplicaSet completed
I0912 07:48:40.803615 1 reflector.go:289] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.803645 1 reflector.go:325] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0912 07:48:40.903789 1 shared_informer.go:341] caches populated
I0912 07:48:40.903820 1 fetcher.go:99] Initial sync of StatefulSet completed
I0912 07:48:40.904051 1 shared_informer.go:341] caches populated
I0912 07:48:40.904073 1 controller_fetcher.go:141] Initial sync of CronJob completed
I0912 07:48:40.904110 1 shared_informer.go:341] caches populated
I0912 07:48:40.904118 1 controller_fetcher.go:141] Initial sync of DaemonSet completed
I0912 07:48:40.904126 1 shared_informer.go:341] caches populated
I0912 07:48:40.904135 1 controller_fetcher.go:141] Initial sync of Deployment completed
I0912 07:48:40.904143 1 shared_informer.go:341] caches populated
I0912 07:48:40.904149 1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I0912 07:48:40.904156 1 shared_informer.go:341] caches populated
W0912 07:48:40.904172 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904185 1 controller_fetcher.go:141] Initial sync of StatefulSet completed
W0912 07:48:40.904227 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904230 1 shared_informer.go:341] caches populated
W0912 07:48:40.904235 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904241 1 controller_fetcher.go:141] Initial sync of ReplicationController completed
W0912 07:48:40.904254 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0912 07:48:40.904242 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904263 1 shared_informer.go:341] caches populated
W0912 07:48:40.904267 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904272 1 controller_fetcher.go:141] Initial sync of Job completed
W0912 07:48:40.904311 1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0912 07:48:40.904556 1 reflector.go:289] Starting reflector *v1.LimitRange (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I0912 07:48:40.904579 1 reflector.go:325] Listing and watching *v1.LimitRange from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60
I0912 07:48:41.005201 1 shared_informer.go:341] caches populated
I0912 07:48:41.005792 1 certs.go:41] Successfully read 1168 bytes from /etc/tls-certs/caCert.pem
I0912 07:48:51.036028 1 config.go:174] Self registration as MutatingWebhook succeeded.
I0912 07:54:14.808763 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 6 items received
I0912 07:54:41.008045 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:90: Watch close - *v1.VerticalPodAutoscaler total 19 items received
I0912 07:54:47.416184 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 70 items received
I0912 07:55:03.305055 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 8 items received
I0912 07:55:52.597455 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 30 items received
I0912 07:56:14.105981 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 8 items received
I0912 07:57:10.908719 1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60: Watch close - *v1.LimitRange total 10 items received
It seems the k8s API server is not able to reach the admission-controller, as I can see in the logs below. Update: I am running EKS 1.29 / VPA 1.2.1.
2024-09-12T09:56:41.000Z
W0912 09:56:41.635673 10 dispatcher.go:210] Failed calling webhook, failing open vpa.k8s.io: failed calling webhook "vpa.k8s.io": failed to call webhook: Post "https://vpa-webhook.kube-system.svc:443/?timeout=30s": Address is not allowed
Note: there is no service mesh/Istio bound to the namespace. Any help on this?
As I was running on EKS with a custom CNI (Calico), I had to set `hostNetwork: true` on the vpa-admission-controller pod.
https://release-next--cert-manager.netlify.app/docs/installation/compatibility/#aws-eks led me to this approach.
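For anyone hitting the same "Address is not allowed" webhook error, a minimal sketch of the change, assuming the stock deployment name in `kube-system`:

```yaml
# hostnetwork-patch.yaml — strategic merge patch for the admission controller.
# Apply with:
#   kubectl -n kube-system patch deploy vpa-admission-controller \
#     --patch-file hostnetwork-patch.yaml
# Rationale: with a custom CNI on EKS, the managed control plane cannot route
# to the pod network, so the webhook must listen on the node's host network.
spec:
  template:
    spec:
      hostNetwork: true
      # With hostNetwork, pods should usually switch DNS policy so that
      # in-cluster service names still resolve:
      dnsPolicy: ClusterFirstWithHostNet
```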
Q1: hamster is deployed in Auto mode. Why is it not getting evicted? There has been no new pod since its creation.
NAME CPU(cores) MEMORY(bytes)
hamster-c6967774f-nf9zf 516m 39Mi
Q2: There are no logs in the updater after the logs below. Is this expected?

Below is my setup:

Deployed the VPA. I can see:
Deployed hamster in "Auto" mode; see the YAML below:

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
I can see the recommendation:

Recommendation:
  Container Recommendations:
    Container Name:  hamster
    Lower Bound:
      Cpu:     556m
      Memory:  131072k
    Target:
      Cpu:     587m
      Memory:  131072k
    Uncapped Target:
      Cpu:     587m
      Memory:  131072k
    Upper Bound:
      Cpu:     1
      Memory:  323756442
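Regarding Q1: with the updater's default `--min-replicas=2`, a single-replica Deployment like hamster is never evicted, because evicting its only pod would take the workload to zero. If I understand the VPA v1 API correctly, this can be relaxed per VPA object instead of globally, via `updatePolicy.minReplicas` (a sketch; verify the field is supported by your VPA version):

```yaml
# Per-VPA override: allow the updater to evict even when only one
# replica exists, without changing the global --min-replicas flag.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hamster
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 1   # permit eviction of the sole replica
```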