kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA restarts pods but does not modify CPU or memory settings #5792

Closed brsolomon-deloitte closed 1 year ago

brsolomon-deloitte commented 1 year ago

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: 0.13.0

What k8s version are you using (kubectl version)?:

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.4", GitCommit:"f89670c3aa4059d6999cb42e23ccb4f0b9a03979", GitTreeState:"clean", BuildDate:"2023-04-12T12:13:53Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.4-eks-0a21954", GitCommit:"4a3479673cb6d9b63f1c69a67b57de30a4d9b781", GitTreeState:"clean", BuildDate:"2023-04-15T00:33:09Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

What did you expect to happen?:

VPA should not completely prevent other applications from deploying.

What happened instead?:

As detailed in https://github.com/orgs/strimzi/discussions/8533, Pods failed to come up at all because an Operator was unable to successfully request/create them from the k8s apiserver.

How to reproduce it (as minimally and precisely as possible):

Deploy VPA from the Helm chart at https://cowboysysop.github.io/charts/ (chart version 7.0.1, VPA version 0.13.0).

Deploy Strimzi 0.35 from its Helm chart.
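
For reference, a rough sketch of those two Helm installs (repository aliases, release names, and namespaces here are illustrative, not taken from the original report):

$ helm repo add cowboysysop https://cowboysysop.github.io/charts/
$ helm install vertical-pod-autoscaler cowboysysop/vertical-pod-autoscaler \
    --version 7.0.1 --namespace kube-system
$ helm repo add strimzi https://strimzi.io/charts/
$ helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
    --version 0.35.0 --namespace kafka --create-namespace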

Deploy a Kafka cluster:

$ helm template . -s templates/kafka-cluster.yaml
---
# Source: kafka/templates/kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: redacted
  labels:
    app.kubernetes.io/name: kafka
    helm.sh/chart: kafka-3.0.0
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/managed-by: Helm
spec:
  kafka:
    version: 3.4.0
    replicas: 3
    listeners:
      - name: internal
        port: 9092
        type: internal
        tls: false
        authentication:
          type: scram-sha-512
      - name: external
        port: 9094
        type: nodeport
        tls: false
        authentication:
          type: scram-sha-512
        configuration:
          bootstrap:
            nodePort: 31094
          preferredNodePortAddressType: InternalIP
    config:
      auto.create.topics.enable: "false"
      default.replication.factor: 2
      log.retention.hours: 24
      num.partitions: 3
      inter.broker.protocol.version: "3.4"
    storage:
      type: persistent-claim
      size: "32Gi"
      deleteClaim: true
    readinessProbe:
      initialDelaySeconds: 30
    livenessProbe:
      initialDelaySeconds: 60
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
  cruiseControl: {}

Watch things break:
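
A hedged way to follow this as it happens (the cluster operator deployment and namespace names below are illustrative; adjust to your install):

$ kubectl logs -n kafka deploy/strimzi-cluster-operator --follow
$ kubectl get pods -n kafka --watch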

2023-05-19T12:22:53.753469099Z 2023-05-19 12:22:53 ERROR StrimziPodSetController:344 - Reconciliation #5(watch) StrimziPodSet(elastic/REDACTED-zookeeper): StrimziPodSet REDACTED-zookeeper in namespace elastic reconciliation failed
2023-05-19T12:22:53.753750631Z io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
2023-05-19T12:22:53.753786272Z  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) ~[io.fabric8.kubernetes-client-api-6.6.1.jar:?]
2023-05-19T12:22:53.753798971Z  at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) ~[io.fabric8.kubernetes-client-api-6.6.1.jar:?]
2023-05-19T12:22:53.753804077Z  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753823613Z  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1112) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753827656Z  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753831623Z  at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.maybeCreateOrPatchPod(StrimziPodSetController.java:405) ~[io.strimzi.cluster-operator-0.35.0.jar:0.35.0]
2023-05-19T12:22:53.753835144Z  at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.reconcile(StrimziPodSetController.java:333) ~[io.strimzi.cluster-operator-0.35.0.jar:0.35.0]
2023-05-19T12:22:53.753838910Z  at io.strimzi.operator.cluster.operator.assembly.StrimziPodSetController.run(StrimziPodSetController.java:501) ~[io.strimzi.cluster-operator-0.35.0.jar:0.35.0]
2023-05-19T12:22:53.753856754Z  at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-05-19T12:22:53.753861384Z Caused by: java.io.IOException: request timed out
2023-05-19T12:22:53.753865030Z  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:514) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753868646Z  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:534) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753872052Z  at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:342) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753875410Z  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:707) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753878665Z  at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753882235Z  at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42) ~[io.fabric8.kubernetes-client-6.6.1.jar:?]
2023-05-19T12:22:53.753901411Z  ... 6 more
2023-05-19T12:22:53.753906552Z Caused by: java.net.http.HttpTimeoutException: request timed out
2023-05-19T12:22:53.753910829Z  at jdk.internal.net.http.ResponseTimerEvent.handle(ResponseTimerEvent.java:63) ~[java.net.http:?]
2023-05-19T12:22:53.753915103Z  at jdk.internal.net.http.HttpClientImpl.purgeTimeoutsAndReturnNextDeadline(HttpClientImpl.java:1270) ~[java.net.http:?]
2023-05-19T12:22:53.753919527Z  at jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:899) ~[java.net.http:?]
2023-05-19T12:22:53.758942026Z 2023-05-19 12:22:53 INFO  StrimziPodSetController:349 - Reconciliation #5(watch) StrimziPodSet(elastic/REDACTED-zookeeper): reconciled
brsolomon-deloitte commented 1 year ago

Logs for the admission controller don't say very much:

$ kubectl logs --tail=500 --follow -n kube-system deploy/vertical-pod-autoscaler-admission-controller
I0522 18:08:22.942158       1 flags.go:57] FLAG: --add-dir-header="false"
I0522 18:08:22.942205       1 flags.go:57] FLAG: --address=":8944"
I0522 18:08:22.942211       1 flags.go:57] FLAG: --alsologtostderr="false"
I0522 18:08:22.942215       1 flags.go:57] FLAG: --client-ca-file="/tls-secret/ca.crt"
I0522 18:08:22.942219       1 flags.go:57] FLAG: --kube-api-burst="10"
I0522 18:08:22.942224       1 flags.go:57] FLAG: --kube-api-qps="5"
I0522 18:08:22.942227       1 flags.go:57] FLAG: --kubeconfig=""
I0522 18:08:22.942231       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0522 18:08:22.942238       1 flags.go:57] FLAG: --log-dir=""
I0522 18:08:22.942242       1 flags.go:57] FLAG: --log-file=""
I0522 18:08:22.942246       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0522 18:08:22.942251       1 flags.go:57] FLAG: --logtostderr="true"
I0522 18:08:22.942255       1 flags.go:57] FLAG: --one-output="false"
I0522 18:08:22.942260       1 flags.go:57] FLAG: --port="8000"
I0522 18:08:22.942265       1 flags.go:57] FLAG: --register-by-url="false"
I0522 18:08:22.942270       1 flags.go:57] FLAG: --register-webhook="true"
I0522 18:08:22.942275       1 flags.go:57] FLAG: --skip-headers="false"
I0522 18:08:22.942279       1 flags.go:57] FLAG: --skip-log-headers="false"
I0522 18:08:22.942283       1 flags.go:57] FLAG: --stderrthreshold="2"
I0522 18:08:22.942293       1 flags.go:57] FLAG: --tls-cert-file="/tls-secret/tls.crt"
I0522 18:08:22.942298       1 flags.go:57] FLAG: --tls-private-key="/tls-secret/tls.key"
I0522 18:08:22.942302       1 flags.go:57] FLAG: --v="2"
I0522 18:08:22.942307       1 flags.go:57] FLAG: --vmodule=""
I0522 18:08:22.942311       1 flags.go:57] FLAG: --vpa-object-namespace=""
I0522 18:08:22.942315       1 flags.go:57] FLAG: --webhook-address=""
I0522 18:08:22.942319       1 flags.go:57] FLAG: --webhook-port=""
I0522 18:08:22.942322       1 flags.go:57] FLAG: --webhook-service="vpa-webhook"
I0522 18:08:22.942327       1 flags.go:57] FLAG: --webhook-timeout-seconds="30"
I0522 18:08:22.942332       1 main.go:76] Vertical Pod Autoscaler 0.13.0 Admission Controller
I0522 18:08:23.043477       1 api.go:92] Initial VPA synced successfully
I0522 18:08:23.144677       1 fetcher.go:99] Initial sync of DaemonSet completed
I0522 18:08:23.245543       1 fetcher.go:99] Initial sync of Deployment completed
I0522 18:08:23.346043       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0522 18:08:23.446284       1 fetcher.go:99] Initial sync of StatefulSet completed
I0522 18:08:23.546422       1 fetcher.go:99] Initial sync of ReplicationController completed
I0522 18:08:23.647520       1 fetcher.go:99] Initial sync of Job completed
I0522 18:08:23.748261       1 fetcher.go:99] Initial sync of CronJob completed
W0522 18:08:23.748377       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748449       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748450       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748506       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748573       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748600       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0522 18:08:23.748703       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
brsolomon-deloitte commented 1 year ago
$ kubectl get --all-namespaces MutatingWebhookConfiguration,ValidatingWebhookConfiguration
NAME                                                                                      WEBHOOKS   AGE
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook       3          10d
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook            1          10d
mutatingwebhookconfiguration.admissionregistration.k8s.io/pod-identity-webhook            1          10d
mutatingwebhookconfiguration.admissionregistration.k8s.io/vpa-webhook-config              1          19m
mutatingwebhookconfiguration.admissionregistration.k8s.io/vpc-resource-mutating-webhook   1          10d

NAME                                                                                                    WEBHOOKS   AGE
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook                   3          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook                        1          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/eks-aws-auth-configmap-validation-webhook   1          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/externalsecret-validate                     1          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission                     1          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/secretstore-validate                        2          10d
validatingwebhookconfiguration.admissionregistration.k8s.io/vpc-resource-validating-webhook             2          10d
brsolomon-deloitte commented 1 year ago

The frustrating thing about VPA here is that even at --v="4", the VPA admission controller never actually logs what it doesn't like about the pods to be admitted.

voelzmo commented 1 year ago

Hey @brsolomon-deloitte, this looks like something is wrong with how the VPA is set up – most likely not related to strimzi or kafka directly. I'd assume that not even the VPA example is working correctly in your cluster: you probably don't see any resource updates for the Hamster Pods.

Given that you don't even see this log statement when you enable --v=4, I assume that something is wrong with the network, such that the requests for Pod admission don't even reach the admission-controller. Maybe you have network policies in place that deny the traffic from kube-apiserver to admission-controller and vice-versa?
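
A minimal sketch of how to check that, assuming the upstream hamster example (file path from the kubernetes/autoscaler repository; in a Helm-based install the namespace and labels may differ):

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/vertical-pod-autoscaler/examples/hamster.yaml
# After a few minutes, newly created hamster pods should show requests above the
# original 100m CPU / 50Mi memory if the admission controller is reachable:
$ kubectl get pods -l app=hamster -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.requests}{"\n"}{end}'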

brsolomon-deloitte commented 1 year ago

Hey @brsolomon-deloitte, this looks like something is wrong with how the VPA is set up – most likely not related to strimzi or kafka directly. I'd assume that not even the VPA example is working correctly in your cluster: you probably don't see any resource updates for the Hamster Pods.

Given that you don't even see this log statement when you enable --v=4, I assume that something is wrong with the network, such that the requests for Pod admission don't even reach the admission-controller. Maybe you have network policies in place that deny the traffic from kube-apiserver to admission-controller and vice-versa?

Here is a full reproduction of what I am seeing with the test VPA example.

$ kubectl get -n kube-system -o yaml deploy/hamster
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"argocd.argoproj.io/instance":"vertical-pod-autoscaler"},"name":"hamster","namespace":"kube-system"},"spec":{"replicas":2,"selector":{"matchLabels":{"app":"hamster"}},"template":{"metadata":{"labels":{"app":"hamster"}},"spec":{"containers":[{"args":["-c","while true; do timeout 0.5s yes \u003e/dev/null; sleep 0.5s; done"],"command":["/bin/sh"],"image":"bitnami/bitnami-shell:11-debian-11-r113","name":"hamster","resources":{"requests":{"cpu":"100m","memory":"50Mi"}}}],"securityContext":{"runAsNonRoot":true,"runAsUser":1001}}}}}
  creationTimestamp: "2023-05-30T12:05:58Z"
  generation: 1
  labels:
    argocd.argoproj.io/instance: vertical-pod-autoscaler
  name: hamster
  namespace: kube-system
  resourceVersion: "13234482"
  uid: 94c2b061-fadc-42ed-8cc8-9cd0681e46ba
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: hamster
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: hamster
    spec:
      containers:
      - args:
        - -c
        - while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done
        command:
        - /bin/sh
        image: bitnami/bitnami-shell:11-debian-11-r113
        imagePullPolicy: IfNotPresent
        name: hamster
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2023-05-30T12:05:58Z"
    lastUpdateTime: "2023-05-30T12:06:04Z"
    message: ReplicaSet "hamster-67b6b96f6c" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-05-30T12:22:36Z"
    lastUpdateTime: "2023-05-30T12:22:36Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

$ kubectl get -n kube-system -o yaml vpa
apiVersion: v1
items:
- apiVersion: autoscaling.k8s.io/v1
  kind: VerticalPodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"vertical-pod-autoscaler","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"vertical-pod-autoscaler","argocd.argoproj.io/instance":"vertical-pod-autoscaler","helm.sh/chart":"vertical-pod-autoscaler-0.3.0"},"name":"hamster-vpa","namespace":"kube-system"},"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"*","controlledResources":["cpu","memory"],"maxAllowed":{"cpu":1,"memory":"500Mi"},"minAllowed":{"cpu":"100m","memory":"50Mi"}}]},"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"hamster"}}}
    creationTimestamp: "2023-05-30T12:05:58Z"
    generation: 17
    labels:
      app.kubernetes.io/instance: vertical-pod-autoscaler
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: vertical-pod-autoscaler
      argocd.argoproj.io/instance: vertical-pod-autoscaler
      helm.sh/chart: vertical-pod-autoscaler-0.3.0
    name: hamster-vpa
    namespace: kube-system
    resourceVersion: "13234444"
    uid: ef19e11e-c8e9-403e-9d0c-c8a0bd9cf44b
  spec:
    resourcePolicy:
      containerPolicies:
      - containerName: '*'
        controlledResources:
        - cpu
        - memory
        maxAllowed:
          cpu: 1
          memory: 500Mi
        minAllowed:
          cpu: 100m
          memory: 50Mi
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: hamster
  status:
    conditions:
    - lastTransitionTime: "2023-05-30T12:07:04Z"
      status: "True"
      type: RecommendationProvided
    recommendation:
      containerRecommendations:
      - containerName: hamster
        lowerBound:
          cpu: 489m
          memory: 262144k
        target:
          cpu: 587m
          memory: 262144k
        uncappedTarget:
          cpu: 587m
          memory: 262144k
        upperBound:
          cpu: "1"
          memory: 500Mi
kind: List
metadata:
  resourceVersion: ""

Regarding,

Maybe you have network policies in place that deny the traffic from kube-apiserver to admission-controller and vice-versa?

Can you please clarify what specific rule that would be? I see that svc/vpa-webhook listens on 443. We have rules allowing kube-apiserver to communicate over the following:

When deploying the test deployment,

logs from deploy/vertical-pod-autoscaler-admission-controller:

I0530 12:06:03.286906       1 flags.go:57] FLAG: --kube-api-qps="5"
I0530 12:06:03.286911       1 flags.go:57] FLAG: --kubeconfig=""
I0530 12:06:03.286920       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0530 12:06:03.286931       1 flags.go:57] FLAG: --log-dir=""
I0530 12:06:03.286936       1 flags.go:57] FLAG: --log-file=""
I0530 12:06:03.286942       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0530 12:06:03.286950       1 flags.go:57] FLAG: --logtostderr="true"
I0530 12:06:03.286954       1 flags.go:57] FLAG: --one-output="false"
I0530 12:06:03.286961       1 flags.go:57] FLAG: --port="8000"
I0530 12:06:03.286965       1 flags.go:57] FLAG: --register-by-url="false"
I0530 12:06:03.286969       1 flags.go:57] FLAG: --register-webhook="true"
I0530 12:06:03.286973       1 flags.go:57] FLAG: --skip-headers="false"
I0530 12:06:03.286978       1 flags.go:57] FLAG: --skip-log-headers="false"
I0530 12:06:03.286981       1 flags.go:57] FLAG: --stderrthreshold="2"
I0530 12:06:03.286986       1 flags.go:57] FLAG: --tls-cert-file="/tls-secret/tls.crt"
I0530 12:06:03.286990       1 flags.go:57] FLAG: --tls-private-key="/tls-secret/tls.key"
I0530 12:06:03.286997       1 flags.go:57] FLAG: --v="4"
I0530 12:06:03.287002       1 flags.go:57] FLAG: --vmodule=""
I0530 12:06:03.287006       1 flags.go:57] FLAG: --vpa-object-namespace=""
I0530 12:06:03.287014       1 flags.go:57] FLAG: --webhook-address=""
I0530 12:06:03.287019       1 flags.go:57] FLAG: --webhook-port=""
I0530 12:06:03.287023       1 flags.go:57] FLAG: --webhook-service="vpa-webhook"
I0530 12:06:03.287028       1 flags.go:57] FLAG: --webhook-timeout-seconds="30"
I0530 12:06:03.287038       1 main.go:76] Vertical Pod Autoscaler 0.13.0 Admission Controller
I0530 12:06:03.287109       1 certs.go:40] Successfully read 1224 bytes from /tls-secret/ca.crt
I0530 12:06:03.287133       1 certs.go:40] Successfully read 1257 bytes from /tls-secret/tls.crt
I0530 12:06:03.287153       1 certs.go:40] Successfully read 1679 bytes from /tls-secret/tls.key
I0530 12:06:03.287783       1 reflector.go:221] Starting reflector *v1.VerticalPodAutoscaler (1h0m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0530 12:06:03.287798       1 reflector.go:257] Listing and watching *v1.VerticalPodAutoscaler from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88
I0530 12:06:03.390286       1 shared_informer.go:285] caches populated
I0530 12:06:03.390309       1 api.go:92] Initial VPA synced successfully
I0530 12:06:03.391059       1 reflector.go:221] Starting reflector *v1.Deployment (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.391075       1 reflector.go:257] Listing and watching *v1.Deployment from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.490916       1 shared_informer.go:285] caches populated
I0530 12:06:03.490939       1 fetcher.go:99] Initial sync of Deployment completed
I0530 12:06:03.491151       1 reflector.go:221] Starting reflector *v1.ReplicaSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.491162       1 reflector.go:257] Listing and watching *v1.ReplicaSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.591236       1 shared_informer.go:285] caches populated
I0530 12:06:03.591258       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0530 12:06:03.591480       1 reflector.go:221] Starting reflector *v1.StatefulSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.591490       1 reflector.go:257] Listing and watching *v1.StatefulSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.691997       1 shared_informer.go:285] caches populated
I0530 12:06:03.692053       1 fetcher.go:99] Initial sync of StatefulSet completed
I0530 12:06:03.692300       1 reflector.go:221] Starting reflector *v1.ReplicationController (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.692311       1 reflector.go:257] Listing and watching *v1.ReplicationController from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.792672       1 shared_informer.go:285] caches populated
I0530 12:06:03.792741       1 fetcher.go:99] Initial sync of ReplicationController completed
I0530 12:06:03.792999       1 reflector.go:221] Starting reflector *v1.Job (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.793027       1 reflector.go:257] Listing and watching *v1.Job from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.896003       1 shared_informer.go:285] caches populated
I0530 12:06:03.896025       1 fetcher.go:99] Initial sync of Job completed
I0530 12:06:03.896215       1 reflector.go:221] Starting reflector *v1.CronJob (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.896224       1 reflector.go:257] Listing and watching *v1.CronJob from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.997158       1 shared_informer.go:285] caches populated
I0530 12:06:03.997181       1 fetcher.go:99] Initial sync of CronJob completed
I0530 12:06:03.997439       1 reflector.go:221] Starting reflector *v1.DaemonSet (10m0s) from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:03.997449       1 reflector.go:257] Listing and watching *v1.DaemonSet from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94
I0530 12:06:04.098159       1 shared_informer.go:285] caches populated
I0530 12:06:04.098205       1 fetcher.go:99] Initial sync of DaemonSet completed
I0530 12:06:04.098446       1 shared_informer.go:285] caches populated
I0530 12:06:04.098465       1 shared_informer.go:285] caches populated
W0530 12:06:04.098592       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.098612       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
I0530 12:06:04.098779       1 reflector.go:221] Starting reflector *v1.LimitRange (10m0s) from k8s.io/client-go/informers/factory.go:134
I0530 12:06:04.098973       1 reflector.go:257] Listing and watching *v1.LimitRange from k8s.io/client-go/informers/factory.go:134
W0530 12:06:04.098811       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.098835       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.098848       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.098861       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.098951       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
I0530 12:06:04.198782       1 shared_informer.go:285] caches populated
I0530 12:06:04.199100       1 shared_informer.go:285] caches populated
I0530 12:06:04.199239       1 shared_informer.go:285] caches populated
I0530 12:06:04.199382       1 shared_informer.go:285] caches populated
I0530 12:06:04.199477       1 shared_informer.go:285] caches populated
I0530 12:06:04.199594       1 shared_informer.go:285] caches populated
I0530 12:06:14.221873       1 config.go:103] Self registration as MutatingWebhook succeeded.
I0530 12:11:22.900816       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 6 items received
I0530 12:11:33.103816       1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.LimitRange total 7 items received
I0530 12:11:50.007347       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 6 items received
I0530 12:13:10.700752       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 8 items received
I0530 12:13:14.599209       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 9 items received
I0530 12:14:04.304573       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88: Watch close - *v1.VerticalPodAutoscaler total 16 items received
I0530 12:14:22.520120       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicaSet total 31 items received
I0530 12:14:29.798231       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 10 items received
I0530 12:15:45.420937       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 38 items received
I0530 12:17:52.903287       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 8 items received
I0530 12:19:07.108943       1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.LimitRange total 8 items received
I0530 12:19:14.704147       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 7 items received
I0530 12:20:08.601527       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 7 items received
I0530 12:20:39.307565       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:88: Watch close - *v1.VerticalPodAutoscaler total 15 items received
I0530 12:20:53.800849       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 8 items received
I0530 12:20:54.011097       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 11 items received
I0530 12:21:42.523240       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicaSet total 32 items received
I0530 12:22:12.425405       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Deployment total 25 items received
I0530 12:25:12.110957       1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.LimitRange total 7 items received
I0530 12:25:14.705967       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.ReplicationController total 6 items received
I0530 12:26:15.905299       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.CronJob total 9 items received
I0530 12:26:56.603967       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.StatefulSet total 8 items received
I0530 12:27:21.012783       1 reflector.go:559] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.DaemonSet total 7 items received

logs from deploy/vertical-pod-autoscaler-updater:

I0530 12:06:03.341870       1 flags.go:57] FLAG: --in-recommendation-bounds-eviction-lifetime-threshold="12h0m0s"
I0530 12:06:03.341875       1 flags.go:57] FLAG: --kube-api-burst="10"
I0530 12:06:03.341881       1 flags.go:57] FLAG: --kube-api-qps="5"
I0530 12:06:03.341886       1 flags.go:57] FLAG: --kubeconfig=""
I0530 12:06:03.341891       1 flags.go:57] FLAG: --log-backtrace-at=":0"
I0530 12:06:03.341901       1 flags.go:57] FLAG: --log-dir=""
I0530 12:06:03.341906       1 flags.go:57] FLAG: --log-file=""
I0530 12:06:03.341912       1 flags.go:57] FLAG: --log-file-max-size="1800"
I0530 12:06:03.341918       1 flags.go:57] FLAG: --logtostderr="true"
I0530 12:06:03.341923       1 flags.go:57] FLAG: --min-replicas="2"
I0530 12:06:03.341927       1 flags.go:57] FLAG: --one-output="false"
I0530 12:06:03.341932       1 flags.go:57] FLAG: --pod-update-threshold="0.1"
I0530 12:06:03.341938       1 flags.go:57] FLAG: --skip-headers="false"
I0530 12:06:03.341943       1 flags.go:57] FLAG: --skip-log-headers="false"
I0530 12:06:03.341948       1 flags.go:57] FLAG: --stderrthreshold="2"
I0530 12:06:03.341953       1 flags.go:57] FLAG: --updater-interval="1m0s"
I0530 12:06:03.341958       1 flags.go:57] FLAG: --use-admission-controller-status="true"
I0530 12:06:03.341966       1 flags.go:57] FLAG: --v="2"
I0530 12:06:03.341970       1 flags.go:57] FLAG: --vmodule=""
I0530 12:06:03.341974       1 flags.go:57] FLAG: --vpa-object-namespace=""
I0530 12:06:03.341979       1 main.go:75] Vertical Pod Autoscaler 0.13.0 Updater
I0530 12:06:03.443038       1 fetcher.go:99] Initial sync of Deployment completed
I0530 12:06:03.543184       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0530 12:06:03.643758       1 fetcher.go:99] Initial sync of StatefulSet completed
I0530 12:06:03.746320       1 fetcher.go:99] Initial sync of ReplicationController completed
I0530 12:06:03.846935       1 fetcher.go:99] Initial sync of Job completed
I0530 12:06:03.948270       1 fetcher.go:99] Initial sync of CronJob completed
I0530 12:06:04.049074       1 fetcher.go:99] Initial sync of DaemonSet completed
W0530 12:06:04.049959       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.049979       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.050012       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.050032       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.050045       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.050059       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
W0530 12:06:04.050103       1 shared_informer.go:401] The sharedIndexInformer has started, run more than once is not allowed
I0530 12:06:04.150577       1 updater.go:241] Rate limit disabled
I0530 12:06:04.654436       1 api.go:92] Initial VPA synced successfully
I0530 12:08:04.836553       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-flg2f with priority 8.870000000000001
I0530 12:08:04.836605       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-x22ff with priority 8.870000000000001
I0530 12:08:04.836627       1 updater.go:215] evicting pod hamster-67b6b96f6c-flg2f
I0530 12:09:04.660972       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-mztkc with priority 8.870000000000001
I0530 12:09:04.661348       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-x22ff with priority 8.870000000000001
I0530 12:09:04.661526       1 updater.go:215] evicting pod hamster-67b6b96f6c-mztkc
I0530 12:10:04.662163       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-rq5r4 with priority 8.870000000000001
I0530 12:10:04.662530       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-x22ff with priority 8.870000000000001
I0530 12:10:04.662678       1 updater.go:215] evicting pod hamster-67b6b96f6c-rq5r4
I0530 12:11:04.661745       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-x22ff with priority 8.870000000000001
I0530 12:11:04.662446       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-gg7n9 with priority 8.870000000000001
I0530 12:11:04.662637       1 updater.go:215] evicting pod hamster-67b6b96f6c-x22ff
I0530 12:12:04.663302       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-6xw7q with priority 8.870000000000001
I0530 12:12:04.663597       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-gg7n9 with priority 8.870000000000001
I0530 12:12:04.663771       1 updater.go:215] evicting pod hamster-67b6b96f6c-6xw7q
I0530 12:13:04.767245       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-q5gsr with priority 8.870000000000001
I0530 12:13:04.767303       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-gg7n9 with priority 8.870000000000001
I0530 12:13:04.767462       1 updater.go:215] evicting pod hamster-67b6b96f6c-q5gsr
I0530 12:14:04.659934       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-gg7n9 with priority 8.870000000000001
I0530 12:14:04.659967       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-vf8zb with priority 8.870000000000001
I0530 12:14:04.659984       1 updater.go:215] evicting pod hamster-67b6b96f6c-gg7n9
I0530 12:15:04.660461       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-2gq4m with priority 8.870000000000001
I0530 12:15:04.660491       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-vf8zb with priority 8.870000000000001
I0530 12:15:04.660508       1 updater.go:215] evicting pod hamster-67b6b96f6c-2gq4m
I0530 12:16:04.660608       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-vf8zb with priority 8.870000000000001
I0530 12:16:04.660642       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-xv5c2 with priority 8.870000000000001
I0530 12:16:04.660674       1 updater.go:215] evicting pod hamster-67b6b96f6c-vf8zb
I0530 12:17:04.659667       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-xv5c2 with priority 8.870000000000001
I0530 12:17:04.660014       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-8svp8 with priority 8.870000000000001
I0530 12:17:04.660193       1 updater.go:215] evicting pod hamster-67b6b96f6c-xv5c2
I0530 12:18:04.727353       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-bq8tq with priority 8.870000000000001
I0530 12:18:04.727489       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-8svp8 with priority 8.870000000000001
I0530 12:18:04.727580       1 updater.go:215] evicting pod hamster-67b6b96f6c-bq8tq
I0530 12:19:04.660470       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-8svp8 with priority 8.870000000000001
I0530 12:19:04.660501       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-26whb with priority 8.870000000000001
I0530 12:19:04.660562       1 updater.go:215] evicting pod hamster-67b6b96f6c-8svp8
I0530 12:20:04.660324       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-26whb with priority 8.870000000000001
I0530 12:20:04.660638       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-498vs with priority 8.870000000000001
I0530 12:20:04.660813       1 updater.go:215] evicting pod hamster-67b6b96f6c-26whb
I0530 12:21:04.660365       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-n48tp with priority 8.870000000000001
I0530 12:21:04.660756       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-498vs with priority 8.870000000000001
I0530 12:21:04.660935       1 updater.go:215] evicting pod hamster-67b6b96f6c-n48tp
I0530 12:22:04.662010       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-lsmkf with priority 8.870000000000001
I0530 12:22:04.662040       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-498vs with priority 8.870000000000001
I0530 12:22:04.662104       1 updater.go:215] evicting pod hamster-67b6b96f6c-lsmkf
I0530 12:23:04.805797       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-498vs with priority 8.870000000000001
I0530 12:23:04.805829       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-8h594 with priority 8.870000000000001
I0530 12:23:04.805848       1 updater.go:215] evicting pod hamster-67b6b96f6c-498vs
I0530 12:24:04.661588       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-8h594 with priority 8.870000000000001
I0530 12:24:04.661619       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-dqphb with priority 8.870000000000001
I0530 12:24:04.661636       1 updater.go:215] evicting pod hamster-67b6b96f6c-8h594
I0530 12:25:04.659763       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-dqphb with priority 8.870000000000001
I0530 12:25:04.659793       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-kwdng with priority 8.870000000000001
I0530 12:25:04.659809       1 updater.go:215] evicting pod hamster-67b6b96f6c-dqphb
I0530 12:26:04.661787       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-kwdng with priority 8.870000000000001
I0530 12:26:04.661933       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-cw57w with priority 8.870000000000001
I0530 12:26:04.662033       1 updater.go:215] evicting pod hamster-67b6b96f6c-kwdng
I0530 12:27:04.661919       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-cw57w with priority 8.870000000000001
I0530 12:27:04.662305       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-6j6c2 with priority 8.870000000000001
I0530 12:27:04.662479       1 updater.go:215] evicting pod hamster-67b6b96f6c-cw57w
I0530 12:28:04.834828       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-bx8px with priority 8.870000000000001
I0530 12:28:04.834878       1 update_priority_calculator.go:143] pod accepted for update kube-system/hamster-67b6b96f6c-6j6c2 with priority 8.870000000000001
I0530 12:28:04.834902       1 updater.go:215] evicting pod hamster-67b6b96f6c-bx8px

And the test pod shows the event:

EvictedByVPA: Pod was evicted by VPA Updater to apply resource recommendation.

So, the recommendation is made and the pod is evicted, but the replacement pod never comes up with the updated resources, and the admission controller never logs any successful admissions.
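
One hedged way to confirm that from the pod side, using the annotation the admission controller adds when it does mutate a pod (names from this thread; the hamster deployment here lives in kube-system):

$ kubectl get pods -n kube-system -l app=hamster \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.vpaUpdates}{"\t"}{.spec.containers[0].resources.requests}{"\n"}{end}'
# With a working webhook, the vpaUpdates annotation and the recommended requests
# should show up on newly created pods; in this case they never did.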

brsolomon-deloitte commented 1 year ago
$ kubectl get svc -n kube-system vpa-webhook -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"admission-controller","app.kubernetes.io/instance":"vertical-pod-autoscaler","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"vertical-pod-autoscaler","app.kubernetes.io/version":"0.13.0","argocd.argoproj.io/instance":"vertical-pod-autoscaler","helm.sh/chart":"vertical-pod-autoscaler-7.0.1"},"name":"vpa-webhook","namespace":"kube-system"},"spec":{"ports":[{"name":"https","port":443,"protocol":"TCP","targetPort":"https"}],"selector":{"app.kubernetes.io/component":"admission-controller","app.kubernetes.io/instance":"vertical-pod-autoscaler","app.kubernetes.io/name":"vertical-pod-autoscaler"},"type":"ClusterIP"}}
  creationTimestamp: "2023-05-30T12:05:58Z"
  labels:
    app.kubernetes.io/component: admission-controller
    app.kubernetes.io/instance: vertical-pod-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vertical-pod-autoscaler
    app.kubernetes.io/version: 0.13.0
    argocd.argoproj.io/instance: vertical-pod-autoscaler
    helm.sh/chart: vertical-pod-autoscaler-7.0.1
  name: vpa-webhook
  namespace: kube-system
  resourceVersion: "13226956"
  uid: 09f58a37-152e-4ea9-a23c-150ce8c5643c
spec:
  clusterIP: 10.100.50.46
  clusterIPs:
  - 10.100.50.46
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: admission-controller
    app.kubernetes.io/instance: vertical-pod-autoscaler
    app.kubernetes.io/name: vertical-pod-autoscaler
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
brsolomon-deloitte commented 1 year ago

Alright, I have found the source of the issue.

We had an allow rule from kube-apiserver to the service port of the vpa-webhook service but not to the underlying endpoint port.

$ kubectl describe -n kube-system service vpa-webhook
Name:              vpa-webhook
Namespace:         kube-system
Labels:            app.kubernetes.io/component=admission-controller
                   app.kubernetes.io/instance=vertical-pod-autoscaler
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=vertical-pod-autoscaler
                   app.kubernetes.io/version=0.13.0
                   argocd.argoproj.io/instance=vertical-pod-autoscaler
                   helm.sh/chart=vertical-pod-autoscaler-7.0.1
Annotations:       <none>
Selector:          app.kubernetes.io/component=admission-controller,app.kubernetes.io/instance=vertical-pod-autoscaler,app.kubernetes.io/name=vertical-pod-autoscaler
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.50.46
IPs:               10.100.50.46
Port:              https  443/TCP
TargetPort:        https/TCP
Endpoints:         172.xx.xx.xx:8000
Session Affinity:  None
Events:            <none>

Once I allowed TCP/8000 from control plane to worker nodes, the admission controller pods started emitting new logs immediately and the new test (hamster) pod was replaced with increased resources.

I0530 12:45:19.913518       1 server.go:112] Sending patches: [{add /spec/updatePolicy {0xc000575800 <nil>}}]
I0530 12:45:20.210128       1 handler.go:79] Admitting pod {hamster-67b6b96f6c-% hamster-67b6b96f6c- kube-system    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app:hamster pod-template-hash:67b6b96f6c] map[] [{apps/v1 ReplicaSet hamster-67b6b96f6c 2ef0c853-6363-4d52-8614-e63b7fc38eb0 0xc000712bd7 0xc000712bd8}] [] [{kube-controller-manager Update v1 2023-05-30 12:45:04 +0000 UTC FieldsV1 {"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:app":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2ef0c853-6363-4d52-8614-e63b7fc38eb0\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"hamster\"}":{".":{},"f:args":{},"f:command":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{".":{},"f:requests":{".":{},"f:cpu":{},"f:memory":{}}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{".":{},"f:runAsNonRoot":{},"f:runAsUser":{}},"f:terminationGracePeriodSeconds":{}}} }]}
I0530 12:45:20.210516       1 matcher.go:68] Let's choose from 1 configs for pod kube-system/hamster-67b6b96f6c-%
I0530 12:45:20.210613       1 recommendation_provider.go:90] updating requirements for pod hamster-67b6b96f6c-%.
I0530 12:45:20.210848       1 server.go:112] Sending patches: [{add /metadata/annotations map[]} {add /spec/containers/0/resources/requests/cpu 587m} {add /spec/containers/0/resources/requests/memory 262144k} {add /metadata/annotations/vpaUpdates Pod resources updated by hamster-vpa: container 0: cpu request, memory request} {add /metadata/annotations/vpaObservedContainers hamster}]
I0530 12:45:56.934897       1 handler.go:79] Admitting pod {kube-proxy-% kube-proxy- kube-system    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[controller-revision-hash:57c4cb8ffd k8s-app:kube-proxy pod-template-generation:2] map[] [{apps/v1 DaemonSet kube-proxy 2d2a80f8-b729-4197-ac0a-a09b5daa7770 0xc000633eb9 0xc000633eba}] [] [{kube-controller-manager Update v1 2023-05-30 12:45:56 +0000 UTC FieldsV1 {"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:controller-revision-hash":{},"f:k8s-app":{},"f:pod-template-generation":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"2d2a80f8-b729-4197-ac0a-a09b5daa7770\"}":{}}},"f:spec":{"f:affinity":{".":{},"f:nodeAffinity":{".":{},"f:requiredDuringSchedulingIgnoredDuringExecution":{}}},"f:containers":{"k:{\"name\":\"kube-proxy\"}":{".":{},"f:command":{},"f:env":{".":{},"k:{\"name\":\"NODE_NAME\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{".":{},"f:requests":{".":{},"f:cpu":{}}},"f:securityContext":{".":{},"f:privileged":{}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{},"f:volumeMounts":{".":{},"k:{\"mountPath\":\"/lib/modules\"}":{".":{},"f:mountPath":{},"f:name":{},"f:readOnly":{}},"k:{\"mountPath\":\"/run/xtables.lock\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/var/lib/kube-proxy-config/\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/var/lib/kube-proxy/\"}":{".":{},"f:mountPath":{},"f:name":{}},"k:{\"mountPath\":\"/var/log\"}":{".":{},"f:mountPath":{},"f:name":{}}}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:hostNetwork":{},"f:priorityClassName":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:serviceAccount":{},"f:serviceAccountName":{},"f:terminationGracePeriodSeconds":{},"f:tolerations":{},"f:volumes":{".":{},"k:{\"name\":\"config\"}":{".":{},"f:configMap":{".":{},"f:defaultMode":{},"f:name":{}},"f:name":{}},"k:{\"name\":\"kubeconfig\"}":{".":{},"f:configMap":{".":{},"f:defaultMode":{},"f:name":{}},"f:name":{}},"k:{\"name\":\"lib-modules\"}":{".":{},"f:hostPath":{".":{},"f:path":{},"f:type":{}},"f:name":{}},"k:{\"name\":\"varlog\"}":{".":{},"f:hostPath":{".":{},"f:path":{},"f:type":{}},"f:name":{}},"k:{\"name\":\"xtables-lock\"}":{".":{},"f:hostPath":{".":{},"f:path":{},"f:type":{}},"f:name":{}}}}} }]}

Admittedly I am a little confused by this. My understanding is that the control plane / kube-apiserver would attempt to talk to the VPA admission webhook at its service (e.g. vpa-webhook.kube-system.svc.cluster.local:443) rather than the underlying endpoint of 172.xx.xx.xx:8000.

What led me to this discovery was "VPA restarts my pods but does not modify CPU or memory settings". That describes my situation here: the recommender was firing but the admission controller was silent.
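
A hedged pair of commands to see both halves of that: the webhook configuration only references the service, but on EKS the apiserver appears to dial the resolved pod endpoint (target port 8000) directly rather than the ClusterIP:443, which is consistent with what was observed here. Resource names are the ones shown earlier in this thread:

$ kubectl get mutatingwebhookconfiguration vpa-webhook-config \
    -o jsonpath='{.webhooks[0].clientConfig.service}{"\n"}'
$ kubectl get endpoints -n kube-system vpa-webhook -o wide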

jbartosik commented 1 year ago

Sounds like this is resolved so I'm going to close the issue. Please reopen if you think there's something more we should do here.

voelzmo commented 1 year ago

/remove-kind bug

Kanishkpsr007 commented 8 months ago

@brsolomon-deloitte I'm also facing the same issue, but only on EKS version 1.25. I have deployed VPA on 1.23 and 1.24 and it works fine there. On 1.25 the controller is not updating the memory or CPU settings, but the recommender is giving recommendations and the updater is evicting the pods. May I know how you allowed TCP/8000 from the control plane to the worker nodes?

goosetapa commented 2 months ago

@brsolomon-deloitte I'm also facing the same issue, but only on EKS version 1.25. I have deployed VPA on 1.23 and 1.24 and it works fine there. On 1.25 the controller is not updating the memory or CPU settings, but the recommender is giving recommendations and the updater is evicting the pods. May I know how you allowed TCP/8000 from the control plane to the worker nodes?

I think that by adding an inbound rule (TCP/8000) on your EKS node security group, you can allow the control plane to reach the VPA admission webhook on your EKS worker nodes.
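
A minimal sketch of that rule with the AWS CLI, assuming the EKS cluster security group is the source of control-plane traffic; the group IDs are placeholders:

$ aws ec2 authorize-security-group-ingress \
    --group-id <node-security-group-id> \
    --protocol tcp --port 8000 \
    --source-group <cluster-security-group-id>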