kubernetes / autoscaler

Autoscaling components for Kubernetes

VPA: Passing args to vpa-updater #7291

Closed · NEO2756 closed this issue 5 days ago

NEO2756 commented 1 month ago

The moment I pass any arg to the updater pod (e.g. as described here), I stop getting any logs from the updater after the logs shown below. What am I doing wrong?

I0917 08:33:45.292806       1 fetcher.go:99] Initial sync of DaemonSet completed
I0917 08:33:45.493064       1 fetcher.go:99] Initial sync of Deployment completed
I0917 08:33:45.593691       1 fetcher.go:99] Initial sync of ReplicaSet completed
I0917 08:33:45.694550       1 fetcher.go:99] Initial sync of StatefulSet completed
I0917 08:33:45.794948       1 fetcher.go:99] Initial sync of ReplicationController completed
I0917 08:33:45.895580       1 fetcher.go:99] Initial sync of Job completed
I0917 08:33:45.996170       1 fetcher.go:99] Initial sync of CronJob completed
I0917 08:33:45.996292       1 controller_fetcher.go:141] Initial sync of Deployment completed
I0917 08:33:45.996313       1 controller_fetcher.go:141] Initial sync of ReplicaSet completed
I0917 08:33:45.996318       1 controller_fetcher.go:141] Initial sync of StatefulSet completed
I0917 08:33:45.996322       1 controller_fetcher.go:141] Initial sync of ReplicationController completed
I0917 08:33:45.996326       1 controller_fetcher.go:141] Initial sync of Job completed
I0917 08:33:45.996329       1 controller_fetcher.go:141] Initial sync of CronJob completed
I0917 08:33:45.996333       1 controller_fetcher.go:141] Initial sync of DaemonSet completed
W0917 08:33:45.996380       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996377       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996397       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996407       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996412       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996416       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0917 08:33:45.996379       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0917 08:33:46.598356       1 api.go:94] Initial VPA synced successfully

Note: I manually changed the min-replicas default value in the source and built the Docker image. That worked fine.

adrianmoisey commented 1 month ago

/area vertical-pod-autoscaler

adrianmoisey commented 1 month ago

> The moment I pass any arg to the updater pod (e.g. as described here), I stop getting any logs from the updater after the logs shown below. What am I doing wrong?

Can you describe what you expect to happen? Can you also show examples of the VPAs configured on your cluster?

/label triage/needs-information

k8s-ci-robot commented 1 month ago

@adrianmoisey: The label(s) /label triage/needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

adrianmoisey commented 1 month ago

/label needs-information

k8s-ci-robot commented 1 month ago

@adrianmoisey: The label(s) /label needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

adrianmoisey commented 1 month ago

/triage needs-information

NEO2756 commented 1 month ago

> Can you describe what you expect to happen?

At least I should see this log: `I0917 10:40:03.920376 1 main.go:95] Vertical Pod Autoscaler 1.2.1 Updater`? There are also no logs like:

1 reflector.go:325] Listing and watching *v1.Pod from k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:302
I0917 10:41:05.333983       1 update_priority_calculator.go:109] Container with ContainerScalingModeOff. Skipping container istio-proxy quick OOM calculations
I0917 10:41:05.334009       1 update_priority_calculator.go:132] not updating a short-lived pod dex-base-bwg2m8bn/dex-base-dbus-wxm-client-7f6c9697c9-6bnn9, request within recommended range
I0917 10:41:05.334041       1 update_priority_calculator.go:132] not updating a short-lived pod dex-base-bwg2m8bn/hamster-597687d5bc-tzv5b, request within recommended range

vpa.yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: dbus-wxm-client-vpa
  namespace: dex-base-bwg2m8bn
spec:
  targetRef: 
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       dex-base-dbus-wxm-client
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: dbus-wxm-client
        minAllowed:
          cpu: 300m
          memory: 300Mi
        maxAllowed:
          cpu: 2
          memory: 3Gi
        controlledResources: ["cpu", "memory"]
      - containerName: istio-proxy
        mode: "Off"

adrianmoisey commented 1 month ago

> At least I should see this log: `I0917 10:40:03.920376 1 main.go:95] Vertical Pod Autoscaler 1.2.1 Updater`?

This log happens before the logs that you've pasted. Is your log stream being truncated?

You've provided very little detail here. Please provide detailed steps on how to reproduce this issue.

NEO2756 commented 1 month ago

I changed the default value of `--min-replicas` to 1:

diff --git a/vertical-pod-autoscaler/pkg/updater/main.go b/vertical-pod-autoscaler/pkg/updater/main.go
index 3a72faad8..126aad47f 100644
--- a/vertical-pod-autoscaler/pkg/updater/main.go
+++ b/vertical-pod-autoscaler/pkg/updater/main.go
@@ -53,7 +53,7 @@ var (
        updaterInterval = flag.Duration("updater-interval", 1*time.Minute,
                `How often updater should run`)

-       minReplicas = flag.Int("min-replicas", 2,
+       minReplicas = flag.Int("min-replicas", 1,
                `Minimum number of replicas to perform update`)

        evictionToleranceFraction = flag.Float64("eviction-tolerance", 0.5,

Built the Docker image and updated the deployment for the change to take effect:

    spec:
      containers:
        - name: vpa
          image: >-
            <redacted>/sandeep.sharma/<redacted>/vpa-updater-amd64:dev
          ports:
            - name: metrics
              containerPort: 8943
              protocol: TCP

And the updater gives me the expected logs about what is happening:

I0923 05:56:00.132252       1 update_priority_calculator.go:132] not updating a short-lived pod dex-base-nxnm6tls/dex-base-dbus-wxm-client-6b698bd7bc-dpwt7, request within recommended range
I0923 05:56:04.871517       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/target/fetcher.go:94: Watch close - *v1.Job total 7 items received
I0923 05:56:58.543047       1 reflector.go:790] k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/limitrange/limit_range_calculator.go:60: Watch close - *v1.LimitRange total 9 items received
I0923 05:57:00.132078       1 priority_processor.go:56] Not listed in vpaObservedContainers:hamster. Skipping container istio-proxy priority calculations
I0923 05:57:00.132121       1 update_priority_calculator.go:101] Not listed in vpaObservedContainers:hamster. Skipping container istio-proxy quick OOM calculations
I0923 05:57:00.132135       1 update_priority_calculator.go:132] not updating a short-lived pod default/hamster-c6967774f-cmxjp, request within recommended range
I0923 05:57:00.132173       1 priority_processor.go:56] Not listed in vpaObservedContainers:hamster. Skipping container istio-proxy priority calculations
I0923 05:57:00.132184       1 update_priority_calculator.go:101] Not listed in vpaObservedContainers:hamster. Skipping container istio-proxy quick OOM calculations
I0923 05:57:00.132195       1 update_priority_calculator.go:132] not updating a short-lived pod default/hamster-c6967774f-qksgz, request within recommended range
I0923 05:57:00.132230       1 priority_processor.go:56] Not listed in vpaObservedContainers:dbus-wxm-client. Skipping container istio-proxy priority calculations
I0923 05:57:00.132262       1 update_priority_calculator.go:101] Not listed in vpaObservedContainers:dbus-wxm-client. Skipping container istio-proxy quick OOM calculations
I0923 05:57:00.132275       1 update_priority_calculator.go:132] not updating a short-lived pod dex-base-nxnm6tls/dex-base-dbus-wxm-client-6b698bd7bc-dpwt7, request within recommended range

If I pass the args to the default image, I don't get the repetitive logs shown above; the logs I get are only those shown in the original comment.

Here is how I am passing the arg:

          image: registry.k8s.io/autoscaling/vpa-updater:1.2.1
          args:
            - '--min-replicas=1'
          ports:

I think the arg is getting passed, because the updater is no longer complaining that it needs 2 (the default) replicas. But why are no other logs coming? The same behaviour happens with the admission-controller, so either my expectations are wrong or I am making some mistake.

Things are otherwise working fine for me, BTW, as I can see the requests/limits being updated for the targetRef. Please let me know if you need further details.

adrianmoisey commented 1 month ago

Right, OK, I see what you're saying: the logging behaviour changes when a parameter is passed in. I think I know what may be causing this. I'll take a look.

/assign
/triage accepted

adrianmoisey commented 1 month ago

/assign omerap12

k8s-ci-robot commented 1 month ago

@adrianmoisey: GitHub didn't allow me to assign the following users: omerap12.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

omerap12 commented 1 month ago

/assign

adrianmoisey commented 1 month ago

@omerap12 and I looked into this, and found the following: https://github.com/kubernetes/autoscaler/blob/19fe7aba7ec4007084ccea82221b8a52bac42b34/vertical-pod-autoscaler/pkg/updater/Dockerfile#L35

It seems that the Dockerfile sets some default args, and when args are passed in from the Kubernetes manifest they replace the image's CMD, so those defaults are dropped.

A workaround for now is to also set those two default flags explicitly as args in the Kubernetes manifest.
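
Assuming the defaults in question are the standard klog logging flags `--v=4` and `--stderrthreshold=info` (check the linked Dockerfile line to confirm the exact values for your version), a minimal sketch of that workaround would look like this:

```yaml
# Hypothetical excerpt of the vpa-updater Deployment's container spec.
# Passing any args replaces the image CMD, so the logging defaults
# (assumed here to be --v=4 and --stderrthreshold=info) have to be
# re-added alongside the custom flag.
spec:
  containers:
    - name: updater
      image: registry.k8s.io/autoscaling/vpa-updater:1.2.1
      args:
        - '--min-replicas=1'
        - '--v=4'                    # assumed Dockerfile default; restores the verbose logs
        - '--stderrthreshold=info'   # assumed Dockerfile default
      ports:
        - name: metrics
          containerPort: 8943
          protocol: TCP
```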

adrianmoisey commented 1 week ago

/remove-label needs-information

k8s-ci-robot commented 1 week ago

@adrianmoisey: The label(s) /remove-label needs-information cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor, ci-short, ci-extended, ci-full. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

adrianmoisey commented 1 week ago

/remove-triage needs-information

adrianmoisey commented 5 days ago

The default options have been moved out of the Docker image and into code, so this should be fixed in the next release of the VPA
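
For illustration, once a release containing that change is running, the manifest should only need the flags you actually want to override; the logging verbosity no longer depends on the image CMD. A rough sketch (the image tag is a placeholder):

```yaml
# Illustrative sketch: with the logging defaults compiled into the binary,
# only the overridden flag needs to be passed as an arg.
spec:
  containers:
    - name: updater
      image: registry.k8s.io/autoscaling/vpa-updater:<next-release>  # placeholder tag
      args:
        - '--min-replicas=1'
```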