kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

"vpa-admission-controller" goes into CrashLoopBackOff as a result of PR #6665 #6977

Closed Ramneek-kalra closed 2 days ago

Ramneek-kalra commented 5 days ago

Which component are you using?: vertical-pod-autoscaler

What version of the component are you using?: registry.k8s.io/autoscaling/vpa-admission-controller:1.1.2

Component version: registry.k8s.io/autoscaling/vpa-admission-controller:1.1.2 (installed via the steps at: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#install-command)

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.5-eks-1de2ab1

What environment is this in?: Amazon EKS Version 1.29

What did you expect to happen?: Expected "vpa-admission-controller" to be in the Running state.

What happened instead?: "vpa-admission-controller" went into CrashLoopBackOff with the logs below:

➜ vertical-pod-autoscaler git:(master) kubectl logs -n kube-system vpa-admission-controller-7cb49b77d6-v557v
unknown flag: --reload-cert
unknown flag: --reload-cert
Usage of /admission-controller:
--add-dir-header                   If true, adds the file directory to the header of the log messages
--address string                   The address to expose Prometheus metrics. (default ":8944")
--alsologtostderr                  log to standard error as well as files (no effect when -logtostderr=true)
--client-ca-file string            Path to CA PEM file. (default "/etc/tls-certs/caCert.pem")
--kube-api-burst float             QPS burst limit when making requests to Kubernetes apiserver (default 10)
--kube-api-qps float               QPS limit when making requests to Kubernetes apiserver (default 5)
--kubeconfig string                Path to a kubeconfig. Only required if out-of-cluster.
--log-backtrace-at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
--log-dir string                   If non-empty, write log files in this directory (no effect when -logtostderr=true)
--log-file string                  If non-empty, use this log file (no effect when -logtostderr=true)
--log-file-max-size uint           Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr                      log to standard error instead of files (default true)
--one-output                       If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
--port int                         The port to listen on. (default 8000)
--register-by-url                  If set to true, admission webhook will be registered by URL (webhookAddress:webhookPort) instead of by service name
--register-webhook                 If set to true, admission webhook object will be created on start up to register with the API server. (default true)
--skip-headers                     If true, avoid header prefixes in the log messages
--skip-log-headers                 If true, avoid headers when opening log files (no effect when -logtostderr=true)
--stderrthreshold severity         logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false)
--tls-cert-file string             Path to server certificate PEM file. (default "/etc/tls-certs/serverCert.pem")
--tls-private-key string           Path to server certificate key PEM file. (default "/etc/tls-certs/serverKey.pem")
-v, --v Level                          number for the log level verbosity (default 0)
--vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
--vpa-object-namespace string      Namespace to search for VPA objects. Empty means all namespaces will be used.
--webhook-address string           Address under which webhook is registered. Used when registerByURL is set to true.
--webhook-port string              Server Port for Webhook
--webhook-service string           Kubernetes service under which webhook is registered. Used when registerByURL is set to false. (default "vpa-webhook")
--webhook-timeout-seconds int      Timeout in seconds that the API server should wait for this webhook to respond before failing. (default 30)

How to reproduce it (as minimally and precisely as possible): Follow the installation steps at: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#install-command or https://docs.aws.amazon.com/eks/latest/userguide/vertical-pod-autoscaler.html

Anything else we need to know?: I found that this feature was added just yesterday via PR: https://github.com/kubernetes/autoscaler/pull/6665, which is breaking things.

As a workaround, I removed the flag "--reload-cert" from: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/deploy/admission-controller-deployment.yaml, which sorted things out. However, this is not a permanent fix if a certificate-reload issue comes up.
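For anyone scripting the same workaround, here is a minimal sketch of the flag removal. It assumes the container args have already been read into a list of strings; `drop_flag` and the sample list are hypothetical, made up for illustration and not copied from the real manifest.

```python
def drop_flag(args, flag):
    """Return args without `flag`, in bare (--flag) or valued (--flag=x) form."""
    return [a for a in args if a != flag and not a.startswith(flag + "=")]


# Hypothetical args list, for illustration only.
args = ["--v=4", "--reload-cert"]
print(drop_flag(args, "--reload-cert"))  # ['--v=4']
```

Flags that merely share the prefix (for example a hypothetical `--reload-cert-extra`) are left untouched, since only the exact name or the `name=value` form is matched.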

Ramneek-kalra commented 5 days ago

Related PR: https://github.com/kubernetes/autoscaler/pull/6665, which caused this breakage.

voelzmo commented 5 days ago

Hey @Ramneek-kalra, thanks for your issue! You mention that you're using registry.k8s.io/autoscaling/vpa-admission-controller:1.1.2; this version does not contain the PR you mentioned. That PR was merged only yesterday and has not been released in any image published on the k8s registry.

So the error that you're seeing means: you're trying to use a feature flag which has been merged to the code, but is not published in a new VPA version yet. You will have to wait for VPA 1.2.0 to be released so you can use the feature.
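One way to confirm this for a given image is to compare the flag against the usage text the binary prints on failure. The following is a rough sketch, not part of VPA: `supported_flags` is a hypothetical helper, and the `usage` string is a short excerpt of the output pasted in the report above.

```python
import re


def supported_flags(usage_text):
    """Collect every long flag (--name) that starts a line of usage output."""
    flags = set()
    for line in usage_text.splitlines():
        # Optional short form ("-v, ") followed by the long flag name.
        m = re.match(r"\s*(?:-\w,\s*)?(--[\w-]+)", line)
        if m:
            flags.add(m.group(1))
    return flags


# Excerpt of the usage output from the report above.
usage = """\
unknown flag: --reload-cert
Usage of /admission-controller:
--kubeconfig string                Path to a kubeconfig. Only required if out-of-cluster.
--port int                         The port to listen on. (default 8000)
-v, --v Level                          number for the log level verbosity (default 0)
"""

print("--reload-cert" in supported_flags(usage))  # False
```

Lines such as "unknown flag: --reload-cert" are ignored because they do not start with a flag, so only flags the binary actually advertises are collected.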

Hope that helps to clear things up!

/close
/remove-kind bug
/kind support

k8s-ci-robot commented 5 days ago

@voelzmo: Closing this issue.

Ramneek-kalra commented 5 days ago

Thanks @voelzmo for your update! This helps. However, please note that I didn't install anything beyond what the installation steps describe, so the steps themselves need a review.

Do we have any ETA for that release?

voelzmo commented 5 days ago

Hey @Ramneek-kalra I'm not sure what kind of review you're suggesting, can you help me understand this a bit better?

Regarding the ETA for a new VPA release: this recently came up in a different thread as well. We don't have a fixed timeline for a release before the next k8s version, but hopefully we can cut one in the next few weeks. No promises, though, as this depends on the availability of individual people: https://github.com/kubernetes/autoscaler/pull/6625#issuecomment-2183114512

Ramneek-kalra commented 5 days ago

Hi @voelzmo, sorry for not being clearer.

I am asking you to review the installation steps at https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#install-command. Following them made me use that feature flag automatically, along with merged but unreleased code. That shouldn't be the case, as many customers might then face this issue.

And thanks for the ETA update.

danieljkemp commented 3 days ago

Having never installed VPA on this cluster, I just ran face-first into this following the install documentation.

voelzmo commented 2 days ago

/reopen

Sorry, I didn't understand the part about the installation instructions, but now I see that the admission-controller deployment is specifying image version 1.1.2 and at the same time has this parameter configured. Thanks for being persistent about this!

k8s-ci-robot commented 2 days ago

@voelzmo: Reopened this issue.

voelzmo commented 2 days ago

/remove-kind support
/kind bug