kubernetes-sigs / metrics-server

Scalable and efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines.
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
Apache License 2.0
5.81k stars 1.87k forks source link

helm chart: default for apiService.insecureSkipTLSVerify not respected #1221

Open evgenyfadeev opened 1 year ago

evgenyfadeev commented 1 year ago

What happened:

I ran:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/ helm upgrade --install metrics-server metrics-server/metrics-server

what I saw:

1) kubectl top nodes did not work, logs were suggesting that the ssl certificates could not be verified due to lack of IP SANS - from which I gather that TLS verification was enabled.

2) the helm chart documentation tells that the default for apiService.insecureSkipTLSVerify = true, which I verified by running helm get values metrics-server -n kube-system --all

which showed that TLS verification is expected to be disabled:

apiService:
  annotations: {}
  caBundle: ""
  create: true
  insecureSkipTLSVerify: true

3) kubectl describe deployment did not show flag --kubelet-insecure-tls, suggesting that TLS verification is enabled.

4) I was able to work around this with the following command:

helm upgrade metrics-server metrics-server/metrics-server --set args="{--kubelet-insecure-tls}" -n kube-system

5) After the above change kubectl top started working as expected.

Looks like the default value for apiService.insecureSkipTLSVerify is not doing anything.

What you expected to happen:

I expected kubectl top to work, because according to the table of default values at https://artifacthub.io/packages/helm/metrics-server/metrics-server,

apiService.insecureSkipTLSVerify = true

I'm assuming tha this means that TLS verification would be skipped.

Anything else we need to know?:

Environment:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.6", GitCommit:"ff2c119726cc1f8926fb0585c74b25921e866a28", GitTreeState:"clean", BuildDate:"2023-01-18T19:15:26Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"} WARNING: version difference between client (1.21) and server (1.25) exceeds the supported minor version skew of +/-1

spoiler for Metrics Server manifest: Official Helm Chart from artifact hub.
spoiler for Kubelet config:
spoiler for Metrics Server logs: "Failed to scrape node" err="Get \"https://IPADDR:10250/metrics/resource\": x509: cannot validate certificate for IPADDR because it doesn't contain any IP SANs" node="...."

/kind bug

Trevor-K-Smith commented 1 year ago

I am also experiencing this behavior. Workaround fixed the issue.

logicalhan commented 1 year ago

/triage accepted /assign @stevehipwell

stevehipwell commented 1 year ago

@evgenyfadeev I think you might be confusing the APIService TLS configuration and the --kubelet-insecure-tls arg which AFAIK are for two different purposes. The flag is described in the README and the APIService TLS configuration is in #544.

evgenyfadeev commented 1 year ago

@stevehipwell maybe I am indeed missing something, but afaik by default the HELM-installed metrics server should not use tls (as stated in the documentation), but when I deploy it, it does use TLS and since the SSL certificates are not installed properly, the metrics server is not working. So I found a workaround to pass the argument directly to the metrics server, disabling the tls.

Basically, the helm chart does not work out of the box, without either disabling the tls, or configuring the ssl certs.

stevehipwell commented 1 year ago

@evgenyfadeev my understanding of both the docs and implementation is that by default Metrics Server uses TLS when communicating with the kubelets (this can be disabled by the --kubelet-insecure-tls flag which is recommended only for testing purposes) but without TLS in front of the APIService (controlled by apiService.insecureSkipTLSVerify).

@serathius could you correct me if I've misunderstood?

evgenyfadeev commented 1 year ago

@stevehipwell right I understand the recommendation about use of tls. Isn't apiService.insecureSkipTLSVerify = true supposed to be equivalent of setting --kubelet-insecure-tls arg? If not - then I must be missing something, but in that case I find it tricky to understand the relationship apiService.insecureSkipTLSVerify with tls verification enabled or not. Regardless - I've tried the helm chart with all the the defaults and it did not work for me.

stevehipwell commented 1 year ago

@evgenyfadeev the arg is for Metrics Server scraping kubelet and the chart setting is for the API passthrough to Metrics Server, these two things are completely unrelated.

If the chart isn't working for you then it's most likely an issue with how you've configured your cluster.

evgenyfadeev commented 1 year ago

@stevehipwell thank you. Does that mean that there is some pre-requisite cluster configuration for the chart to work by default? For example - do I need to install the tls certificates with SANs? Is is described somewhere in the documentation?

stevehipwell commented 1 year ago

@evgenyfadeev I think you'll need to look closer at the Metrics Server docs, my main focus here is the Helm chart. I know that you need to flag when running on a Kind cluster but running on EKS just works fine.

evgenyfadeev commented 1 year ago

Regrding the meaning of apiService.insecureSkipTLSVerify and the --kubelet-insecure-tls flag - one has to read carefully to see that these don't mean the same - it probably deserves a better explanation in the docs. I'm using LKE (Linode), I'll ask them.

stevehipwell commented 1 year ago

Regrding the meaning of apiService.insecureSkipTLSVerify and the --kubelet-insecure-tls flag - one has to read carefully to see that these don't mean the same - it probably deserves a better explanation in the docs. I'm using LKE (Linode), I'll ask them.

We're always happy to take PRs.

evgenyfadeev commented 1 year ago

I'm looking at the doc page https://github.com/kubernetes-sigs/metrics-server/blob/master/docs/command-line-flags.txt

--kubelet-insecure-tls Do not verify CA of serving certificates presented by Kubelets. For testing purposes only.

And for the chart https://artifacthub.io/packages/helm/metrics-server/metrics-server:

apiService.insecureSkipTLSVerify --- Specifies whether to skip TLS verification -- default is true

The first says "do not verify" and the second "skip verification" -- even on the second reading I'm not quite catching the difference in the meaning.

I bet that most people will assume that insecureSkipTLSVerify is expected to propagate to the setting the --kubelet-insecure-tls flag.


I guess the "do not verify the CA" and skip TLS verification are two different things, yet it is easy to miss.

serathius commented 1 year ago

What in a names --kubelet-insecure-tls, apiService.insecureSkipTLSVerify make you think they are related? One is --KUBELET--insecure-tls second is apiService.insecureSkipTLSVerify. One refers to kubelet, second to apiservice.

The first says "do not verify" and the second "skip verification" -- even on the second reading I'm not quite catching the difference in the meaning.

Unfortunately this is totally wrong interpretation. They affect separate types of connections done by metrics server. One from metrics server to kubelet, second apiserver to metrics server. Also --kubelet-insecure-tls means no TLS, second insecureSkipTLSVerify means do TLS, but don't verify certificate.

evgenyfadeev commented 1 year ago

@serathius

Unfortunately this is totally wrong interpretation. They affect separate types of connections done by metrics server.

And yet - people fall for it, note that there are two "me too" votes on my post. That's why I suggest that perhaps adding a short note on the apiService.insecureSkipTLSVerify entry in the docs is a good idea.

What in a names --kubelet-insecure-tls, apiService.insecureSkipTLSVerify make you think they are related?

Both have words "insecure" and "tls", and because the descriptions for the two terms are very similarly worded. Finally - the error message points to the issue that certificate does not have the SAN values and one might assume that skipping tls certificate verification would eliminate the error.

Another reason why one might assume that apiService.insecureSkipTLSVerify == -kubelet-insecure-tls is because metrics server has the flag and it's reasonable to expect that there is an equivalent value for it in the helm chart and the insecureSkipTLSVerify is the only value that sounds any similar.

I understand that this is an incorrect interpretation, but it is quite easy to make it especially if you don't have much time for parsing the documentation.

serathius commented 1 year ago

Don't understand the argument, because it breaks basic assumptions on importance of order in naming. It's not tls-insecure-kubelet but kubelet-insecure-tls.

Still as long as there are people confused, documentation can be improved.

Contributions are welcomed.

petertiedemann commented 12 months ago

Just wanted to add a "me too" here. Hit the exact same thing.

MurzNN commented 11 months ago

I added a PR https://github.com/kubernetes-sigs/metrics-server/pull/1382 that adds a tip on how to resolve the error to the Readme, it should make the solution more visible.