k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Not able to connect to kubelet when using bind-address flag #10444

Open manuelbuil opened 2 days ago

manuelbuil commented 2 days ago

Environmental Info:

K3s Version:

Any

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

1 server, 1 agent

Describe the bug:

When k3s is deployed with the bind-address flag on the nodes, the kubelet listens on $bind-address:$port, where port is normally 10250. When kubectl exec ... or kubectl logs ... is executed, kube-apiserver contacts the supervisor, and the HTTP CONNECT request is forwarded to the agent's kubelet via the websocket tunnel.

The problem is that the supervisor always uses the loopback address, which means it expects the kubelet process to listen on 127.0.0.1:$port (or [::1]:$port). When the bind-address flag is used, that is no longer the case, and as a consequence we get a connection refused error.

Kubectl output:

Error from server: Get "https://10.10.10.100:10250/containerLogs/kube-system/cilium-b4v6m/cilium-agent": proxy error from 10.10.10.100:9345 while dialing 10.10.10.100:10250, code 502: 502 Bad Gateway

journalctl output:

Jul 02 09:08:57 server-0 rke2[15069]: time="2024-07-02T09:08:57Z" level=error msg="Sending HTTP 502 response to 10.10.10.100:35608: dial tcp 127.0.0.1:10250: connect: connection refused"

Steps To Reproduce:

1 - Deploy k3s with the bind-address flag
2 - Check sudo ss -lpn 'sport = :10250' and see that port 10250 is bound to a specific IP address
3 - Deploy a pod and try kubectl logs ... or kubectl exec .... You'll get the above error

Expected behavior:

Actual behavior:

Additional context / logs:

brandond commented 2 days ago

If we can get https://github.com/rancher/remotedialer/pull/80 merged, we could shim in a custom Dialer that grabs the remotedialer request to dial localhost:10250 (or whatever the kubelet port is), and replaces it with the configured bind address. The server knows the kubelet port from checking the Node object, but there is unfortunately no way for the server to know what address the kubelet's listener is bound to so we always dial it via loopback. The agent DOES know and can make sure it ends up connected to the right place.

manuelbuil commented 2 days ago

Thanks Brad! I tested it this morning, and kubectl exec ... and kubectl logs ... work, apart from a harmless "Error on socket receive" message which we could investigate further. There is one problem, though: the connection kube-api ==> metrics-server fails:

k3s[15348]: E0703 10:39:49.017827   15348 available_controller.go:460] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.42.1.2:10250/apis/metrics.k8s.io/v1beta1: bad status from https://10.42.1.2:10250/apis/metrics.k8s.io/v1beta1: 404

It is probably going through the supervisor via the egressSelector too, and hence being sent to the localDialer address I set instead of 10.42.1.2:10250. Tricky.

brandond commented 1 day ago

Yes, you'd need to be sure to redirect the dialer connection only if the connection is to localhost:kubeletport; everything else should be allowed through as-is.