kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

`kind export logs` doesn't work with podman remote client. #3742

Open wbrefvem opened 1 month ago

wbrefvem commented 1 month ago

Tangential to #3729

`kind export logs` results in ssh handshake errors when running under the podman remote client (i.e. podman on Windows and macOS). The easiest way to reproduce is to create a 4-node cluster with the podman provider enabled on either Windows or macOS and then run `kind export logs`.

On Windows (incl. WSL) and macOS, the podman client (podman-remote) works by sending an API request to a podman VM (or container on WSL) through an ssh tunnel. I noticed that the set of podman commands that returned errors varied between runs of `kind export logs`. After digging into it further, my working hypothesis is that because the commands to collect logs are all run concurrently, it's possible to hit the max number of ssh connections to the podman machine (specifically `MaxSessions` and `MaxStartups` in the sshd config, each with a default of 10). I set the sshd config to allow 30 max connections for a cluster of 4 nodes and that seemed to fix it.

As I see it there are two possible solutions:

  1. Provide guidance in the docs for Windows & macOS podman users on the possible necessity for modifying the max number of ssh connections in their podman machine's sshd config.
  2. Implement rate-limiting logic in the podman provider to ensure no more than 10 concurrent podman commands are executed at any one time. This may result in splash damage to other providers since the concurrency logic is generic AFAICT.

BenTheElder commented 1 month ago

We have generally avoided making the default experience suffer in order to support remote runtimes, because kind nominally targets local clusters and there are local options that don't have this sort of issue.

I would argue that mitigating podman-remote's connection limit issues is a feature request, and having a low connection limit seems like a usability issue in this podman install.

I don't want to make log collection slower, and I'm not enthused about attempting to probe for the ssh connection limit.

Even if we did, what if some other process is running concurrently?

Option 1) seems like the best approach, because even if kind tries really hard to deal with this, being limited to a few connections still risks things breaking if any other tool concurrently accesses it ... and this limit is not typical when working with container runtimes.

BenTheElder commented 1 month ago

Something to explore: Is there a good reason to tightly limit the maximum number of sessions, or is this just an arbitrary default? Are we going to introduce problems by telling users to increase it a lot?

wbrefvem commented 1 month ago

It's the default for sshd that podman doesn't touch AFAIK. As long as users aren't exposing their podman VMs to the Internet, I can't imagine any security implications. And with modern hardware I don't foresee any performance hit. The fact that it hasn't come up yet probably means that not many users are running multi-node clusters and then exporting logs, so the chance of a large number of users even trying it out seems low.