vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.
MIT License

Fix SSH when using non-default port #391

Closed: jpetazzo closed this 3 months ago

jpetazzo commented 3 months ago

Since Ubuntu 22.10, SSH uses socket activation. This means that the SSH server isn't running by default, and gets started automatically the first time there is a connection on port 22.

Upside: it saves 3 MB of RAM.

Downside: if you're customizing the SSH port number with a drop-in configuration file at cloud-init time, it breaks SSH. SSH then needs to be restarted (or the machine needs to be rebooted) for the port number to be picked up by the systemd generator. The net result for hetzner-k3s is that if we use a non-default SSH port, provisioning breaks.

This manifests itself by the following log message:

  systemd[1]: ssh.socket: Socket unit configuration has changed while
  unit has been running, no open socket file descriptor left. The
  socket unit is not functional until restarted.
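To spot this condition on a node, you could scan the journal for that message. Below is a minimal sketch in Python. `ssh_socket_is_stale` is a hypothetical helper, not part of hetzner-k3s. It assumes the input text comes from something like `journalctl -u ssh.socket`. It collapses whitespace so that a message wrapped across lines, as in the excerpt above, still matches.

```python
import re

# Stable part of the systemd warning quoted above.
_STALE_SOCKET = re.compile(
    r"ssh\.socket: Socket unit configuration has changed while unit has been running"
)


def ssh_socket_is_stale(journal_text: str) -> bool:
    """Return True if the text contains the stale ssh.socket warning."""
    # Collapse newlines and indentation so wrapped log lines still match.
    normalized = " ".join(journal_text.split())
    return bool(_STALE_SOCKET.search(normalized))
```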

One possible fix is to disable socket activation and revert to the default mode (start SSH at boot). This can be done by disabling ssh.socket and enabling ssh.service.

This patch does exactly that, adding the corresponding commands to the cloud-init template. This has been tested with Ubuntu 24.04 and 22.04 as well.
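The PR's actual cloud-init changes aren't reproduced in this thread, so the following is only a Python sketch of the idea. It assumes the port is set through an sshd drop-in and that the commands run through cloud-init's `runcmd`. These details are all assumptions:

- the helper names;
- the drop-in path `/etc/ssh/sshd_config.d/99-port.conf`;
- the `|| true` guards.

On RHEL-family images, the service is usually named `sshd.service` rather than `ssh.service`.

```python
import json

# Hypothetical drop-in path; only read if sshd_config includes sshd_config.d/*.conf.
SSHD_PORT_DROPIN = "/etc/ssh/sshd_config.d/99-port.conf"


def ssh_runcmd(port: int) -> list[str]:
    """Shell commands (for cloud-init runcmd) that set the sshd port and
    replace socket activation with a regular ssh.service started at boot.
    The `|| true` guards are an assumption, meant to tolerate missing units."""
    return [
        f"echo 'Port {port}' > {SSHD_PORT_DROPIN}",
        # Stop and disable the socket so it no longer holds the listening port.
        "systemctl disable --now ssh.socket || true",
        # Start sshd at boot, as before socket activation.
        "systemctl enable ssh.service || true",
        # Restart so sshd re-reads its config and binds the new port.
        "systemctl restart ssh.service || true",
    ]


def render_cloud_config(port: int) -> str:
    """Render a minimal #cloud-config document (JSON is a subset of YAML 1.2)."""
    return "#cloud-config\n" + json.dumps({"runcmd": ssh_runcmd(port)}, indent=2)
```

Rendering through `json` avoids a YAML dependency. A real template would more likely be plain YAML.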

The following link has more details on the Ubuntu change:

https://discourse.ubuntu.com/t/sshd-now-uses-socket-based-activation-ubuntu-22-10-and-later/30189/14

vitobotta commented 3 months ago

Thanks for this! It was on my list to investigate the implications for hetzner-k3s, since I set up a couple of Ubuntu 24.04 servers and had to make some unusual (for me) adjustments to the config in order to customize the SSH port.

Before I merge, are you sure that the commands you added also work on older Debian-based releases, as well as on other families of distros? Thanks.

jpetazzo commented 3 months ago

I've tried with ubuntu-22.04 but that's pretty much it :-)

I'll try with debian-11 and debian-12 and maybe even run a script to try with every distro that Hetzner offers and report back.

I think it should work (worst case, if these commands fail, they shouldn't have any impact), but we'll make sure.

vitobotta commented 3 months ago

I don't know how many people use something other than the default OS, but it would be safer to test with other distros as well, not just Debian-based ones, just in case. It would be awesome if you could check those too, thanks! :)

jpetazzo commented 3 months ago

Alright, I tested with all the distro images offered by Hetzner, and this patch works with all of them (I can create a cluster using SSH port 222) except alma-8, rocky-8, and centos-stream-8; but these distros don't work with hetzner-k3s in the first place anyway.

Would you like a PR for the (very crude) test harness that I wrote to try many configuration combinations? :)
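A harness like this mostly comes down to iterating over a matrix of images and SSH ports. Here is a hypothetical Python sketch of that matrix. It uses only the image names mentioned in this thread and is not the harness from the PR.

```python
from itertools import product

# Image names mentioned in this thread; a real run would cover Hetzner's full list.
IMAGES = ["ubuntu-22.04", "ubuntu-24.04", "debian-11", "debian-12",
          "alma-8", "rocky-8", "centos-stream-8"]
SSH_PORTS = [22, 222]  # default port and the custom port used in the thread
# Reported above as not working with hetzner-k3s regardless of this patch.
KNOWN_UNSUPPORTED = {"alma-8", "rocky-8", "centos-stream-8"}


def build_matrix():
    """Return (image, ssh_port, expected_to_pass) for every combination."""
    return [
        (image, port, image not in KNOWN_UNSUPPORTED)
        for image, port in product(IMAGES, SSH_PORTS)
    ]


if __name__ == "__main__":
    for image, port, expected in build_matrix():
        # A real harness would create a cluster here and record the outcome.
        print(f"{image:16} port={port:<4} expected={'pass' if expected else 'fail'}")
```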

vitobotta commented 3 months ago

Weird, I hadn't noticed this other message. I think you can add those changes to this PR, since they are related. Thanks!

jpetazzo commented 3 months ago

Hi!

I've updated the PR with the changes you suggested, plus the test harness I mentioned. I wrote a small README for the test harness; happy to expand it if needed.

Cheers!

sonarcloud[bot] commented 3 months ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

vitobotta commented 3 months ago

Thanks a lot for the additional changes!

vitobotta commented 3 months ago

I am actually having problems with this change. For some reason, with those lines in place, cluster creation hangs intermittently; this doesn't seem to happen without them. Any suggestions?

vitobotta commented 3 months ago

After some investigation I got better results by just changing the port in the socket activation service. See https://github.com/vitobotta/hetzner-k3s/blob/more-refactoring/templates/cloud_init.yaml
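For reference, overriding the port on the socket unit itself could look like the systemd drop-in generated below. This is a sketch, not the contents of the linked template, and the drop-in path is hypothetical.

The drop-in works in three steps:

- An empty `ListenStream=` assignment clears the inherited port 22 entries before the new ones are added.
- Setting `BindIPv6Only=ipv6-only` lets separate IPv4 and IPv6 listeners on the same port coexist.
- Both address families are then restated on the new port.

```python
# Hypothetical drop-in location; systemd reads any *.conf under ssh.socket.d/.
SSH_SOCKET_DROPIN = "/etc/systemd/system/ssh.socket.d/listen.conf"


def ssh_socket_dropin(port: int) -> str:
    """Return drop-in text that makes ssh.socket listen on `port` only."""
    lines = [
        "[Socket]",
        # Keep the IPv6 socket from also claiming the IPv4 port.
        "BindIPv6Only=ipv6-only",
        # Empty assignment resets the ListenStream= list from the vendor unit.
        "ListenStream=",
        # Restate both address families on the new port.
        f"ListenStream=0.0.0.0:{port}",
        f"ListenStream=[::]:{port}",
    ]
    return "\n".join(lines) + "\n"


def apply_commands() -> list[str]:
    """Commands to load the drop-in and rebind the socket."""
    return ["systemctl daemon-reload", "systemctl restart ssh.socket"]
```

With this approach, socket activation stays in place, and only the listening addresses change.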

jpetazzo commented 3 months ago

Ah, sorry for not seeing your message earlier!

Weird; out of curiosity, was it hanging randomly with the default port or when changing the port number?

(I'll see if I can reproduce the issue since I feel like it'll teach me a thing or two about cloud-init 😅)

vitobotta commented 3 months ago

It was really weird. It felt like some kind of race condition, because sometimes it worked just fine and other times SSH connections would just stop working. I'm not sure why. But those issues went away once I changed just the port number in the socket-related config. I'd be curious to know the reason for that issue, but I didn't spend much time on it once I figured out that setting the port made things "more stable".