vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.
MIT License

fix: Filter cloud init wait #405

Closed by axgkl 3 months ago

axgkl commented 3 months ago

Autoscaled nodes run the worker install script within cloud init itself.

There, the script cannot wait for the file that marks cloud init as finished. That file will never appear, because cloud init is not finished while the script is still running inside it, so we are stuck waiting: a deadlock.

sonarcloud[bot] commented 3 months ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


axgkl commented 3 months ago

This PR prevents the cloud-init-finished detection from being part of the cloud init on autoscaled nodes.

I did it by cutting off everything before the line 'touch /etc/initialized'. That is a bit risky if that line changes later, so I also added a check that the line is really present, failing hard if it is not, so that one sees the effect immediately and can adapt.
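The cut-and-check approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Crystal change in cluster_autoscaler.cr: the helper name `strip_cloud_init_wait`, the sample script contents, and the choice to keep the marker line itself in the output are all assumptions made for the example.

```python
MARKER = "touch /etc/initialized"


def strip_cloud_init_wait(script: str, marker: str = MARKER) -> str:
    """Return the script from the marker line onward.

    Hypothetical helper: drops everything before the marker line and
    raises if the marker is missing, so that a later change to that
    line is noticed immediately instead of silently keeping the wait.
    """
    lines = script.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == marker:
            return "\n".join(lines[index:]) + "\n"
    raise ValueError(f"marker line {marker!r} not found in worker install script")


# Made-up stand-in for the worker install script; the real contents differ.
SAMPLE_SCRIPT = """\
#!/bin/bash
# ... wait for cloud init to finish (placeholder) ...
touch /etc/initialized
echo "install k3s agent (placeholder)"
"""

trimmed = strip_cloud_init_wait(SAMPLE_SCRIPT)
```

Raising an error instead of returning the script unchanged mirrors the "hard failing" check: a silently untrimmed script would bring the deadlock back on autoscaled nodes.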

Good news: We have no(!) problem with the current v1 release. There, the wait for cloud init is NOT in worker_install_script.sh, which is merged into the cloud init of autoscaled nodes; it is only in the master install script. Of course, worker creation might fail on non-autoscaled workers without the wait for cloud init to finish, but v1 has always behaved this way.

Affects

src/kubernetes/software/cluster_autoscaler.cr

Ref https://github.com/vitobotta/hetzner-k3s/pull/394

axgkl commented 3 months ago

(I won't send the POSIX compliance PR, seeing that you have integrated it already.)

vitobotta commented 3 months ago

Just tested this and it works, thanks! I think we need to find a more robust way to handle this. Instead of having the cloud init wait directly in the worker script, which is included in the cloud init, we could run it separately as the first script, before the current worker install script, and only for static (not autoscaled) nodes, where we know that the k3s installation has to be done after cloud init is complete. Thanks again :)
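The proposal in this comment could look roughly like the sketch below. It is written in Python only for illustration; the function `scripts_for_node` and the file name `wait_for_cloud_init.sh` are hypothetical and do not come from the project.

```python
from typing import List

WAIT_SCRIPT = "wait_for_cloud_init.sh"      # hypothetical name for the separate wait step
WORKER_SCRIPT = "worker_install_script.sh"  # worker install script mentioned in the thread


def scripts_for_node(autoscaled: bool) -> List[str]:
    """Return the scripts to run on a worker node, in order.

    Autoscaled nodes run the worker script inside cloud init, so they
    must not wait for cloud init to finish. Static nodes run the wait
    step first, then the installer.
    """
    if autoscaled:
        return [WORKER_SCRIPT]
    return [WAIT_SCRIPT, WORKER_SCRIPT]
```

With this split, the worker install script stays identical for both node types and no longer needs to be trimmed at a marker line.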

axgkl commented 3 months ago

Right. That wait has to run only for static, non-autoscaled nodes. I don't like that split either; it's technical debt, so I really like the idea of a separate waiting step before applying the installer.

vitobotta commented 3 months ago

I have made the change I mentioned and now the cloud init wait script is run separately. Will push it as rc5 as soon as I have finished another couple of things.