Closed by axgkl 3 months ago
Prevent the cloud-init-finished detection from being part of the cloud init on autoscaled nodes.
I did it by cutting off everything before 'touch /etc/initialized'. That is a bit risky if that line changes later, so I also added a check that the line is really in the script, hard failing if it is not, so one sees the effect immediately and can adapt.
Good news: we have no(!) problem with the current v1 release. There, the wait for cloud init is NOT in worker_install_script.sh, which is merged into the cloud init of autoscaled nodes; it is only in the master install script. Of course, worker creation might fail on non-autoscaled workers without the wait for cloud init to finish, but in v1 it has always been like this.
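To illustrate the cut-plus-check idea, here is a minimal POSIX sh sketch. It is not the code from this PR (the actual change lives in the Crystal file listed under "Affects"); the function name `strip_cloud_init_wait` is hypothetical, and it assumes the script text is passed in as a single string.

```sh
# Marker line taken from the description above.
MARKER='touch /etc/initialized'

# Hypothetical helper: print the script starting at the marker line,
# dropping everything before it (i.e. the cloud init wait part).
# Fails hard when the marker is missing, so an upstream change to that
# line is noticed immediately instead of silently shipping the wait.
strip_cloud_init_wait() {
  script=$1
  if ! printf '%s\n' "$script" | grep -qF -- "$MARKER"; then
    echo "marker '$MARKER' not found in worker install script" >&2
    return 1
  fi
  # Start printing at the first line that contains the marker.
  printf '%s\n' "$script" | awk -v m="$MARKER" 'index($0, m) { found = 1 } found'
}
```

Called as `strip_cloud_init_wait "$(cat worker_install_script.sh)"`, it returns a non-zero status and prints nothing to stdout when the marker line is gone.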
Affects
src/kubernetes/software/cluster_autoscaler.cr
(I won't send the POSIX compliance PR, seeing that you have integrated it already.)
Just tested this and it works, thanks! I think we need to find a more robust way to handle this. Instead of having the cloud init wait part directly in the worker script, which is included in the cloud init, we could run it separately as the first script, before the current worker install script, and only for static (not autoscaled) nodes, where we know that the k3s installation has to be done after cloud init is complete. Thanks again :)
Right. That wait has to run only for static, non-autoscaled nodes. I don't like that split either, that's technical debt, so I really like the idea of a separate waiting step before applying the installer.
I have made the change I mentioned and now the cloud init wait script is run separately. Will push it as rc5 as soon as I have finished another couple of things.
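A separate waiting step of this kind could look roughly like the POSIX sh sketch below. It is an illustration, not the script shipped in rc5: the function names, the `CLOUD_INIT_MARKER` and `CLOUD_INIT_TIMEOUT` variables, and the 300 second default are all assumptions; only the marker path `/etc/initialized` comes from the discussion above.

```sh
# Assumed defaults; both can be overridden from the environment.
: "${CLOUD_INIT_MARKER:=/etc/initialized}"
: "${CLOUD_INIT_TIMEOUT:=300}"

# Poll once per second until the marker file exists or the timeout expires.
wait_for_cloud_init() {
  elapsed=0
  while [ ! -f "$CLOUD_INIT_MARKER" ]; do
    if [ "$elapsed" -ge "$CLOUD_INIT_TIMEOUT" ]; then
      echo "timed out waiting for $CLOUD_INIT_MARKER" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Hypothetical dispatcher. $1 is the node kind ("static" or "autoscaled"),
# the remaining arguments are the install command to run.
# Only static nodes wait; autoscaled nodes run the installer inside
# cloud init, so waiting there could never succeed.
run_worker_setup() {
  kind=$1
  shift
  if [ "$kind" = "static" ]; then
    wait_for_cloud_init || return 1
  fi
  "$@"
}
```

For example, `run_worker_setup static sh worker_install_script.sh` would block until cloud init has written the marker, while `run_worker_setup autoscaled sh worker_install_script.sh` starts the installer right away.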
Autoscaled nodes run the worker install script within cloud init itself.
There we can't wait for the file marking cloud init as finished: it will never appear, because cloud init is not finished while our script is still running inside it, so we are stuck waiting, deadlocked.