the scripts were run by Foreman just fine (see details below)
the checks for the files and the subsequent bi-restart of kubelet either (1) did not help to resolve the issue (as was the case for the *-controlplane-{woergl,vienna}* hosts) or (2) were not necessary at all since the files were already there (rest of the hosts below)
however, upgrading docker is super invasive; an excerpt by @flo-weber:
the asp-ibk cluster showed as "Cluster agent not connected" in the Rancher UI
NB: k8s-dev, which has seen no docker upgrade, was working just fine
NB2: the cattle-cluster-agent deployment only runs on controlplane nodes due to its tolerations (see the check sketched after this excerpt)
at the same time all monitoring went red, i.e., the actual workload was not working either
reboot of all controlplane nodes did not bring the cluster back in Rancher
what eventually helped to fix the cluster was:
shutting down all controlplane and worker nodes
starting them from scratch and waiting until the workload comes back up again (registry etc.)
none of the worker nodes were accessible via ssh since they had run out of memory (visible by attaching to the vSphere console, where OOM errors are displayed even for not-logged-in users)
theory: once pods get scheduled to a node they consume more and more memory as time goes by; since actively running workload is not re-scheduled but only OOM-killed, and a lot of the workload simply does not have proper resource limits set, eventually the whole worker's memory runs out and the kernel OOM killer kicks in; by that point it is usually too late, since essential OS services might already have started to fail (a quick check for pods without memory limits is sketched below)
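If the theory holds, the first thing to look at is workload without memory limits. A minimal sketch of such a check, assuming kubectl access to the affected cluster and jq on the machine running it (output format is illustrative):

```bash
# list pods that have at least one container without a memory limit,
# grouped by the node they are scheduled on
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select(any(.spec.containers[]; .resources.limits.memory == null))
      | "\(.spec.nodeName)\t\(.metadata.namespace)/\(.metadata.name)"' \
  | sort
```

Anything listed here can grow unchecked until the node itself runs out of memory, which matches the failure mode described above.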
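Regarding NB2: the scheduling constraints of the agent can be inspected directly; a hedged example, assuming the usual cattle-system namespace of a Rancher-managed downstream cluster:

```bash
# show where cattle-cluster-agent is allowed / preferred to run
kubectl -n cattle-system get deployment cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.tolerations}{"\n"}{.spec.template.spec.affinity}{"\n"}'
```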
Discussion/Thoughts
the script in its current form does not bring any benefit, since a reboot is the way to go
we don't want to reboot if no packages have been updated
we want to reboot as fast as possible if any docker package got updated, since the likelihood that the workload is already failing is quite high -> no need to randomly sleep prior to the reboot (see the sketch after this list)
we are not restarting all controlplane nodes at once, since they are time-wise distributed (23:00 <-> 23:15 <-> 23:30)
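A minimal sketch of that reboot policy, assuming an RPM-based host and that it runs right after the Foreman-triggered update; the transaction parsing is illustrative and this is not the actual k8s-rke1-foreman.sh:

```bash
#!/usr/bin/env bash
set -euo pipefail

# packages touched by the most recent yum/dnf transaction (the Foreman-triggered update)
updated="$(dnf history info last | awk '/^ +(Install|Upgrade)/ {print $2}')"

# no packages updated -> no reboot
if [ -z "${updated}" ]; then
  echo "no packages updated, skipping reboot"
  exit 0
fi

# any docker package updated -> reboot immediately, without a random sleep:
# the workload is likely already failing, and the controlplane nodes are
# staggered by schedule (23:00 / 23:15 / 23:30) anyway
if grep -qi '^docker' <<< "${updated}"; then
  echo "docker package updated, rebooting now"
  systemctl reboot
fi
```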
k8s-rke1-foreman.sh
run 22.08.2024
The first run of the script was not a success:
asp-ibk
asp-ibk-controlplane-ibk1.wd.loc
asp-ibk-worker-ibk1.wd.loc
asp-ibk-worker-ibk2.wd.loc
asp-ibk-worker-ibk3.wd.loc
asp-ibk-worker-ibk4.wd.loc
asp-ibk-controlplane-vienna1.wd.loc
asp-ibk-controlplane-woergl1.wd.loc
asp-ibk-worker-woergl1.wd.loc
asp-ibk-worker-woergl2.wd.loc
asp-ibk-worker-woergl3.wd.loc
asp-ibk-worker-woergl4.wd.loc
[0]: They all got the following packages updated:
k8s-dev
k8s-dev-controlplane-ibk1.wd.loc
k8s-dev-worker-ibk1.wd.loc
k8s-dev-worker-ibk2.wd.loc
k8s-dev-worker-ibk3.wd.loc
k8s-dev-controlplane-vienna1.wd.loc
k8s-dev-controlplane-woergl1.wd.loc
k8s-dev-worker-woergl1.wd.loc
k8s-dev-worker-woergl2.wd.loc
k8s-dev-worker-woergl3.wd.loc
[1]: NO packages were updated (docker was updated a week before)