Talinx closed this issue 2 weeks ago.
Link: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/discussions/1004 (in this case, the longhorn-manager PDB prevents this as long as there are replicas on the node).
I am currently testing whether the system-upgrade-controller needs to cordon and drain nodes at all. I think an in-place update of the k3s-agent should suffice, without draining all pods.
It would be great if no drain were necessary. Maybe sequentially draining and updating the nodes could also work?
For major updates you definitely want to drain all pods; for patch updates it is debatable. So I guess draining by default is still needed.
I think #1338 should help here: when setting system_upgrade_enable_eviction=false, PDBs should be ignored, allowing the upgrade process to succeed.
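A minimal Python sketch of why an eviction-based drain stalls while a deletion-based one does not. It assumes the PDB uses minAvailable and that system_upgrade_enable_eviction=false makes the drain delete pods instead of going through the Eviction API, the way kubectl drain --disable-eviction does; the function name, pod names and data shapes are hypothetical.

```python
def drain_node(pods, healthy, min_available, use_eviction=True):
    """Return (removed, blocked) pod names for one node.

    pods maps pod name -> app; healthy and min_available map app -> count
    taken from that app's PodDisruptionBudget. All names are hypothetical.
    """
    healthy = dict(healthy)  # work on a copy of the caller's counts
    removed, blocked = [], []
    for name, app in pods.items():
        # Deleting (use_eviction=False) skips the PDB check entirely;
        # evicting is only allowed while healthy > min_available.
        if not use_eviction or healthy[app] > min_available[app]:
            healthy[app] -= 1
            removed.append(name)
        else:
            # The Eviction API answers 429 Too Many Requests here; a drain
            # such as kubectl drain keeps retrying until its timeout.
            blocked.append(name)
    return removed, blocked


pods = {"es-0": "es", "es-1": "es"}
print(drain_node(pods, {"es": 3}, {"es": 2}))                      # (['es-0'], ['es-1'])
print(drain_node(pods, {"es": 3}, {"es": 2}, use_eviction=False))  # (['es-0', 'es-1'], [])
```

Deleting instead of evicting lets the drain finish, but the es app then drops to 1 healthy pod, below its budget of 2, so this trades a stuck upgrade for possible downtime.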
Regarding "Maybe sequentially draining and updating the nodes could also work?":
That should already happen: at least for me, only one node at a time is drained. If the pods can be relocated to other nodes, there shouldn't be an issue. Could it be that some nodeSelector or spread constraints are preventing this? In that case, the pod can't be evicted and the upgrade plan fails.
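To illustrate the one-node-at-a-time behaviour and how a nodeSelector can leave a pod with nowhere to go, here is a simplified Python model. Only nodeSelector is modelled (not topology spread constraints), and the node labels, pod names and function are hypothetical. Counting an unplaceable pod as "stuck" follows the comment's reading; in a real cluster it is the PDB that then rejects further evictions because the replacement pod stays Pending.

```python
def sequential_drain(nodes, pods):
    """Drain and upgrade nodes one at a time (hypothetical model).

    nodes: node name -> labels dict
    pods:  pod name -> [current node, nodeSelector dict], updated in place
    Returns the (node, pod) pairs that could not be relocated.
    """
    stuck = []
    for node in nodes:
        # Only the node being upgraded is unschedulable; recreating this set
        # on the next iteration models uncordoning it after its upgrade.
        cordoned = {node}
        for name, (where, selector) in pods.items():
            if where != node:
                continue
            # Candidate nodes: schedulable and matching every selector label.
            targets = [
                n for n, labels in nodes.items()
                if n not in cordoned
                and all(labels.get(k) == v for k, v in selector.items())
            ]
            if targets:
                pods[name][0] = targets[0]  # relocated, drain continues
            else:
                stuck.append((node, name))  # drain of this node cannot finish
    return stuck


nodes = {"worker-1": {"disk": "ssd"}, "worker-2": {"disk": "hdd"}}
pods = {"web": ["worker-1", {}], "db": ["worker-1", {"disk": "ssd"}]}
print(sequential_drain(nodes, pods))  # [('worker-1', 'db')]
```

Here db is pinned to disk=ssd, which only worker-1 has, so draining worker-1 cannot finish, while web simply moves to worker-2 and back.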
@Talinx Please try @pat-s' tip above, if you feel confident enough, that is.
Description
When pod disruption budgets apply to multiple pods, the system upgrade can fail because a node cannot be completely drained.
Here is what happens:
The result is a cluster stuck in a pending upgrade: its nodes are cordoned, so no pods can be scheduled on them, and as many pods as possible have already been evicted. This effectively causes downtime for the hosted application until the situation is resolved manually (unless every critical pod is covered by a PodDisruptionBudget).
The outcome is somewhat random. For example, with 2 worker nodes, if one node happens to have no pods left once evictions have used up the disruption budget, that node can be upgraded.
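This randomness can be reproduced with a small Python model. It assumes all workers are cordoned at once, so evicted pods cannot be rescheduled, and it makes a single eviction pass per node (kubectl drain would keep retrying until its timeout). The 3-replica Elasticsearch app with minAvailable 2 and the node names are hypothetical, not taken from the affected cluster.

```python
from dataclasses import dataclass


@dataclass
class Pdb:
    min_available: int  # minAvailable from the PodDisruptionBudget
    healthy: int        # pods of the app that are currently running and ready


def drain_all(order, placement, pdbs):
    """Try to evict every pod, node by node, and return the emptied nodes.

    placement maps node -> list of app names (one entry per pod) and is
    modified in place. Evicted pods are never rescheduled because every
    worker is assumed to be cordoned.
    """
    upgradable = set()
    for node in order:
        remaining = []
        for app in placement[node]:
            pdb = pdbs[app]
            # Eviction is allowed only while disruptionsAllowed > 0,
            # i.e. healthy - min_available > 0.
            if pdb.healthy - pdb.min_available > 0:
                pdb.healthy -= 1  # evicted; the replacement stays Pending
            else:
                remaining.append(app)  # eviction rejected by the PDB
        placement[node] = remaining
        if not remaining:
            upgradable.add(node)  # empty node: its upgrade can proceed
    return upgradable


def scenario(order):
    # Hypothetical setup: 3 Elasticsearch pods with minAvailable=2,
    # so only one of them may be disrupted at a time.
    placement = {"worker-1": ["es"], "worker-2": ["es", "es"]}
    pdbs = {"es": Pdb(min_available=2, healthy=3)}
    return drain_all(order, placement, pdbs)


print(scenario(["worker-1", "worker-2"]))  # {'worker-1'}
print(scenario(["worker-2", "worker-1"]))  # set()
```

Only the processing order differs between the two runs: starting with worker-1 empties it so it can be upgraded, while starting with worker-2 spends the single allowed disruption there and leaves both nodes stuck.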
(In this case, the Elasticsearch Helm chart from Bitnami "caused" the problem. However, this can happen with anything that introduces enough pod disruption budgets.)
Platform
Linux