piraeusdatastore / piraeus-ha-controller

High Availability Controller for stateful workloads using storage provisioned by Piraeus
Apache License 2.0

failover: force delete pods on unresponsive nodes #16

WanzenBug closed 2 years ago

WanzenBug commented 2 years ago

Eviction always waits for the kubelet to confirm that the Pod is gone. That is not ideal when the whole node has crashed: the kubelet can't confirm the deletion, and we are stuck with Pods in the Terminating state. This is especially bad for StatefulSets, which never recreate a Pod while the old one is still visible in the API, even if it is in fact gone.

The solution is to check the node state in the failover case: if Kubernetes reports that the node is not ready, we force delete the Pod from the API without waiting for any acknowledgement from the kubelet.
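To illustrate the decision described above, here is a minimal sketch, not the code from this PR. It assumes that "not ready" means the node's `Ready` condition is anything other than `True` (or is missing), and that a force delete corresponds to a delete request with a grace period of zero. The types and the function names `nodeReady` and `deleteGracePeriod` are hypothetical stand-ins so the example runs with the standard library alone.

```go
package main

import "fmt"

// ConditionStatus mirrors the "True"/"False"/"Unknown" values that
// Kubernetes reports for node conditions.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// NodeCondition is a simplified, hypothetical stand-in for a node condition.
type NodeCondition struct {
	Type   string
	Status ConditionStatus
}

// nodeReady reports whether the "Ready" condition is present and "True".
// Treating a missing Ready condition as "not ready" is an assumption of this sketch.
func nodeReady(conds []NodeCondition) bool {
	for _, c := range conds {
		if c.Type == "Ready" {
			return c.Status == ConditionTrue
		}
	}
	return false
}

// deleteGracePeriod picks the grace period to request when removing a Pod
// during failover. nil means "use the Pod's own grace period", i.e. the normal
// path that waits for the kubelet. A pointer to 0 means "force delete".
func deleteGracePeriod(conds []NodeCondition) *int64 {
	if nodeReady(conds) {
		return nil
	}
	zero := int64(0)
	return &zero
}

func main() {
	crashed := []NodeCondition{{Type: "Ready", Status: ConditionUnknown}}
	healthy := []NodeCondition{{Type: "Ready", Status: ConditionTrue}}

	fmt.Println("force delete on crashed node:", deleteGracePeriod(crashed) != nil)
	fmt.Println("force delete on healthy node:", deleteGracePeriod(healthy) != nil)
}
```

In a real controller the returned value would presumably be passed as the grace period of the Pod delete request. Keeping the check as a pure function makes the failover decision easy to unit test without a cluster.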

Also fixes a small typo.

WanzenBug commented 2 years ago

Draft until I can test it.

WanzenBug commented 2 years ago

@rck I noticed some issues along the way, so it might be better to review commit by commit