michaelmdresser opened 2 years ago
Ok, when I wrote the drain functionality, I referenced kubectl source code for cordoning a node. My guess is that the code just needs updating to support the newer Kubernetes features/functionality. For reference, here's the latest kubectl drain code that would be relevant for this fix (hint: it looks like there were some changes to handle an order of magnitude more error conditions).
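For illustration, here's a minimal sketch of what delegating to the upstream drain helpers could look like, assuming `k8s.io/kubectl/pkg/drain` and a `kubernetes.Interface` client. The `drainNode` name and the timeout value are illustrative, not cluster-turndown's actual API:

```go
package turndownsketch

import (
	"context"
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode is a hypothetical helper showing how the drain step could lean on
// kubectl's own drain package instead of re-implementing cordon/evict logic
// and its error handling.
func drainNode(ctx context.Context, client kubernetes.Interface, node *corev1.Node) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		Force:               false,
		IgnoreAllDaemonSets: true,
		// Timeout bounds the drain, so a pod that can never be evicted surfaces
		// as an error instead of an endless retry loop.
		Timeout: 5 * time.Minute,
		Out:     os.Stdout,
		ErrOut:  os.Stderr,
	}

	// Cordon first (desired=true marks the node unschedulable).
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return fmt.Errorf("cordoning %s: %w", node.Name, err)
	}

	// Evict or delete the pods on the node; returns an error once Timeout elapses.
	if err := drain.RunNodeDrain(helper, node.Name); err != nil {
		return fmt.Errorf("draining %s: %w", node.Name, err)
	}
	return nil
}
```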
A related problem, which explains why drain loops forever instead of timing out like it's supposed to (nice find, Bolt):
I think the solution here is two-fold:
1. Update the cordon/drain code to match the newer `kubectl` logic (see the repo link above for context).
2. Fix the timeout handling. I believe the `globalTimeout` addition was based on the older `kubectl` evict logic that didn't work properly. At least, that's what I'm going to tell myself 😞
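As a rough sketch of the second point, continuing in the same file as the sketch above (the `drainAllNodes` name and the way the bound is applied are illustrative, not the actual `globalTimeout` behavior):

```go
// drainAllNodes is a hypothetical wrapper showing how a global timeout could
// genuinely bound the whole drain phase: the shared context is cancelled when
// the deadline passes, so per-node drains stop instead of looping forever.
func drainAllNodes(client kubernetes.Interface, nodes []*corev1.Node, globalTimeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), globalTimeout)
	defer cancel()

	for _, node := range nodes {
		if err := drainNode(ctx, client, node); err != nil {
			// A context.DeadlineExceeded here means the global timeout fired.
			return fmt.Errorf("drain did not finish within %s: %w", globalTimeout, err)
		}
	}
	return nil
}
```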
This issue has been marked as stale because it has not had recent activity. It will be closed if no further action occurs.
This still feels relevant, if nothing else as documentation
Description
On non-autoscaling clusters where eviction is available, cluster-turndown attempts to evict Pods as part of the "Drain" process. After draining is finished, the node pool is supposed to be scaled down. If a PDB with a minAvailable > 0 covers pods in the cluster, there will be at least one un-evictable pod, meaning draining will never finish: because turndown cordons every node, evicted pods cannot be rescheduled, so once the number of healthy covered pods drops to the PDB's minimum, further evictions are denied.
The eviction logic has an infinite loop which continuously retries an eviction that fails with a non-nil, non-`IsNotFound`, or non-`IsTooManyRequests` error status. I added a log statement, and this is the error being returned from the `PolicyV1beta1().Evictions().Evict()` call on my dev cluster:

`Cannot evict pod as it would violate the pod's disruption budget`
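For reference, here is a minimal sketch of what that call and its error handling can look like with a recent client-go. The `evictPod` helper is hypothetical, not the turndown code; in my understanding the PDB-blocked case comes back as an HTTP 429, so `apierrors.IsTooManyRequests` matches it:

```go
package turndownsketch

import (
	"context"
	"fmt"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// evictPod is a hypothetical wrapper around the Evict call mentioned above.
// A pod whose eviction would violate its PodDisruptionBudget is rejected with
// "Cannot evict pod as it would violate the pod's disruption budget".
func evictPod(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	eviction := &policyv1beta1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}
	err := client.PolicyV1beta1().Evictions(namespace).Evict(ctx, eviction)
	switch {
	case err == nil, apierrors.IsNotFound(err):
		return nil // evicted, or already gone
	case apierrors.IsTooManyRequests(err):
		// Blocked by a PDB (or throttled); retrying forever will not succeed
		// while the PDB still requires this pod to stay up.
		return fmt.Errorf("eviction of %s/%s blocked: %w", namespace, name, err)
	default:
		return err
	}
}
```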
Reproduce
1. Create a GKE cluster.
2. Create a deployment with a PDB whose minAvailable is non-zero (see the sketch after this list).
3. Put turndown in the cluster.
4. Create a turndown schedule that will trigger soon.
5. Wait for turndown to start and finish. Note in the logs that at least one node never finishes draining.
6. Note in `kubectl get pods` that 2 of the deployment pods are still running and one is unschedulable.
7. Note in `kubectl get nodes` that, after scale-up should have happened, we still have a turndown node and the 3 regular nodes are sitting around marked as NoSchedule.
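For step 2, here is a minimal sketch of a PDB that reproduces the behavior, assuming an existing Deployment with 3 replicas labeled `app: nginx`. The names and numbers are illustrative, and applying an equivalent manifest with kubectl works just as well:

```go
package reprosketch

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createBlockingPDB creates a PodDisruptionBudget requiring at least 2 of the
// 3 nginx replicas to stay available. Once turndown cordons every node, evicted
// replicas cannot reschedule, so the remaining pods hit the minimum and further
// evictions are denied.
func createBlockingPDB(ctx context.Context, client kubernetes.Interface, namespace string) error {
	minAvailable := intstr.FromInt(2)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "nginx-pdb", Namespace: namespace},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "nginx"},
			},
		},
	}
	_, err := client.PolicyV1().PodDisruptionBudgets(namespace).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```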
Possible solutions
At the very least, we should have a retry limit on evictions so there isn't an infinite loop that makes the turndown pod hang.
Real solutions could involve some sort of "force deletion" or notifying the user of the PDB's presence and asking them to make a modification.
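A minimal sketch of the retry-limit idea, reusing the hypothetical `evictPod` helper sketched in the description above (add `"time"` to that file's imports; the attempt count and backoff are illustrative):

```go
// evictWithRetryLimit retries a failed eviction a bounded number of times and
// then gives up, so a PDB-blocked pod cannot hang the turndown pod forever.
func evictWithRetryLimit(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	const maxAttempts = 10
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = evictPod(ctx, client, namespace, name); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
			// Back off briefly before the next attempt.
		}
	}
	return fmt.Errorf("giving up on %s/%s after %d attempts: %w", namespace, name, maxAttempts, lastErr)
}
```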