Do you have PodDisruptionBudgets set for your app? Are you running managed node groups?
Managed node groups will drain the node properly when PodDisruptionBudgets are set.
The problem is that the node isn't deregistered from the NLB before its EC2 instance is shut down. We are not having an issue with our application pods draining from the nodes. Each node that ends up "going away" drains down to only 3 pods, all of which are part of the k8s infrastructure. kube-proxy is one of them, and I think this is the root of the issue: kube-proxy is still running on the node when its EC2 instance is suddenly shut off. The NLB should stop sending traffic to the node before the EC2 instance stops.
I hope this clears up the problem we're experiencing.
What is happening:
1. Node is detected as unneeded by CA.
2. CA waits 10m before it actually starts the steps to remove the node.
3. CA drains the node of all pods except DaemonSets (which include kube-proxy).
4. The node keeps receiving requests because kube-proxy is still there.
5. After the drain, CA issues an EC2 terminate-instance.
6. All remaining pods are abruptly stopped because the instance is terminated.
7. The LB takes about ~3m to detect the node is down and remove it from rotation.
What is supposed to happen:
1. Node is detected as unneeded by CA.
2. CA waits 10m before it actually starts the steps to remove the node.
3. CA adds the node.kubernetes.io/exclude-from-external-load-balancers label to the node.
4. This makes the LB deregister the node gracefully.
5. CA waits for the node to be deregistered.
6. CA drains the node of all pods except DaemonSets (which include kube-proxy).
7. After the drain, CA issues an EC2 terminate-instance.
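As a rough manual approximation of the labeling step above (a sketch only; this is not what CA does today, and the node name is a placeholder), the well-known exclusion label can be applied before the node is drained so the cloud controller deregisters it from external load balancers:

```sh
# Sketch of the proposed flow, applied by hand. The node name is a placeholder.
# 1. Ask the load balancer controller to deregister the node from external LBs.
kubectl label node ip-10-0-1-23.ec2.internal \
  node.kubernetes.io/exclude-from-external-load-balancers=true

# 2. Give the NLB time to drain and deregister the target.
sleep 180

# 3. Only then drain and terminate the node as usual.
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets --delete-emptydir-data
```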
Anyone? After reading through the other bug reports, this seems to be a long-standing issue with cluster-autoscaler.
any update?
We had the same issue, but we managed to solve it to some extent by implementing the following workarounds.
By implementing the above workarounds, we managed to bring the 504 errors down to 0.
Same here. If the problem is that kube-proxy is still receiving traffic when the node shuts down, can we change the NLB from instance mode to IP mode so traffic is sent to the Pods directly?
> Node keeps receiving requests because kube-proxy is still there
I think even if kube-proxy is still there, if you keep externalTrafficPolicy at the default (Cluster), it should forward the requests to other nodes, since there are no other application pods running on that node. Have you tried setting a preStop hook for the pod?
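For reference, such a preStop hook could look roughly like the sketch below (the deployment and container name myapp are placeholders and the sleep length is arbitrary; this illustrates the suggestion, it is not a confirmed fix for the node-level problem):

```sh
# Hypothetical example: add a preStop sleep to a container named "myapp" so it
# keeps serving in-flight requests briefly after it is told to terminate.
# kubectl patch uses a strategic merge patch by default, which merges the
# containers list by name instead of replacing it.
kubectl patch deployment myapp -p '
spec:
  template:
    spec:
      containers:
      - name: myapp
        lifecycle:
          preStop:
            exec:
              command: ["sh", "-c", "sleep 30"]
'
```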
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Any update? We also have this issue on GCP: the ILB isn't aware of the removal of the node in time.
https://github.com/matti/k8s-prestop-sidecar
I wrote this, would this help?
any update? please solve this problem now
same issue: https://github.com/kubernetes/autoscaler/issues/6679
@databonanza do you have any solution? Does the problem still exist for you?
I do not. My team fixed the issue or worked around it, and it's been so long since this happened that I don't recall what was done to resolve it. My suspicion is that they worked around the issue rather than fixing it. We would have submitted a bug fix if we knew what the proper solution was.
I find it ridiculous, however, that such an issue can persist for so long without any support from the k8s community.
[RECOMMENDATION - AutoScaling Group Termination Lifecycle Hook]
Amazon EC2 Auto Scaling offers the ability to add lifecycle hooks to your Auto Scaling groups.
These hooks let you create solutions that are aware of events in the Auto Scaling instance lifecycle, and then perform a custom action on instances when the corresponding lifecycle event occurs. A lifecycle hook provides a specified amount of time (one hour by default) to wait for the action to complete before the instance transitions to the next state.
In the event of a scale-down like the one observed in your cluster, the lifecycle hook puts the instance into a wait state (Terminating:Wait).
The instance remains in a wait state either until you complete the lifecycle action or until the timeout period ends (one hour by default). After you complete the lifecycle hook or the timeout period expires, the instance transitions to the next state (Terminating:Proceed), where the instance is terminated.
In your cluster's case, I recommend setting the timeout period to approximately 10 minutes or less, depending on the amount of time needed to make sure the node is successfully drained before termination; a CLI sketch follows the references below.
[1] How lifecycle hooks work in Auto Scaling groups - https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks-overview.html
[2] Amazon EC2 Auto Scaling lifecycle hooks - https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html
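A rough sketch of the above with the AWS CLI (the hook name, Auto Scaling group name, timeout, and instance ID are example values, not taken from this issue):

```sh
# Add a termination lifecycle hook to the node group's Auto Scaling group so a
# terminating instance pauses in Terminating:Wait for up to 10 minutes.
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name drain-before-terminate \
  --auto-scaling-group-name eks-nodegroup-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 600 \
  --default-result CONTINUE

# Once the node has been drained and deregistered from the NLB, complete the
# hook so the instance moves on to Terminating:Proceed and is terminated.
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name drain-before-terminate \
  --auto-scaling-group-name eks-nodegroup-asg \
  --lifecycle-action-result CONTINUE \
  --instance-id i-0123456789abcdef0
```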
Which component are you using?: cluster-autoscaler
What version of the component are you using?: 9.11.0
What k8s version are you using (kubectl version)?: v1.18.20-eks-8c49e2
What environment is this in?: AWS
What did you expect to happen?: We expect that when we reduce replicas for an application and that triggers a node scale-down, there is no downtime. Instead, cluster-autoscaler tells AWS to shut off EC2 nodes that are still running kube-proxy (and still receiving traffic).
What happened instead?: The application returns 504 errors when the nodes are removed, even though the pods running the application had already been moved to the nodes that are staying up more than 10 minutes earlier.
How to reproduce it (as minimally and precisely as possible): Scale up by setting replicas to a high number (150) and then scale back down to a low number (50). Monitor using a load testing tool (50 concurrent users) while the different stages of the scale down occur.
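(Purely as an illustration of the steps above, assuming a deployment named myapp:)

```sh
# Scale up, wait for cluster-autoscaler to add nodes and pods to become Ready,
# then scale back down while a load test (~50 concurrent users) is running.
kubectl scale deployment myapp --replicas=150
# ...wait for the new nodes and pods...
kubectl scale deployment myapp --replicas=50
# About 10 minutes later cluster-autoscaler terminates the now-unneeded nodes;
# watch the load test for 504s during that window.
```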
Anything else we need to know?: I believe this is a bug in how cluster-autoscaler notifies AWS to shut off nodes. It should tell AWS to drain connections to the node before removing it completely. It "feels" like CA is just telling AWS to shut the node off even though kube-proxy is still running on it... thus causing 504s for a short period of time.