Closed thangle-grabtaxi closed 2 months ago
This issue is currently awaiting triage.
If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
A disruption budget prescribes the maximum amount of disruption allowed, but there are other safeguards in place while drifting nodes. We ensure that replacement nodes are online and healthy before disrupting drifted nodes, which naturally limits the total number of nodes that can be disrupted at once. I'd recommend using the observed rate of disruption as a way to understand how long you'd actually want your budgets at 100% to get a full roll of your nodes.
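The recommendation above boils down to simple arithmetic: divide the node count by the batch size you actually observe, then multiply by the time each batch takes (Karpenter waits for healthy replacements between batches). A minimal sketch, assuming illustrative numbers; the helper function is hypothetical, not part of Karpenter:

```python
def full_roll_minutes(node_count: int, nodes_per_batch: float,
                      minutes_per_batch: float) -> float:
    """Estimate how long a 100% disruption budget window must stay open
    for a full roll of the node pool, given the observed disruption rate."""
    batches = node_count / nodes_per_batch
    return batches * minutes_per_batch

# Example: 40 nodes rolled ~4 at a time, ~5 minutes per batch.
print(full_roll_minutes(40, 4, 5))  # -> 50.0 minutes, far beyond a 10m window
```

With the 1-4 nodes per batch observed in this issue, a 10-minute window cannot cover a ~40-node pool, which matches the reported behavior.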
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
Description
Observed Behavior: When we make changes to a Karpenter node pool, specifically an AMI change, with a disruption budget of 100%, we expect Karpenter to rotate all nodes at once due to Drift. However, it only rotates 1 to 4 nodes at a time.
The disruption budget is set to 100%, which comes out to around 35-40 nodes.
During the allowed disruption period of 10 minutes, Karpenter rotates the instances in sequence, 1 to 4 nodes per batch.
This causes frequent restarts for our applications. We specifically set a 100% disruption budget within a short time frame (10 minutes) with the expectation that all nodes would be restarted at once, meaning only one restart for our applications.
Expected Behavior: Karpenter rotates all nodes at once.
Reproduction Steps (Please include YAML):
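A hedged sketch of the NodePool disruption fragment being described (metadata and schedule values are illustrative placeholders, not taken from the reporter's cluster):

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    budgets:
      # While this budget's window is active, up to 100% of nodes are
      # eligible for disruption at once; outside the window the budget
      # is inactive. Schedule and duration below are example values.
      - nodes: "100%"
        schedule: "0 2 * * *"
        duration: 10m
```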
Versions:
Chart Version: v0.35.5
Kubernetes Version (kubectl version): 1.27
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment