jderieg opened this issue 1 week ago (status: Open)
What do your budgets look like? Please set the budgets to 100% to make sure all your nodes can be consolidated:
disruption:
  budgets:
    - nodes: 100%
  consolidateAfter: 10m
  consolidationPolicy: WhenEmptyOrUnderutilized
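For reference, a nodes budget caps how many of a NodePool's nodes may be disrupted at the same time, so 100% means the budget itself should never be what blocks consolidation. The snippet is a simplified model, not Karpenter's code. allowed_disruptions is a hypothetical helper. It assumes percentages are rounded up and ignores nodes that are already being deleted or are NotReady.

```python
def allowed_disruptions(total_nodes: int, budget: str) -> int:
    """Hypothetical model of how many nodes one `nodes:` budget lets
    Karpenter disrupt at the same time."""
    if budget.endswith("%"):
        percent = int(budget[:-1])
        # Assumption: percentages are rounded up (integer ceiling division).
        return (total_nodes * percent + 99) // 100
    # A plain number is treated as an absolute node count.
    return int(budget)


# With 100%, every node in the NodePool may be disrupted at once.
print(allowed_disruptions(10, "100%"))  # 10
print(allowed_disruptions(19, "10%"))   # 2
```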
Quoting the original report: "That never happens. I've let it sit for over 24 hours and it never removes the node, even though there aren't any more workloads added to the node to keep it alive."
Can you share the spec/status of the node from when it was left around for 24h? There are a couple of fields, lastPodEventTime and the conditions block, that should give us a little more info. Karpenter will add a Consolidatable status condition once the time since the node's lastPodEventTime has exceeded its consolidateAfter. If that condition doesn't get added, that means the lastPodEventTime is too recent.
If that's not the behavior, and the lastPodEventTime truly is further in the past than your consolidateAfter, then yeah, that definitely seems like a bug.
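To make the timing concrete, here is a minimal sketch of the check described above, assuming a node becomes a consolidation candidate once consolidateAfter has elapsed since its lastPodEventTime. parse_duration and is_consolidatable are hypothetical helpers, not Karpenter functions. Whether Karpenter uses >= or > at the exact boundary is also an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(value: str) -> timedelta:
    """Parse simple Go-style durations such as "10m", "2m" or "1h30m".
    Only h/m/s units are supported in this sketch."""
    parts = re.findall(r"(\d+)([hms])", value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


def is_consolidatable(last_pod_event_time: datetime, consolidate_after: timedelta, now: datetime) -> bool:
    """True once consolidateAfter has elapsed since the last pod event (assumed >=)."""
    return now - last_pod_event_time >= consolidate_after


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical lastPodEventTime
after = parse_duration("10m")
print(is_consolidatable(t0, after, t0 + timedelta(minutes=9)))   # False
print(is_consolidatable(t0, after, t0 + timedelta(minutes=10)))  # True
```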
/triage accepted
/triage needs-information
Description
I originally posted this in Discussions, but it got no traction there, so I'm posting it here. I think it may be a bug because it definitely does not behave as expected.
Observed Behavior: I've been testing the WhenEmpty policy, but it does not seem to behave as expected when consolidateAfter is set to more than about 2 to 3 minutes. My disruption settings look like this:
As a test, I scale up a deployment in my node group to a large number of pods so that Karpenter spins up a new node. That works fine. When I scale the deployment back down to 0, I would expect Karpenter to remove the node 10 minutes after the deployment stops needing it. That never happens. I've let it sit for over 24 hours and it never removes the node, even though there aren't any more workloads added to the node to keep it alive. The strange thing is that if I set consolidateAfter to 2m or less, it works as expected and removes the node. I'm running Karpenter v0.37.
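One way to reason about a 2m value working while 10m never fires is that every pod event on the node restarts the consolidateAfter clock. The snippet below assumes, purely hypothetically, that something touches the node every 3 minutes. The thread does not establish what, if anything, keeps updating lastPodEventTime. first_consolidatable_at is an illustrative helper, not a Karpenter API.

```python
from datetime import timedelta
from typing import List, Optional


def first_consolidatable_at(
    pod_event_offsets: List[timedelta],
    consolidate_after: timedelta,
    horizon: timedelta,
) -> Optional[timedelta]:
    """Earliest offset at which consolidateAfter has elapsed since the most
    recent pod event, or None if that never happens within `horizon`.
    Assumes at least one pod event; every event restarts the clock."""
    events = sorted(pod_event_offsets)
    for i, event in enumerate(events):
        fires_at = event + consolidate_after
        if fires_at > horizon:
            # Later events only push the deadline further out.
            return None
        next_event = events[i + 1] if i + 1 < len(events) else None
        # The timer fires only if no newer pod event arrives first.
        if next_event is None or fires_at <= next_event:
            return fires_at
    return None


# Hypothetical churn: a pod event every 3 minutes for 24 hours.
churn = [timedelta(minutes=3 * k) for k in range(480)]
day = timedelta(hours=24)
print(first_consolidatable_at(churn, timedelta(minutes=2), day))   # 0:02:00
print(first_consolidatable_at(churn, timedelta(minutes=10), day))  # None
```

Under that assumption, a 2m consolidateAfter elapses before the next event, while a 10m one is reset every time and never elapses within 24 hours. Checking lastPodEventTime on the node, as requested in the thread, is what would confirm or rule this out.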
Expected Behavior: Karpenter consolidates the node(s) once the time specified in consolidateAfter has elapsed.
Reproduction Steps (Please include YAML):
Versions:
Chart Version: 0.37.0
Kubernetes Version (kubectl version): 1.29
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment