kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

ConsolidationPolicy: WhenEmpty #1647

Open jderieg opened 1 week ago

jderieg commented 1 week ago

Description

I originally posted this in Discussions, but it got no traction there, so posting it here. I think it may be a bug because it definitely does not behave as expected.

Observed Behavior: I've been testing the WhenEmpty policy, but it does not seem to be behaving as expected if the consolidateAfter setting is any more than about 2 to 3 minutes. My disruption settings look like this:

    disruption:
      consolidationPolicy: WhenEmpty
      consolidateAfter: 10m
      expireAfter: 360h

As a test, I scale up a deployment to a large number of pods in my nodegroup so that Karpenter spins up a new node. That works fine. When I scale the deployment back down to 0, I would expect Karpenter to scale down (remove) the Karpenter node after 10m of that deployment no longer needing it. That never happens. I've let it sit for over 24 hours and the node is never removed, even though there aren't any more workloads scheduled onto it to keep it alive. The strange thing is that if I set the consolidateAfter value to 2m or under, it works as I would expect and removes the node. I'm running Karpenter v0.37.

Expected Behavior: Consolidate the node(s) after the time specified in 'consolidateAfter'

Reproduction Steps (Please include YAML):

    disruption:
      consolidationPolicy: WhenEmpty
      consolidateAfter: 10m
      expireAfter: 360h
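
For context, a minimal NodePool sketch showing where this disruption block sits (the metadata name and nodeClassRef here are hypothetical placeholders, not from the original report; v0.37 uses the v1beta1 API, where expireAfter still lives under disruption):

    apiVersion: karpenter.sh/v1beta1   # API version shipped with v0.37
    kind: NodePool
    metadata:
      name: example                    # hypothetical name
    spec:
      template:
        spec:
          nodeClassRef:                # hypothetical node class reference
            name: default
      disruption:
        consolidationPolicy: WhenEmpty
        consolidateAfter: 10m
        expireAfter: 360h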

Versions:

leoryu commented 1 week ago

What do your budgets look like? Please set the budgets to 100% to make sure all your nodes could be consolidated:

    disruption:
      budgets:
      - nodes: 100%
      consolidateAfter: 10m
      consolidationPolicy: WhenEmptyOrUnderutilized

jonathan-innis commented 2 days ago

That never happens. I've let it sit for over 24 hours and it never removes the node, even though there aren't anymore workloads added to the node to keep it alive

Can you share the spec/status of the node when it was left around for 24h? There are a couple of fields, lastPodEventTime and the conditions block, that should give us a little more info. Karpenter will add a Consolidatable status condition after the node has surpassed its consolidateAfter. If that doesn't get added, that means the lastPodEventTime is too recent.

If that's not the behavior and the lastPodEventTime has truly surpassed your consolidateAfter, then yeah, that definitely seems like a bug.
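
For reference, those fields can be pulled with kubectl, assuming cluster access and a Karpenter version whose NodeClaim status exposes lastPodEventTime; the claim name below is a placeholder:

```shell
# List NodeClaims and their basic status (requires access to the cluster)
kubectl get nodeclaims

# Dump the full spec/status of the stuck node's claim; look for
# status.lastPodEventTime and the Consolidatable entry in status.conditions
kubectl get nodeclaim <claim-name> -o yaml

# Or extract just the relevant bits with jsonpath
kubectl get nodeclaim <claim-name> \
  -o jsonpath='{.status.lastPodEventTime}{"\n"}{.status.conditions}'
```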

jonathan-innis commented 2 days ago

/triage accepted

jonathan-innis commented 2 days ago

/triage needs-information