kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

Exponential / logarithmic decay for cluster desired size #696

Open sftim opened 1 year ago

sftim commented 1 year ago

Tell us about your request

When Karpenter is running more node capacity than the cluster requires, use an exponential decay (i.e., something with a half-life) rather than dropping the desired capacity instantly.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

As a cluster operator, when my workloads scale in on my cluster, I want to preserve capacity, so that short-term drops in workload scale don't disrupt service.

I'm suggesting exponential decay because it's easy to implement with two fields (e.g. in the .status of each Provisioner):

  1. the most recent, post-decay, value
  2. a timestamp for that value, either with subsecond precision, or with the value scaled to match the timestamp at the beginning of a second

With some fairly simple math, you can then evaluate the decayed value for any subsequent instant. You can write it back into the status (e.g. using a JSON Patch), and you can act on it as well.
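
A minimal sketch of that evaluation, assuming a configurable half-life; the `DecayStatus` type and its field names here are hypothetical, not an existing Karpenter API:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// DecayStatus holds the two proposed fields. The names are illustrative only.
type DecayStatus struct {
	LastValue     float64   // most recent, post-decay, desired size
	LastTimestamp time.Time // instant at which LastValue was recorded
}

// ValueAt evaluates the decayed desired size at time t:
// value(t) = LastValue * 0.5^(elapsed / halfLife).
func (s DecayStatus) ValueAt(t time.Time, halfLife time.Duration) float64 {
	elapsed := t.Sub(s.LastTimestamp)
	if elapsed <= 0 {
		return s.LastValue
	}
	return s.LastValue * math.Pow(0.5, elapsed.Seconds()/halfLife.Seconds())
}

func main() {
	s := DecayStatus{LastValue: 100, LastTimestamp: time.Now()}
	halfLife := 10 * time.Minute
	// One half-life later the decayed desired size is 50. A controller could
	// write that back into .status (e.g. with a JSON Patch) and act on it.
	fmt.Printf("%.1f\n", s.ValueAt(s.LastTimestamp.Add(halfLife), halfLife))
}
```

Because only the post-decay value and its timestamp are stored, any observer can recompute the current value without replaying history.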

This might better support:

Alternative

Rather than exponential decay, use another function such as logarithmic decay. That would hold the instance count for a duration and then let it drop off, which might better fit cases where cluster operators want to minimize instance terminations.
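
One illustrative curve with that hold-then-drop shape (the piecewise form and the symbols are mine, not part of the proposal): keep the value flat for a hold period, then decay it with a half-life:

$$
v(t) = \begin{cases} v_0 & t < T_{\text{hold}} \\ v_0 \cdot 2^{-(t - T_{\text{hold}})/T_{1/2}} & t \ge T_{\text{hold}} \end{cases}
$$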

Are you currently working around this issue?

(e.g.) scaleDown policies on HorizontalPodAutoscaler. However, these affect single workloads. A correlated scale-in could still take away node capacity that I, as a cluster operator, know will take time to reprovision if needed.
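
For reference, that workaround is the `behavior.scaleDown` stanza of the autoscaling/v2 HorizontalPodAutoscaler; all names and values below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example
  minReplicas: 1
  maxReplicas: 100
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # act on the highest recommendation seen in the last 5m
      policies:
      - type: Percent
        value: 10         # remove at most 10% of current replicas
        periodSeconds: 60 # per minute
```

As noted, this shapes scale-in per workload, not for the node fleet as a whole.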

Additional Context

Also see https://kubernetes.slack.com/archives/C02SFFZSA2K/p1685980025031979?thread_ts=1685960637.488689&cid=C02SFFZSA2K

sftim commented 12 months ago

https://github.com/aws/karpenter-core/issues/735 adds a user story relevant to this: minimizing the AWS Config costs from frequent provisioning / termination cycles for EC2 instances.

njtran commented 8 months ago

Thinking about this from the perspective of disruption budgets: could this be implemented by a budget with a percentage?

Let's say I had 1000 nodes in my cluster, and let's say they're all empty, meaning that the desired state would be to scale to 0. With a disruption budget of 10%, you could achieve the same exponential decay by effectively scaling down the cluster in progressively smaller batches, eventually scaling down to 0.

1000 (-100) -> 900 (-90) -> 810 (-81) -> 729 (-73) -> 656 (-66) -> 590 -> ... -> 0

This effectively solves the problem of exponential decay, in my eyes. @sftim thoughts?

One consideration is that this drifts from perfectly exponential the more heterogeneous the instance sizes are. Yet, the super nice part is that this effectively gets solved for free with an already existing design/implementation in progress.
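
A quick sketch of the arithmetic behind that sequence, assuming the 10% budget is re-evaluated each round against the remaining node count (rounding to nearest, as the numbers above do):

```go
package main

import (
	"fmt"
	"math"
)

// scaleDownSchedule simulates repeated disruption rounds in which each round
// removes at most `percent` of the remaining nodes.
func scaleDownSchedule(nodes int, percent float64) []int {
	steps := []int{nodes}
	for nodes > 0 {
		batch := int(math.Round(float64(nodes) * percent / 100))
		if batch < 1 {
			batch = 1 // keep making progress once the percentage rounds to zero
		}
		nodes -= batch
		steps = append(steps, nodes)
	}
	return steps
}

func main() {
	// Emits 1000, 900, 810, 729, 656, 590, ..., 0 — matching the sequence above.
	fmt.Println(scaleDownSchedule(1000, 10))
}
```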

sftim commented 8 months ago

There are two shapes for decay. For scale-in, these are:

  1. big steps first, then smaller and smaller steps (exponential)
  2. small reductions at first, then bigger and bigger steps (logarithmic)

I actually think the second case is more relevant. People want to keep nodes around in case the load comes back, but eventually they still want their monthly bill to go down.

On the node size thing, we could implement this so that you specify the dimension you care about. For example, decay the total vCPU count for a NodePool. Or the node count, or the memory total. Maybe even the Pod capacity?
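
If a dimension like that were surfaced in the API, one hypothetical shape is below; the `DecaySpec` type and every field in it are invented for illustration and do not exist in Karpenter:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DecaySpec is a hypothetical sketch: it selects which quantity of a NodePool
// decays, and how quickly.
type DecaySpec struct {
	// Dimension is the quantity to decay: Nodes, CPU, Memory, or Pods.
	Dimension string `json:"dimension"`
	// HalfLife is the time for the preserved capacity headroom to halve.
	HalfLife metav1.Duration `json:"halfLife"`
}
```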

sftim commented 8 months ago

/retitle Exponential / logarithmic decay for cluster desired size

If we plan to implement just one of these, that could turn into a separate more specific issue.

njtran commented 8 months ago

On the node size thing, we could implement this so that you specify the dimension you care about. For example, decay the total vCPU count for a NodePool. Or the node count, or the memory total. Maybe even the Pod capacity?

This totally makes sense. There was some feedback that DisruptionBudgets should refer to more than just nodes, which seems super similar to this request.

big steps first, then smaller and smaller steps (exponential)
small reductions at first, then bigger and bigger steps (logarithmic)

I understand the use case in doing big steps first with progressively smaller steps, and that's naturally implemented with budgets.

What's the use case for doing smaller steps with progressively larger steps? That sounds like it would be something like 1000 -> 999 -> 997 -> 993 -> 985 -> 969 -> 937 -> 873 -> 745 -> 489 -> 0. While not impossible, I think this would be harder to model, since you have to be aware of previous steps to know the next step.
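
For concreteness, the deltas in that sequence double each round (1, 2, 4, ..., 256, then the 489 remainder), which is exactly why each step depends on the previous one. A sketch:

```go
package main

import "fmt"

// logSchedule reproduces the doubling-step shape above: each round removes
// twice as many nodes as the last, so the next step can't be computed
// without remembering the previous one.
func logSchedule(nodes int) []int {
	steps := []int{nodes}
	for batch := 1; nodes > 0; batch *= 2 {
		if batch > nodes {
			batch = nodes // final round removes whatever remains
		}
		nodes -= batch
		steps = append(steps, nodes)
	}
	return steps
}

func main() {
	// Emits 1000, 999, 997, 993, 985, 969, 937, 873, 745, 489, 0.
	fmt.Println(logSchedule(1000))
}
```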

sftim commented 8 months ago

Let's do the simpler thing then, with exponential decay.

sftim commented 8 months ago

scaling down the cluster in progressively smaller batches, eventually scaling down to 0

I do think it's nicer to scale in without the jaggedness this implies. Each time the desired size drops below the actual integer count of nodes, I think a cluster operator would hope to see a drain happening, and eventually an instance termination.

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Bryce-Soghigian commented 4 months ago

/remove-lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten