kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

Add consolidationPolicy: Underweight #1829

Open koreyGambill opened 3 days ago

koreyGambill commented 3 days ago

Description

What problem are you trying to solve? We've created fallback on-demand NodePools with a lower scheduling weight than our spot instance NodePools (AWS). When spot capacity is hard to find, Karpenter schedules our pods onto the (fallback) on-demand EC2 instances, but it never consolidates back to spot, so it ends up being really expensive. I would love an official setting that lets Karpenter consolidate based on weighted preferences rather than just utilization.
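
For reference, here is roughly the shape of our current setup. NodePool names, weights, and requirements are illustrative, and this assumes the karpenter.sh/v1 API:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot
spec:
  weight: 100                      # higher weight -> Karpenter prefers this pool
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
      # nodeClassRef and other required fields omitted for brevity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  weight: 10                       # lower weight -> only used when spot capacity can't be found
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      # nodeClassRef and other required fields omitted for brevity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m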

In this feature, if all the pods on a low-weight node are compatible with a higher-weight node, Karpenter should provision the higher-weight node and reschedule the pods onto it. For us, this would help reduce costs, but in general it makes sense that users would care about running on their higher-weight NodePools. I would expect this to still obey the consolidateAfter setting.

Something like this could work in the YAML:

disruption:
    # Changed to a list type so the YAML stays clear now that there are three options
    consolidationPolicy:
      # If Underutilized and Underweight are both set, Karpenter will consolidate
      # the node if some pods can be moved to a higher-weight node and the
      # rest fit on other existing nodes of the same weight
      - Empty
      - Underutilized
      - Underweight  # Allows consolidating a node if all of its pods could be moved to a higher-weight node

How important is this feature to you? Low-Medium - I have a workaround (setting the on-demand NodePool to expire after 4 hours; a sketch follows the list below), but it has a couple of drawbacks.

  1. We are waiting up to 4 hours to get back to our optimal state.
  2. We cannot reuse the NodePool for workloads that need a long lifespan. With this feature it would be possible, since Karpenter wouldn't be able to consolidate the node while those pods were on it (due to taints/tolerations/affinities).
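
For completeness, the expiry workaround mentioned above looks roughly like this on the fallback pool (again illustrative; assumes the karpenter.sh/v1 API, where expireAfter sits under the node template rather than the disruption block):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand-fallback
spec:
  weight: 10
  template:
    spec:
      expireAfter: 4h              # forces churn so pods get another chance to land on spot
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      # nodeClassRef and other required fields omitted for brevity
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
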
k8s-ci-robot commented 3 days ago

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.