kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0
637 stars 206 forks source link

fix: increase the eviction cost baseline value for do-not-disruptable pods #1734

Closed cnmcavoy closed 1 month ago

cnmcavoy commented 1 month ago

Fixes #N/A

Description Karpenter manages the order for disruption based on the total disruption cost of the pods on a node. However, this does not take into account pods Karpenter can not evict due to the karpenter.sh/do-not-disrupt: true. As a result, nodes that can't have all their pods evicted may be selected as candidates for disruption before nodes with equal or greater number of pods that are all evictable. When this occur, the unevictable node wastes the nodepool's disruption budget.

This change fixes that bug by increasing the base pod eviction cost for pods annotated with karpenter.sh/do-not-disrupt: true in Karpenter's simulation models.

How was this change tested?

Unit tests + real workloads in our clusters.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

k8s-ci-robot commented 1 month ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cnmcavoy Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/kubernetes-sigs/karpenter/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
njtran commented 1 month ago

Does this PR actually change anything behavior wise?

The only disruption method that cares about disruption cost is Consolidation, but we don't consider pods with do-not-disrupt for consolidation, only for Drift with TGP enabled. https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/disruption/types.go#L102, so this code path gets exercised, but doesn't actually change anything for drift or consolidation.

cnmcavoy commented 1 month ago

The only disruption method that cares about disruption cost is Consolidation, but we don't consider pods with do-not-disrupt for consolidation, only for Drift with TGP enabled. https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/disruption/types.go#L102, so this code path gets exercised, but doesn't actually change anything for drift or consolidation.

Huh. I opened this PR because we have been testing out various tweaks to Karpenter and I thought this one had an empirical impact from some of our metrics.

I added some more logging today and it looks like the it bears out the code you linked, the change here only affects the eventual disruption class, which doesn't sort candidates based on the disruption cost anyway. Which agrees with your point. So I'm not sure why we saw an improvement with this, possibly it was coincidental noise.