kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

perf: Unregister the topology domain when failing NodeClaim creation #1819

Closed jonathan-innis closed 5 days ago

jonathan-innis commented 5 days ago

Fixes #N/A

Description

Unregister the topology domain when NodeClaim creation fails so that we don't keep stale hostname topology domains around. Prior to this change, we weren't removing the mock domain that we create for a topology in NewNodeClaim(). As a result, the number of hostname domains would grow continually, making hostname topology spread and hostname anti-affinity far less efficient than they should be.

Before

Example: Debug Image showing ~53,000 domains after only a few seconds iterating in the scheduling loop

Screenshot 2024-11-16 at 5 12 51 PM

After

Example: Debug Image showing only a single domain after a few seconds iterating in the scheduling loop

Screenshot 2024-11-16 at 5 17 01 PM

How was this change tested?

make presubmit

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

k8s-ci-robot commented 5 days ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

k8s-ci-robot commented 5 days ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonathan-innis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/karpenter/blob/main/OWNERS)~~ [jonathan-innis]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

coveralls commented 5 days ago

Pull Request Test Coverage Report for Build 11874714947

Details


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/controllers/provisioning/scheduling/topology.go | 7 | 9 | 77.78% |
Totals Coverage Status
Change from base Build 11874015687: 0.08%
Covered Lines: 8614
Relevant Lines: 10631

💛 - Coveralls