kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Remove the Lease garbage collection controller #1373

Open engedaam opened 4 weeks ago

engedaam commented 4 weeks ago

Description

What problem are you trying to solve?

When Karpenter deleted a Node object while the kubelet was still alive, the kubelet, because its lease logic ignored errors, created a lease without an ownerReference set: https://github.com/kubernetes/kubernetes/issues/109777.

Karpenter used to not wait for the underlying VM/kubelet to be fully terminated before removing the Karpenter finalizers from the NodeClaim and Node. As a result, node leases were leaked into the cluster, because the still-terminating kubelet would create a phantom lease before the instance was actually deleted: https://github.com/aws/karpenter-provider-aws/issues/4363.

As a mitigation, the Karpenter team implemented a lease garbage collection controller to delete any leaked node leases: https://github.com/kubernetes-sigs/karpenter/pull/471
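For readers unfamiliar with what "leaked" means here, below is a rough sketch of the check such a controller could perform. It is not the code from the linked PR. It assumes leases are plain dicts shaped like Kubernetes Lease JSON, that node leases live in the `kube-node-lease` namespace, and that a leaked lease is one with no `ownerReferences`. The function name `find_leaked_leases` and the sample data are hypothetical.

```python
NODE_LEASE_NAMESPACE = "kube-node-lease"  # namespace the kubelet uses for node heartbeat leases


def find_leaked_leases(leases):
    """Return leases in the node-lease namespace that have no ownerReferences.

    Assumption: a healthy node lease carries an ownerReference to its Node,
    so a lease without one is treated as leaked.
    """
    leaked = []
    for lease in leases:
        meta = lease.get("metadata", {})
        if meta.get("namespace") != NODE_LEASE_NAMESPACE:
            continue  # not a node lease, never touch it
        if meta.get("ownerReferences"):
            continue  # still owned by a Node, normal garbage collection applies
        leaked.append(lease)
    return leaked


# Hypothetical sample data: one owned lease, one phantom lease, one unrelated lease.
leases = [
    {"metadata": {"name": "node-a", "namespace": "kube-node-lease",
                  "ownerReferences": [{"kind": "Node", "name": "node-a"}]}},
    {"metadata": {"name": "node-b", "namespace": "kube-node-lease"}},
    {"metadata": {"name": "leader", "namespace": "kube-system"}},
]
leaked_names = [lease["metadata"]["name"] for lease in find_leaked_leases(leases)]
print(leaked_names)
```

A real controller would then delete each returned lease through the API server; that step is left out of this sketch.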

The team recently moved to waiting for the underlying VMs to be fully terminated before removing the NodeClaim and Node finalizers, which should prevent a terminating kubelet from creating phantom leases: https://github.com/kubernetes-sigs/karpenter/pull/1195

We will need to validate that waiting for instance termination results in Karpenter no longer leaking node leases.

engedaam commented 4 weeks ago

/triage accepted

jonathan-innis commented 3 weeks ago

We just need to validate that we aren't leaking leases here, right? Through Prom metrics in our soak testing and E2E testing? And then we should be good to confirm and remove this controller?

sftim commented 3 weeks ago

/retitle Remove the Lease garbage collection controller

engedaam commented 3 weeks ago

@jonathan-innis Yeah, the main work needed here is to get some metrics on whether any node leases are leaked, to confirm we are okay to remove the controller.
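One possible shape for that validation, as a sketch only: take the output of `kubectl get leases -n kube-node-lease -o json` at the end of a soak run, compare it against the names of the Nodes that still exist, and report anything left over. The function `summarize_lease_check`, its return format, and the sample JSON are hypothetical. Treating "no ownerReferences and no matching Node" as leaked is an assumption, not something confirmed in this thread.

```python
import json


def summarize_lease_check(lease_list_json, live_node_names):
    """Summarize a LeaseList JSON dump against the set of Nodes that still exist.

    Assumption: a lease counts as leaked when it has no ownerReferences
    and its name matches no live Node.
    """
    items = json.loads(lease_list_json).get("items", [])
    live = set(live_node_names)
    leaked = sorted(
        item["metadata"]["name"]
        for item in items
        if not item["metadata"].get("ownerReferences")
        and item["metadata"]["name"] not in live
    )
    return {"total": len(items), "leaked": leaked, "ok": not leaked}


# Hypothetical dump: node-a is alive and owned, node-gone is a phantom lease.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "node-a", "namespace": "kube-node-lease",
                      "ownerReferences": [{"kind": "Node", "name": "node-a"}]}},
        {"metadata": {"name": "node-gone", "namespace": "kube-node-lease"}},
    ],
})
result = summarize_lease_check(sample, live_node_names=["node-a"])
print(result)
```

If `ok` stays true across soak and E2E runs with the garbage collection controller disabled, that would support removing it. The `leaked` count could equally be exported as a gauge instead of checked by hand.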