NLBs not deleted when the corresponding k8s services are deleted

menghanl commented 1 month ago

Describe the bug

I deleted an unused k8s service.
There's an NLB associated with this service. It was created by aws-load-balancer-controller. But this NLB is not deleted.
This NLB now has one listener pointing to a target group with 0 targets.
In the k8s cluster, there's also a TargetGroupBinding with backend not found: Service "the-svc-got-deleted" not found
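
For anyone checking their own cluster for the same symptom, here is a minimal sketch that flags TargetGroupBindings whose backend Service no longer exists. It assumes the bindings are already available as dicts, for example the "items" list parsed from `kubectl get targetgroupbindings -A -o json`, and that the live Services are given as (namespace, name) pairs. The function name `find_orphaned_bindings` is hypothetical and not part of the controller.

```python
def find_orphaned_bindings(bindings, existing_services):
    """Return TargetGroupBindings whose referenced Service is gone.

    bindings: list of TargetGroupBinding objects as plain dicts
              (assumed to follow the CRD layout: spec.serviceRef.name,
              spec.targetGroupARN).
    existing_services: set of (namespace, name) tuples for live Services.
    """
    orphaned = []
    for tgb in bindings:
        namespace = tgb["metadata"]["namespace"]
        service_name = tgb["spec"]["serviceRef"]["name"]
        # A binding is only valid if its Service lives in the same namespace.
        if (namespace, service_name) not in existing_services:
            orphaned.append(
                {
                    "binding": f'{namespace}/{tgb["metadata"]["name"]}',
                    "missing_service": service_name,
                    "target_group_arn": tgb["spec"].get("targetGroupARN"),
                }
            )
    return orphaned
```

Each returned entry names the binding, the Service it still points at, and the target group ARN, which is enough to locate the leftover AWS resources by hand.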

Steps to reproduce

The immediate action that resulted in this unexpected behavior was just deleting the k8s service. (We use helm, if that's important. We just deleted the service.yaml template and redeployed with helm.)

One thing I'm not sure is important: the NLB is pretty old (created "October 27, 2023, 12:07 (UTC-07:00)"), so we've probably upgraded the aws-load-balancer-controller several times since.

Expected outcome

The AWS resources (NLB, target group, etc.) get deleted when the k8s service is deleted.

Environment

AWS Load Balancer controller version: public.ecr.aws/eks/aws-load-balancer-controller:v2.8.2
Kubernetes version: Server Version: v1.28.12-eks-2f46c53
Using EKS (yes/no), if so version?: yes, 1.28

Additional Context:

As mentioned above, the leaked NLB is pretty old and was probably created by a different version of aws-load-balancer-controller. Not sure if that's related or not.

menghanl commented 1 month ago

This is hard to reproduce.

We have another cluster with a very similar setup (dev vs staging). I just tried the same thing in the other cluster, but this time the NLBs got successfully deleted.

The only difference is that, before I deployed to the second cluster, I updated the aws-load-balancer-controller deployment to turn on debug logs. This triggered a restart of the aws-load-balancer-controller pods.

zac-nixon commented 1 month ago

/kind bug

zac-nixon commented 1 month ago

Hello! Thanks for reaching out.

It is odd that your load balancer was leaked. Our current thinking is that the needed load balancer tags were removed (somehow), which caused this leakage. As you mentioned, this seems to be hard to reproduce, but we can give it a try. Do you know which version of the load balancer controller was used to create the NLB initially? Having this data would make the reproduction easier.
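
A rough way to test the missing-tags theory against a leaked NLB is to compare its tags with the ones the controller is expected to use for tracking. The sketch below assumes the tag keys `elbv2.k8s.aws/cluster`, `service.k8s.aws/stack` and `service.k8s.aws/resource`, with the values shown in the comments; that is an assumption based on the controller's documented Service tagging and should be checked against the controller version in use. The function name `check_tracking_tags` is hypothetical. The tags are passed in as a plain dict, for example built from the output of `aws elbv2 describe-tags`.

```python
def check_tracking_tags(tags, cluster_name, namespace, service_name):
    """Return a dict of tag problems; an empty dict means the tags look intact.

    tags: dict of tag key -> value found on the load balancer.
    The expected keys and values below are assumptions, not read from
    the controller source.
    """
    expected = {
        "elbv2.k8s.aws/cluster": cluster_name,
        "service.k8s.aws/stack": f"{namespace}/{service_name}",
        "service.k8s.aws/resource": "LoadBalancer",
    }
    problems = {}
    for key, want in expected.items():
        got = tags.get(key)
        if got is None:
            problems[key] = "missing"
        elif got != want:
            problems[key] = f"expected {want!r}, found {got!r}"
    return problems
```

If this reports a missing or mismatched key on the leaked NLB, that would be consistent with the controller no longer recognizing the load balancer as its own. It would not, by itself, explain how the tag was lost.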

menghanl commented 1 month ago

Sorry, I don't have the old load balancer controller version we used... But the leaked LB was created "October 27, 2023, 12:07 (UTC-07:00)". So I would assume we used the "latest" aws-load-balancer-controller as of that date.

kubernetes-sigs / aws-load-balancer-controller

NLBs not deleted when the corresponding k8s services are deleted #3841