Open menghanl opened 1 month ago
This is hard to reproduce.
We have another cluster with a very similar setup (dev vs staging). I just tried the same thing in the other cluster, but this time the NLBs got successfully deleted.
The only difference is that, before I deployed to the second cluster, I updated the aws-load-balancer-controller deployment to turn on debug logging. This triggered a restart of the aws-load-balancer-controller pods.
/kind bug
Hello! Thanks for reaching out.
It is odd that your load balancer was leaked. Our current thinking is that the required load balancer tags were somehow removed, which caused the leak. As you mentioned, this seems hard to reproduce, but we can give it a try. Do you know which version of the load balancer controller was used to create the NLB initially? Having this information would make the reproduction easier.
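To make the tag hypothesis above easier to check, here is a minimal Python sketch that compares the tags on a load balancer with the tags the controller is expected to have set. The tag keys (`elbv2.k8s.aws/cluster`, `service.k8s.aws/stack`, `service.k8s.aws/resource`) are what I understand the controller uses for Service-backed load balancers; please verify them against your controller version. The helper names, the cluster name `dev`, and the namespace `default` are hypothetical. `fetch_tags` uses boto3's `elbv2` `describe_tags` call and needs AWS credentials; the comparison helpers are pure and work offline.

```python
from typing import Dict, List

# Assumed tag keys; verify against the controller version in use.
CLUSTER_TAG = "elbv2.k8s.aws/cluster"
STACK_TAG = "service.k8s.aws/stack"
RESOURCE_TAG = "service.k8s.aws/resource"


def expected_tags(cluster: str, namespace: str, service: str) -> Dict[str, str]:
    """Tags we assume the controller sets on an NLB created for a Service."""
    return {
        CLUSTER_TAG: cluster,
        STACK_TAG: f"{namespace}/{service}",
        RESOURCE_TAG: "LoadBalancer",
    }


def tag_problems(actual: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Return a human-readable list of missing or mismatched tags."""
    problems = []
    for key, want in expected.items():
        got = actual.get(key)
        if got is None:
            problems.append(f"missing tag {key}")
        elif got != want:
            problems.append(f"tag {key} is {got!r}, expected {want!r}")
    return problems


def fetch_tags(load_balancer_arn: str) -> Dict[str, str]:
    """Read the tags of one load balancer from AWS (requires credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("elbv2")
    response = client.describe_tags(ResourceArns=[load_balancer_arn])
    tags = response["TagDescriptions"][0]["Tags"]
    return {tag["Key"]: tag["Value"] for tag in tags}
```

For example, `tag_problems(fetch_tags(arn), expected_tags("dev", "default", "the-svc-got-deleted"))` returns an empty list when all assumed tags are present and match.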
Sorry, I don't have the old load balancer controller version we used...
But the leaked LB was created on "October 27, 2023, 12:07 (UTC-07:00)".
So I would assume we used the "latest" aws-load-balancer-controller as of that date.
Describe the bug
backend not found: Service "the-svc-got-deleted" not found
Steps to reproduce
The immediate action that resulted in this unexpected behavior was simply deleting the k8s service. (We use helm, if that's important. We just deleted the service.yaml template and redeployed with helm.)
One thing I'm not sure is important: the NLB is pretty old (created "October 27, 2023, 12:07 (UTC-07:00)"), so we've probably upgraded the aws-load-balancer-controller several times since.
Expected outcome
The AWS resources (NLB, target group, etc.) get deleted when the k8s service is deleted.
Environment
AWS Load Balancer controller version: public.ecr.aws/eks/aws-load-balancer-controller:v2.8.2
Kubernetes version: Server Version: v1.28.12-eks-2f46c53
EKS version: 1.28
Additional Context:
As mentioned above, the leaked NLB is pretty old and was probably created by a different version of aws-load-balancer-controller. Not sure if that's related or not.
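For anyone who wants to look for similar leaks, here is a rough Python sketch, not an official tool. It takes load balancer tags that have already been fetched (for example with the AWS CLI or boto3) and the set of Services that still exist, and reports load balancers whose stack tag points at a Service that is gone. The tag keys are assumptions based on what I understand the controller sets, and the function name, ARNs, cluster names, and Service names in the usage example are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed tag keys; verify against the controller version in use.
CLUSTER_TAG = "elbv2.k8s.aws/cluster"
STACK_TAG = "service.k8s.aws/stack"  # value is assumed to be "<namespace>/<service>"


def find_orphaned(
    load_balancers: Iterable[Tuple[str, Dict[str, str]]],
    cluster: str,
    live_services: Set[str],
) -> List[str]:
    """Return ARNs of load balancers tagged for a Service that no longer exists.

    load_balancers: (arn, tags) pairs, with tags as a plain key/value dict.
    live_services:  "<namespace>/<name>" for every Service still in the cluster.
    """
    orphaned = []
    for arn, tags in load_balancers:
        if tags.get(CLUSTER_TAG) != cluster:
            continue  # belongs to another cluster, or is not managed at all
        stack = tags.get(STACK_TAG)
        if stack is not None and stack not in live_services:
            orphaned.append(arn)
    return orphaned


# Usage with made-up data:
example = [
    ("arn:lb-a", {CLUSTER_TAG: "dev", STACK_TAG: "default/the-svc-got-deleted"}),
    ("arn:lb-b", {CLUSTER_TAG: "dev", STACK_TAG: "default/still-here"}),
]
print(find_orphaned(example, "dev", {"default/still-here"}))
```

Note that this only catches load balancers that still carry the tags. If the tags were stripped, the load balancer is skipped and would have to be found some other way, such as by creation date.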