kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

Target group deletion causing inflight request 503's #3739

Open Harif-Rahman opened 2 weeks ago

Harif-Rahman commented 2 weeks ago

Describe the bug

In our Ingress configuration, we redirected the rules to a new target group (service), so no existing rules direct traffic to the old target group (service) any longer. However, the old target group is deleted immediately, without honoring the deregistration delay, so in-flight requests fail with 503 errors. Any idea how to handle this?
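For context, the deregistration delay mentioned above is a target group attribute, which this controller usually takes from an Ingress annotation. The fragment below is only a sketch of where that value is configured: the resource names, path, and the 30-second value are hypothetical and not taken from the reporter's setup. Per the report, this delay is not honored when the target group itself is deleted, so setting it is not a fix for the bug.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress            # hypothetical name
  annotations:
    # How long the ALB keeps draining in-flight requests to a deregistering target.
    # The value 30 is an assumption for illustration only.
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service   # hypothetical Service name
                port:
                  number: 80
```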

Steps to reproduce

Logs

Target group deletion log (2024-06-11 8:57:06 UTC):

{"level":"info","ts":1718096226.6557481,"logger":"controllers.ingress","msg":"deleted targetGroup","arn":"arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f"}

Below are the 503 access logs from the LB:
2024-06-11 8:57:06 arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f 22
2024-06-11 8:57:07 arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f 112
2024-06-11 8:57:08 arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f 125
2024-06-11 8:57:09 arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f 85
2024-06-11 8:57:10 arn:aws:elasticloadbalancing:us-east-1:123:targetgroup/k8s-fcjava-javaapii-79a6c094bb/9cac3a3ffc19a38f 115

oliviassss commented 2 weeks ago

@Harif-Rahman, can you check whether adding a sleep as a preStop hook would mitigate your issue? See more details in: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2366
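For readers unfamiliar with the suggestion, a preStop sleep is normally declared in the pod template so the container keeps serving for a while after termination begins. The sketch below is a generic illustration, not the configuration from the linked issue: the Deployment name, image, and both durations are hypothetical, and it assumes the container image ships a `sleep` binary.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      # Must exceed the preStop sleep, otherwise the pod is killed mid-sleep.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: example.com/app:latest   # hypothetical image
          lifecycle:
            preStop:
              exec:
                # Delay SIGTERM so the load balancer can stop routing to this pod first.
                command: ["sleep", "30"]
```

Note that this delays pod shutdown only; it does not change when the controller deletes a target group.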

aravindsagar commented 2 weeks ago

Hi, this seems like a bug with target group deletion. We're trying to reproduce the issue and will work on a fix after that.

Harif-Rahman commented 2 weeks ago

@oliviassss how will adding a preStop hook help here? The ALB controller pod deletes the target group immediately after deregistration of the instances.