kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Dynamically Return 204 in Nginx Ingress When Backend Returns 500, and Resume Normal Behavior When Backend Recovers #12004

Open umlumpa opened 3 weeks ago

umlumpa commented 3 weeks ago

Problem

We are facing an issue where our backend service, behind an Nginx Ingress Controller, occasionally starts returning HTTP 500 errors. When this happens, we notice that our Nginx Ingress Controller (configured with hostNetwork: true) consumes an excessive amount of CPU and memory, often reaching 100%.

To mitigate the load during these failure scenarios, we want Nginx Ingress Controller to automatically return an HTTP 204 status code whenever the backend starts returning 500 errors. The goal is to avoid sending traffic to the backend when it's in a failing state. Once the backend recovers and starts returning HTTP 200, we want Nginx to stop returning 204 and resume forwarding traffic to the backend normally.

Proposed Solution

We are looking for a way to implement dynamic behavior in the Nginx Ingress Controller:

- When the backend starts returning 500: Nginx should immediately start responding with 204 for all incoming requests to a specific path (e.g., /bid) without forwarding these requests to the backend.
- When the backend recovers and returns 200: Nginx should remove the 204 rule and resume forwarding traffic to the backend normally.

We attempted to find a solution within the current Nginx Ingress annotations but couldn't find a dynamic mechanism to implement this. A possible solution could be to use Lua scripting in Nginx to track backend responses and adjust the behavior accordingly, but this is not something that can be done with existing Ingress annotations. Is there a way to achieve this?
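The behavior requested above is essentially a circuit breaker with a 204 fallback. The following is a minimal Python sketch of that logic only, not an ingress-nginx feature or annotation. The class name `FallbackBreaker`, the `backend` callable, and the `cooldown` parameter are all hypothetical. One assumption goes beyond the request as written: if no traffic reaches the backend while 204 is being served, recovery can never be observed, so the sketch lets a single request through as a probe once the cooldown has elapsed.

```python
import time
from typing import Callable, Optional


class FallbackBreaker:
    """Serve 204 while the backend is failing; probe after a cooldown to detect recovery."""

    def __init__(self, backend: Callable[[], int], cooldown: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend    # hypothetical callable returning the backend's HTTP status
        self.cooldown = cooldown  # assumed: seconds to wait before probing a failing backend
        self.clock = clock        # injectable so the behavior can be tested without sleeping
        self.open_until: Optional[float] = None  # None means the backend is considered healthy

    def handle(self) -> int:
        now = self.clock()
        if self.open_until is not None and now < self.open_until:
            return 204            # short-circuit: the backend is not contacted
        status = self.backend()   # a normal request, or a probe once the cooldown has passed
        if status >= 500:
            self.open_until = now + self.cooldown
            return 204            # hide the failure from the client
        self.open_until = None    # backend answered normally: resume forwarding
        return status
```

In an actual controller this state would have to live somewhere shared across Nginx workers (for example in custom Lua, as suggested above), which this sketch does not attempt to model.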

k8s-ci-robot commented 3 weeks ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

chengjoey commented 3 weeks ago

If a liveness probe is configured for the backend service, then when the backend returns 500 the pod will be marked unhealthy and traffic will no longer be forwarded to it. Does this solve your problem?

longwuyuan commented 3 weeks ago

/remove-kind bug

longwuyuan commented 3 weeks ago

/kind feature