kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Ingress returning 503s when using Topology Aware Routing and the controller has no endpoints in the zone #11342

Open LAMRobinson opened 5 months ago

LAMRobinson commented 5 months ago

What happened:

Ingress returns 503 when run in a multi-zone setup where the backend endpointslice doesn't have any endpoints in the same zone as the Ingress Controller

What you expected to happen:

Like kube-proxy, the Ingress controller should send you to a random endpoint: topology hints are meant to fail open, not closed (unlike externalTrafficPolicy/internalTrafficPolicy).

My impression is that all testing/thought about this feature has assumed people are using the `Auto` topology mode (the `service.kubernetes.io/topology-mode: Auto` annotation), which doesn't let you into this situation. The hints feature, however, is explicitly designed to separate the responsibility for deciding to enable topology routing for a service from the responsibility for implementing it, so the dataplane implementation of the hints shouldn't make decisions based on assumptions about what it thinks is setting them.
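For context, a minimal sketch of how a Service opts in to topology aware routing on 1.27 (the Service name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-backend            # illustrative name
  annotations:
    # Asks the EndpointSlice controller to populate zone hints;
    # kube-proxy (and ingress-nginx) then filter endpoints by zone.
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: service-backend
  ports:
    - port: 80
      targetPort: 8080
```

With `Auto`, the EndpointSlice controller applies safeguards and removes all hints when it can't achieve a reasonable per-zone allocation, which is why this path rarely produces the empty-zone case; the hints field itself carries no such safeguard when something else sets it.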

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.8.4
  Build:         05adfe3ee56fab8e4aded7ae00eed6630e43b458
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Note that this is still the current behavior in the latest commit of this repo, see this snippet: https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/controller/endpointslices.go#L144

Kubernetes version (use kubectl version): 1.27

Environment: A multi-zone cluster, e.g.:

DC1
  NodeA
  NodeB

DC2
  NodeC
  NodeD

Then:

ingress-nginx-controller-1 NodeA
ingress-nginx-controller-2 NodeB
ingress-nginx-controller-3 NodeC
ingress-nginx-controller-4 NodeD

service-backend-pod-1 NodeD

Ingress controllers 3/4 on NodeC/NodeD populate their endpoint lists with pod-1 and work.

Ingress controllers 1/2 on NodeA/NodeB do not populate their endpoint lists, as pod-1's hints mark it as belonging to a different zone in the EndpointSlice (see the illustrative EndpointSlice below), so they return 503s.
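For illustration, the EndpointSlice for pod-1 would look roughly like this sketch (the names, IP, and port are made up). A controller in DC1 drops the endpoint because dc1 is not in `hints.forZones`, leaving it with an empty upstream list:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: service-backend-abc12       # generated name, illustrative
  labels:
    kubernetes.io/service-name: service-backend
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8080
endpoints:
  - addresses:
      - "10.0.2.4"                  # pod-1's IP, made up
    conditions:
      ready: true
    nodeName: noded                 # "NodeD" in the environment above
    zone: dc2                       # "DC2" in the environment above
    hints:
      forZones:
        - name: dc2                 # only DC2 consumers should use this endpoint
```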

Workaround

Setting the `service-upstream` annotation and delegating the decision to kube-proxy makes this work, as kube-proxy handles this situation properly (it sends you to a random endpoint regardless of topology). It would be nice if ingress-nginx handled this itself, though, as `service-upstream` has plenty of downsides, as I'm sure you folks know.
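Concretely, the workaround is the ingress-nginx `service-upstream` annotation; a minimal sketch, in which the Ingress name, host, and Service details are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-backend             # illustrative
  annotations:
    # Proxy to the Service's ClusterIP instead of individual pod IPs,
    # delegating endpoint selection (and topology handling) to kube-proxy.
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: backend.example.com     # illustrative
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-backend
                port:
                  number: 80
```

The trade-off is that nginx then sees a single upstream (the ClusterIP), so per-endpoint load balancing, session affinity, and retrying a request against a different endpoint are lost.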

k8s-ci-robot commented 5 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 5 months ago

The switch to endpointslices was a requirement AFAIK.

cc @tao12345666333

LAMRobinson commented 5 months ago

@longwuyuan - I might have misunderstood you, but I'm not saying not to use EndpointSlices; I'm asking for a change in how the Ingress controller treats the endpoints in an EndpointSlice when none of their hints match the controller's zone.

Right now the function returns an empty list of endpoints, and thus the controller returns a 503. I would like it instead to return the full list of endpoints, in a "fail open" style, which matches how kube-proxy behaves.

longwuyuan commented 5 months ago

I assume it is kube-proxy's job to behave as it does, being what it is. I am not sure an ingress controller can or should mimic kube-proxy behaviour, and there is no data on how that would impact the ingress controller's use case for EndpointSlices.

Please wait for comments from others.

github-actions[bot] commented 4 months ago

This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any questions, or want to request prioritization, please reach out on #ingress-nginx-dev on Kubernetes Slack.

LAMRobinson commented 1 month ago

Any update here? With the addition of `trafficDistribution` in 1.30+, this is going to become a more frequent issue, as there is now an "in-tree" way of ending up in this state.
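For reference, a sketch of that in-tree path (Service name, selector, and ports are illustrative; the field is alpha in 1.30 behind the ServiceTrafficDistribution feature gate):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-backend     # illustrative
spec:
  # Prefer endpoints in the client's zone. Unlike topology-mode: Auto,
  # PreferClose applies no proportional-allocation safeguards, so a
  # dataplane that fails closed (like ingress-nginx today) can end up
  # with an empty endpoint list in a zone and return 503s.
  trafficDistribution: PreferClose
  selector:
    app: service-backend
  ports:
    - port: 80
      targetPort: 8080
```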