kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

ingress nginx controller keeps routing to old endpoint resulting in intermittent timeouts #11562

Open vchan2002 opened 4 months ago

vchan2002 commented 4 months ago

What happened:

We see intermittent "upstream timed out" errors when nginx tries to talk to its downstream (backend) service. In every one of those errors the URL it tries to use is the same, even after recycling the downstream service and its pods. So it seems that nginx stubbornly keeps forwarding requests to an old pod that was most likely terminated during a deployment.

84787 upstream timed out (110: Operation timed out) while connecting to upstream, ${URL}

where ${URL} is always the same.

The only way to make nginx "forget" that old upstream URL is to drain or delete the node that the previous pod's IP address was assigned to.
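To back the observation that the failing URL is always the same, the timeouts can be tallied per upstream. The following is a minimal sketch, assuming the controller's error log has been saved as text and that each entry carries nginx's usual `upstream: "..."` field. The function name and the sample lines are made up for illustration and are not taken from the reporter's logs.

```python
import re
from collections import Counter

# nginx appends the target as: upstream: "http://IP:PORT/path"
UPSTREAM_RE = re.compile(r'upstream: "([^"]+)"')


def count_timed_out_upstreams(log_lines):
    """Count 'upstream timed out' errors per upstream URL."""
    counts = Counter()
    for line in log_lines:
        if "upstream timed out" not in line:
            continue  # ignore unrelated error-log entries
        match = UPSTREAM_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (documentation IP ranges, made-up hosts).
SAMPLE_LOG = [
    '2024/01/01 00:00:01 [error] 31#31: *1 upstream timed out (110: Operation timed out) '
    'while connecting to upstream, client: 192.0.2.10, server: example.test, '
    'request: "GET / HTTP/1.1", upstream: "http://10.0.0.5:8080/", host: "example.test"',
    '2024/01/01 00:00:09 [error] 31#31: *2 upstream timed out (110: Operation timed out) '
    'while connecting to upstream, client: 192.0.2.11, server: example.test, '
    'request: "GET / HTTP/1.1", upstream: "http://10.0.0.5:8080/", host: "example.test"',
    '2024/01/01 00:00:12 [error] 31#31: *3 connect() failed (111: Connection refused) '
    'while connecting to upstream, client: 192.0.2.12, server: example.test, '
    'request: "GET / HTTP/1.1", upstream: "http://10.0.0.6:8080/", host: "example.test"',
]

if __name__ == "__main__":
    for upstream, hits in count_timed_out_upstreams(SAMPLE_LOG).most_common():
        print(f"{hits:4d}  {upstream}")
```

If a single address dominates the tally, that address is the one worth comparing against the pod IPs currently known to the cluster.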

What you expected to happen:

After a new deployment behind an ingress-nginx ingress, we expect the nginx controller to learn what the new downstreams are and reconfigure itself accordingly.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):

NGINX Ingress controller
Release: v1.9.1
Build: 3538107c077f1bd860d448e19f44fc8e6a2729e1
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6

Kubernetes version (use kubectl version):

v1.28.9-eks-036c24b

Environment:

k8s-ci-robot commented 4 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 4 months ago

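@vchan2002 the information you provided cannot be analyzed. As written, your issue description can only be noted as your observation.

- Click the button to create a new bug report and look at the questions asked in the template there.
- Edit this issue description and provide the answers to those questions. That will be data that readers can analyze.
- Ensure that your issue description is formatted as per markdown.

If the kubelet of a node does not update the api-server about a pod going away, then the controller cannot update its own view of the EndpointSlice either.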
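One way to test that explanation is to check whether the address nginx keeps timing out on is still listed in the Service's EndpointSlices. The following is a minimal sketch, assuming the slices have been exported as JSON (for example with `kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<service> -o json`). The function name and the sample data are hypothetical; only the `endpoints[].addresses` and `endpoints[].conditions.ready` fields come from the `discovery.k8s.io/v1` schema.

```python
def classify_address(endpointslice_list, address):
    """Return 'ready', 'not-ready', or 'absent' for an IP in an EndpointSliceList."""
    found = False
    for item in endpointslice_list.get("items", []):
        for endpoint in item.get("endpoints") or []:
            if address not in endpoint.get("addresses", []):
                continue
            found = True
            conditions = endpoint.get("conditions") or {}
            # An unset "ready" is treated as ready here, which is how the
            # API documentation suggests consumers interpret it.
            if conditions.get("ready") is not False:
                return "ready"
    return "not-ready" if found else "absent"


# Hypothetical data shaped like `kubectl ... -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "my-service-abc12"},
            "endpoints": [
                {
                    "addresses": ["10.0.0.5"],
                    "conditions": {"ready": False, "terminating": True},
                },
                {"addresses": ["10.0.0.6"], "conditions": {"ready": True}},
            ],
        }
    ]
}

if __name__ == "__main__":
    for ip in ("10.0.0.5", "10.0.0.6", "10.0.0.9"):
        print(ip, classify_address(SAMPLE, ip))
```

Under these assumptions, a result of `ready` for the old pod's IP would point at stale data in the API, in line with the kubelet explanation above. A result of `absent` while nginx still sends traffic to that IP would instead suggest that the controller's configuration is lagging behind the API.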

/remove-kind bug
/kind support
/triage needs-informtion

k8s-ci-robot commented 4 months ago

@longwuyuan: The label(s) triage/needs-informtion cannot be applied, because the repository doesn't have them.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/11562#issuecomment-2211583708):

> @vchan2002 the info you provided can not be analyzed. Your issue description can just be noted as your observation.
>
> - Click the button to create a new bug report and look at the questions asked in the template there.
> - Edit this issue description and provide the answers to those questions. That will be data that readers can analyze.
> - Ensure that your issue description is formatted as per markdown.
>
> If the kubelet of a node does not update the api-server about a pod going away, then the controller also can not update its own endpointslice.
>
> /remove-kind bug
> /kind support
> /triage needs-informtion
longwuyuan commented 4 months ago

/triage needs-information

vchan2002 commented 4 months ago

So, while I am trying to gather some info, I do have to ask:

What could cause the kubelet to not report that a pod went away?

This has happened in one specific environment of ours, on one specific ingress, twice in the past two weeks, so it is not a one-off. It just seems very odd that this keeps happening.

github-actions[bot] commented 3 months ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or a request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.