kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Unexpected failure reloading the backend & Dynamic reconfiguration failed #10204

Closed mike-pt closed 3 weeks ago

mike-pt commented 1 year ago

What happened:

This (see title) has happened twice this week. In both cases the controller eventually just started working again, and we have no idea why.

What you expected to happen:

I would expect nginx-controller to just work or to at least provide better details of why it failed.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):

NGINX Ingress controller
  Release: v1.8.1
  Build: dc88dce9ea5e700f3301d16f971fa17c6cfe757d
  Repository: https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version (use kubectl version): v1.24.12-gke.500

Environment:

redacted.ode-pool-1c9c7ca5-f52x   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.57    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-jsf5   Ready    <none>   8m30s   v1.24.12-gke.500   red.acted.0.39     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-tmzq   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.59    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-zjnl   Ready    <none>   85m     v1.24.12-gke.500   red.acted.0.33     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-7p2z   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.61    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-c5ng   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.192   <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-hjft   Ready    <none>   8m23s   v1.24.12-gke.500   red.acted.15.12    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-lgw7   Ready    <none>   8m25s   v1.24.12-gke.500   red.acted.15.2     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
Name:                     redacted-ingress-nginx-controller
Namespace:                default
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=redacted
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.8.1
                          helm.sh/chart=ingress-nginx-4.7.1
Annotations:              cloud.google.com/neg: {"ingress":true}
                          meta.helm.sh/release-name: redacted
                          meta.helm.sh/release-namespace: default
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=redacted,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       redac.ted11.83
IPs:                      redac.ted11.83
IP:                       re.da.ct.ed
LoadBalancer Ingress:     re.da.ct.ed
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31852/TCP
Endpoints:                reda.cted.2.9:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30957/TCP
Endpoints:                reda.cted.2.9:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason               Age                  From                Message
  ----    ------               ----                 ----                -------
  Normal  UpdatedLoadBalancer  3m3s (x53 over 20d)  service-controller  Updated load balancer with new hosts

How to reproduce this issue:

I'm not sure, as this seems to appear out of the blue and then it simply starts working again. Here is the log of the controller pod (with some private info redacted, but all the logs are there).

To be honest, what I would really like to know is whether there is a way to get the logs to show the actual cause of "Unexpected failure reloading the backend", if that's even possible. Is there more verbose logging or something similar?

nginx-controller.log
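
One low-tech way to narrow down a log like the one attached above is to look at what the controller printed immediately before each failure message. The following is a minimal sketch, assuming the log is plain text with one entry per line. The function name `scan_log` and the `context` parameter are hypothetical helpers, not part of ingress-nginx, and the marker strings are only the messages quoted in this issue, so other controller versions may word them differently.

```python
from collections import Counter, deque

# Messages quoted in this issue. Assumption: they appear verbatim in the log;
# other controller versions may phrase them differently.
MARKERS = (
    "Unexpected failure reloading the backend",
    "Dynamic reconfiguration failed",
    "Received SIGTERM, shutting down",
)


def scan_log(lines, context=3):
    """Count marker messages and keep the lines that preceded each hit.

    Returns (counts, hits), where hits is a list of
    (line_number, marker, preceding_lines) tuples.
    """
    counts = Counter()
    hits = []
    recent = deque(maxlen=context)  # sliding window of earlier lines
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
                hits.append((line_no, marker, list(recent)))
        recent.append(line)
    return counts, hits
```

Passing an open file object for `nginx-controller.log` to `scan_log` returns the per-message counts plus the few lines before each occurrence, which is usually where any hint about the cause would be.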

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 1 year ago

Hi, can you add a detailed description of the issue?

longwuyuan commented 1 year ago

/remove-kind bug

mike-pt commented 1 year ago

Sorry if that was not very detailed. I did provide a full log, in case you did not notice it. As for the issue itself, I don't really have much more info than the title.

The controller stops with "Unexpected failure reloading the backend" until eventually it just works. We also notice that it sometimes prints "Dynamic reconfiguration failed".

This is why I posted the full logs: it's not clear why this fails, but maybe the logs give some indication to someone with a deeper understanding of the controller. I also found no way to make the logging more verbose; it would be nice if such an option existed.

github-actions[bot] commented 1 year ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or a request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

longwuyuan commented 3 weeks ago

Sorry for no action here.

The controller received a SIGTERM:

"Received SIGTERM, shutting down"

There are no events or log messages available from which to guess what happened before that, so the assumption is an environmental factor, ranging from resource starvation to a security-related violation, ending in SIGTERM.
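
To check the resource-starvation hypothesis after the fact, one option is to inspect the last termination state that Kubernetes records for each container. The sketch below assumes the pod object has already been parsed into a dict, for example from the JSON printed by `kubectl get pod <name> -o json`. The function name `classify_termination` and the hint strings are hypothetical. It relies on the common convention that exit codes above 128 mean "128 + signal number"; a process that handles SIGTERM itself and shuts down gracefully may report a different exit code, so treat the result as a hint only.

```python
def classify_termination(pod):
    """Summarise each container's last recorded termination.

    `pod` is a dict in the shape of a Kubernetes Pod object; only
    status.containerStatuses[].lastState.terminated is read.
    """
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if not terminated:
            continue  # container has no previous termination on record
        reason = terminated.get("reason", "")
        exit_code = terminated.get("exitCode")
        # Convention: exit code 128 + N means the process died from signal N.
        signal = None
        if isinstance(exit_code, int) and exit_code > 128:
            signal = exit_code - 128
        if reason == "OOMKilled":
            hint = "killed by the OOM killer (memory starvation)"
        elif signal == 15:
            hint = "terminated by SIGTERM"
        elif signal == 9:
            hint = "terminated by SIGKILL"
        else:
            hint = "inspect reason and exit code manually"
        findings.append({
            "container": status.get("name"),
            "restarts": status.get("restartCount", 0),
            "reason": reason,
            "exit_code": exit_code,
            "signal": signal,
            "hint": hint,
        })
    return findings
```

An empty result means Kubernetes has no previous termination on record for the pod's containers, in which case node-level events are the next place to look.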

Unless there is an easy reproduction method that we can copy/paste on a kind cluster, there is no action we can take.

Because we cannot take any action, this issue has to be closed: it does not track any action item and only adds to the tally of open issues. If this is still happening, or you need to report any problem, please test on a recent release of the controller and provide the information asked for in the new bug report template by editing the issue description here.

/kind suppport
/close

k8s-ci-robot commented 3 weeks ago

@longwuyuan: The label(s) kind/suppport cannot be applied, because the repository doesn't have them.

k8s-ci-robot commented 3 weeks ago

@longwuyuan: Closing this issue.