Unexpected failure reloading the backend & Dynamic reconfiguration failed

mike-pt commented 1 year ago

What happened:

This (see title) has happened twice this week. In both cases the controller eventually just started working again, and we have no idea why.

What you expected to happen:

I would expect nginx-controller to just work or to at least provide better details of why it failed.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):

NGINX Ingress controller
Release: v1.8.1
Build: dc88dce9ea5e700f3301d16f971fa17c6cfe757d
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.21.6

Kubernetes version (use kubectl version): v1.24.12-gke.500

Environment:

Cloud provider or hardware configuration: Google Cloud

Basic cluster related info:

kubectl version

Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.24.12-gke.500

kubectl get nodes -o wide

redacted.ode-pool-1c9c7ca5-f52x   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.57    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-jsf5   Ready    <none>   8m30s   v1.24.12-gke.500   red.acted.0.39     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-tmzq   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.59    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-1c9c7ca5-zjnl   Ready    <none>   85m     v1.24.12-gke.500   red.acted.0.33     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-7p2z   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.61    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-c5ng   Ready    <none>   84m     v1.24.12-gke.500   red.acted.15.192   <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-hjft   Ready    <none>   8m23s   v1.24.12-gke.500   red.acted.15.12    <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18
redacted.ode-pool-4f3dc29f-lgw7   Ready    <none>   8m25s   v1.24.12-gke.500   red.acted.15.2     <none>        Container-Optimized OS from Google   5.10.162+        containerd://1.6.18

How was the ingress-nginx-controller installed: I cannot paste all the values due to privacy reasons, but we aren't changing any defaults for the controller; we simply add the chart as a dependency and then set ingress resources.

Current State of the controller:

kubectl describe ingressclasses

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=****
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.8.1
              helm.sh/chart=ingress-nginx-4.7.1
Annotations:  meta.helm.sh/release-name:  *****
              meta.helm.sh/release-namespace: default
Controller:   k8s.io/ingress-nginx
Events:       <none>

kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>

Name:             redacted-ingress-nginx-controller-6dc7cbddcd-5pwrt
Namespace:        default
Priority:         0
Service Account:  redacted-ingress-nginx
Node:             gke-redacted-redacted-node-pool-1c9c7ca5-zjnl/10.196.0.33
Start Time:       Thu, 13 Jul 2023 18:57:52 +0100
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=redacted
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.8.1
                  helm.sh/chart=ingress-nginx-4.7.1
                  pod-template-hash=6dc7cbddcd
Annotations:      <none>
Status:           Running
IP:               redac.ted.2.9
IPs:
  IP:           redac.ted.2.9
Controlled By:  ReplicaSet/redacted-ingress-nginx-controller-6dc7cbddcd
Containers:
  controller:
    Container ID:  containerd://e9ac930b602a1b7f7e922423269d48f43dd8690f71e939314f77b67b3cd36874
    Image:         registry.k8s.io/ingress-nginx/controller:v1.8.1@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/redacted-ingress-nginx-controller
      --election-id=redacted-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/redacted-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Thu, 13 Jul 2023 19:18:44 +0100
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 13 Jul 2023 19:12:01 +0100
      Finished:     Thu, 13 Jul 2023 19:13:29 +0100
    Ready:          True
    Restart Count:  8
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       redacted-ingress-nginx-controller-6dc7cbddcd-5pwrt (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-67967 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  redacted-ingress-nginx-admission
    Optional:    false
  kube-api-access-67967:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>

Name:                     redacted-ingress-nginx-controller
Namespace:                default
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=redacted
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.8.1
                          helm.sh/chart=ingress-nginx-4.7.1
Annotations:              cloud.google.com/neg: {"ingress":true}
                          meta.helm.sh/release-name: redacted
                          meta.helm.sh/release-namespace: default
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=redacted,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       redac.ted11.83
IPs:                      redac.ted11.83
IP:                       re.da.ct.ed
LoadBalancer Ingress:     re.da.ct.ed
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31852/TCP
Endpoints:                reda.cted.2.9:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  30957/TCP
Endpoints:                reda.cted.2.9:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason               Age                  From                Message
  ----    ------               ----                 ----                -------
  Normal  UpdatedLoadBalancer  3m3s (x53 over 20d)  service-controller  Updated load balancer with new hosts

How to reproduce this issue:

I'm not sure, as this seems to happen out of the blue, and then it simply starts working again. Here is the log of the controller pod (with some private info redacted, but all the logs are there).

To be honest, what I would really like to know is whether there is a way to get the logs to show the actual cause of "Unexpected failure reloading the backend", if that's even possible, for example through more verbose logging.

nginx-controller.log
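
The ingress-nginx troubleshooting guide documents a `--v=N` controller flag (for example `--v=3`) that raises log verbosity, which may be worth trying first. Independently of that, it can help to look at what the controller logged just before each failure. The following is a minimal Python sketch, not part of ingress-nginx: it scans a saved log for the three messages quoted in this issue and returns the lines that preceded each hit. The names `MARKERS` and `find_failures` and the `context` parameter are illustrative assumptions.

```python
from collections import deque
from typing import Iterable, List, Tuple

# Messages quoted in this issue; extend the tuple as needed.
MARKERS = (
    "Unexpected failure reloading the backend",
    "Dynamic reconfiguration failed",
    "Received SIGTERM, shutting down",
)


def find_failures(
    lines: Iterable[str], context: int = 5
) -> List[Tuple[int, str, List[str]]]:
    """Return (line_number, matched_marker, preceding_lines) for each hit.

    `context` is how many earlier lines to keep for each hit.
    """
    recent = deque(maxlen=context)  # rolling window of the previous lines
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        for marker in MARKERS:
            if marker in line:
                hits.append((number, marker, list(recent)))
                break  # report each line once, under the first marker found
        recent.append(line)
    return hits
```

For example, `find_failures(open("nginx-controller.log", encoding="utf-8", errors="replace"))` returns one tuple per matching line, which makes it easier to see whether the same message always precedes a reload failure.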

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 1 year ago

Hi, can you add a detailed description of the issue?

longwuyuan commented 1 year ago

/remove-kind bug

mike-pt commented 1 year ago

Sorry if that was not very detailed. I did provide a full log; I'm not sure whether you noticed it. As for the issue itself, I don't really have much more information than the title.

The controller stops with "Unexpected failure reloading the backend" until eventually it just works again. We also noticed that it sometimes prints "Dynamic reconfiguration failed".

This is why I posted the full logs: it's not clear why this fails, but the logs may give some indication to someone with a deeper understanding of the controller. I also found no way to make the logging more verbose; it would be nice if such an option existed.

github-actions[bot] commented 1 year ago

This is stale, but we won't close it automatically. Just bear in mind that the maintainers may be busy with other tasks and will get to your issue ASAP. If you have any questions or a request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.

longwuyuan commented 3 weeks ago

Sorry for no action here.

The controller received a SIGTERM:

"Received SIGTERM, shutting down"

There are no events or log messages from which to infer what happened before that, so the assumption is an environmental factor, anything from resource starvation to a security-related violation, that ended in a SIGTERM.
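
One environmental factor that can be checked against the pod description in this issue is the liveness probe: when it fails repeatedly, the kubelet restarts the container, and the container receives SIGTERM first. This is a possibility to rule out, not a diagnosis; nothing in the logs confirms it. The Python sketch below (helper names are hypothetical) parses the probe line printed by `kubectl describe pod` and estimates how long `/healthz` would have to keep failing before a restart.

```python
import re
from typing import Dict

# Matches the probe settings as printed by `kubectl describe pod`.
PROBE_FIELDS = re.compile(
    r"delay=(?P<delay>\d+)s\s+timeout=(?P<timeout>\d+)s\s+period=(?P<period>\d+)s\s+"
    r"#success=(?P<success>\d+)\s+#failure=(?P<failure>\d+)"
)


def parse_probe(describe_line: str) -> Dict[str, int]:
    """Extract the numeric probe settings from one describe-output line."""
    match = PROBE_FIELDS.search(describe_line)
    if match is None:
        raise ValueError(f"no probe settings found in: {describe_line!r}")
    return {name: int(value) for name, value in match.groupdict().items()}


def restart_window_seconds(probe: Dict[str, int]) -> int:
    """Rough time of continuous failure before the kubelet restarts the container.

    Approximation: failureThreshold consecutive failures, one per period.
    """
    return probe["period"] * probe["failure"]


# Liveness line copied from the pod description in this issue.
liveness = parse_probe(
    "Liveness:   http-get http://:10254/healthz delay=10s timeout=1s "
    "period=10s #success=1 #failure=5"
)
print(restart_window_seconds(liveness))  # 10 * 5 = 50
```

With the chart defaults shown in the pod description, roughly 50 seconds of health checks that each take longer than the 1-second timeout would be enough to trigger a restart, which is one way resource starvation could surface as a SIGTERM.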

Unless there is an easy way to reproduce this that we can copy and paste onto a kind cluster, there is no action we can take.

Because we cannot take any action, this issue has to be closed: it does not track any action item but adds to the tally of open issues. If this is still happening or you need to report a problem, please test on a recent release of the controller and provide the information asked for in the new bug report template by editing the issue description here.

/kind suppport
/close

k8s-ci-robot commented 3 weeks ago

@longwuyuan: The label(s) kind/suppport cannot be applied, because the repository doesn't have them.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/10204#issuecomment-2348657445).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 3 weeks ago

@longwuyuan: Closing this issue.
