Closed PhilipBehrenberg closed 2 months ago
@PhilipBehrenberg: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-kind bug /kind support
- Does this work in its default install? https://kubernetes.github.io/ingress-nginx/deploy/#aws
The default install has the exact same issue. By "default install" I assume you mean just running the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/aws/deploy.yaml
- Do you have the required ports open and allowing connections?
I have a bunch of other pods running without any issue. They are able to communicate with each other, even across separate nodes in different AZs. DNS is also working as expected.
Then please post the error messages and other logs. Thanks.
All of my logs and everything are in the original post. Running the default install failed in the exact same way, with the same error and everything.
/assign @strongjz
I have the same issue. Any idea what could cause this?
For me, it starts failing on appVersion: 1.2.1 and chart version: 4.1.3.
In case this is helpful to someone else: I was having this exact issue with DOKS. In my case the cause was using hostNetwork: true.
This was causing the health checks to fail due to a missing node address, specifically the value of controller.healthCheckHost:
> Address to bind the health check endpoint. It is better to set this option to the internal node address if the ingress nginx controller is running in the hostNetwork: true mode.

Simply turning it off with hostNetwork: false (the default value) solved the issue for me.
app version = 1.5.1
chart = ingress-nginx-4.4.0
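For anyone hitting the same thing, the relevant chart values are sketched below (key names are from the ingress-nginx Helm chart; the healthCheckHost value is a placeholder you would replace with your node's internal IP):

```yaml
controller:
  # false is the chart default; setting it to true is what broke the
  # health checks in my case.
  hostNetwork: false
  # If you genuinely need hostNetwork: true, also bind the health check
  # endpoint to the internal node address (placeholder value below):
  # healthCheckHost: "10.0.5.135"
```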
This is 2 years old and lots of users are currently using the controller on EKS without this failure. The error message posted was related to networking, specifically port 10246, which is an unfamiliar port number.
In any case, the version of the controller reported is not supported anymore, and AWS now requires installing the AWS Load Balancer Controller along with AWS-specific annotations set during the install.
This issue is adding to the open-issues count without an action item, so I will close it for now. Please use the latest release of the controller as documented in the Deployment docs, ensure the required ports are open, and match the standard OS config required by K8S. Then post all the info asked for in the issue description by editing out the old info and pasting the new test info, and reopen the issue if you are still tracking this. Otherwise it can remain closed. This will help us reduce the count of real issues being tracked with action items. Thanks.
/close
@longwuyuan: Closing this issue.
What happened:
Ingress Controller pods error on startup and enter CrashLoopBackOff. This system is running on EKS on customized versions of the official AWS EKS nodes.
Ingress Controller Pod Logs
```
W1028 20:10:58.192748       6 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1028 20:10:58.192851       6 main.go:209] "Creating API client" host="https://172.20.0.1:443"
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.4.0
  Build:         50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
I1028 20:10:58.216752       6 main.go:253] "Running in Kubernetes cluster" major="1" minor="23+" git="v1.23.10-eks-15b7512" state="clean" commit="cd6399691d9b1fed9ec20c9c5e82f5993c3f42cb" platform="linux/amd64"
I1028 20:10:58.435662       6 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
W1028 20:10:58.459181       6 nginx.go:83] Error reading system nameservers: open /etc/resolv.conf: permission denied
I1028 20:10:58.460332       6 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I1028 20:10:58.482546       6 nginx.go:260] "Starting NGINX Ingress controller"
I1028 20:10:58.497973       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"6e89c60a-a27d-4d88-af8c-eec4c5b583f9", APIVersion:"v1", ResourceVersion:"221955", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I1028 20:10:58.498017       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-tcp", UID:"d00ca974-f5b1-4ac4-a69b-d20a9d615dbe", APIVersion:"v1", ResourceVersion:"221956", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-tcp
I1028 20:10:59.684612       6 nginx.go:303] "Starting NGINX process"
I1028 20:10:59.686089       6 leaderelection.go:248] attempting to acquire leader lease ingress-nginx/ingress-controller-leader...
I1028 20:10:59.688208       6 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
W1028 20:10:59.691719       6 controller.go:424] Error getting Service "default/connection-manager": no object matching key "default/connection-manager" in local store
I1028 20:10:59.692016       6 controller.go:168] "Configuration changes detected, backend reload required"
I1028 20:10:59.703401       6 leaderelection.go:258] successfully acquired lease ingress-nginx/ingress-controller-leader
I1028 20:10:59.703539       6 status.go:84] "New leader elected" identity="ingress-nginx-controller-58cmx"
I1028 20:10:59.762241       6 controller.go:185] "Backend successfully reloaded"
I1028 20:10:59.762473       6 controller.go:196] "Initial sync, sleeping for 1 second"
I1028 20:10:59.762799       6 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-58cmx", UID:"fd0cbd96-d926-402a-a572-3e6a6761419d", APIVersion:"v1", ResourceVersion:"413390", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/10/28 20:10:59 [error] 25#25: init_by_lua error: init_by_lua:9: require failed: /etc/nginx/lua/util/resolv_conf.lua:70: could not open /etc/resolv.conf: /etc/resolv.conf: Permission denied
stack traceback:
	[C]: in function 'error'
	init_by_lua:9: in main chunk
W1028 20:11:00.763534       6 controller.go:216] Dynamic reconfiguration failed (retrying; 15 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:01.778427       6 controller.go:216] Dynamic reconfiguration failed (retrying; 14 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:03.203239       6 controller.go:216] Dynamic reconfiguration failed (retrying; 13 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:05.011439       6 controller.go:216] Dynamic reconfiguration failed (retrying; 12 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:07.240442       6 controller.go:216] Dynamic reconfiguration failed (retrying; 11 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:10.355045       6 controller.go:216] Dynamic reconfiguration failed (retrying; 10 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:14.279282       6 controller.go:216] Dynamic reconfiguration failed (retrying; 9 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:19.265449       6 controller.go:216] Dynamic reconfiguration failed (retrying; 8 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:26.012199       6 controller.go:216] Dynamic reconfiguration failed (retrying; 7 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:34.984880       6 controller.go:216] Dynamic reconfiguration failed (retrying; 6 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:45.920391       6 controller.go:216] Dynamic reconfiguration failed (retrying; 5 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I1028 20:11:56.847909       6 sigterm.go:36] "Received SIGTERM, shutting down"
I1028 20:11:56.847935       6 nginx.go:379] "Shutting down controller queues"
I1028 20:11:56.872811       6 nginx.go:387] "Stopping admission controller"
E1028 20:11:56.872857       6 nginx.go:326] "Error listening for TLS connections" err="http: Server closed"
I1028 20:11:56.872864       6 nginx.go:395] "Stopping NGINX process"
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
2022/10/28 20:11:56 [notice] 41#41: signal process started
I1028 20:11:57.902547       6 nginx.go:408] "NGINX process has stopped"
I1028 20:11:57.902570       6 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
E1028 20:11:59.703761       6 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:
```
What you expected to happen:
The ingress controller pods should start up and enable incoming connections from the NLB that was created by ingress-nginx.
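A note on reading the log above: the repeated "Dynamic reconfiguration failed ... connection refused" entries are a downstream symptom. The init_by_lua error shows nginx aborting because /etc/resolv.conf was unreadable, so nothing ever listened on the controller's internal endpoint at 127.0.0.1:10246, and every POST to it was refused. A minimal sketch of the same check (using bash's /dev/tcp pseudo-device; inside the controller pod it distinguishes "nginx never started" from a separate networking problem):

```shell
# Probe the controller's internal port that the failing POSTs target.
# "closed" means nothing accepts TCP connections on 127.0.0.1:10246 --
# consistent with nginx having aborted during init_by_lua.
if (exec 3<>/dev/tcp/127.0.0.1/10246) 2>/dev/null; then
    status="open"
else
    status="closed"
fi
echo "port 10246 is $status"
```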
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):
Kubernetes version (use kubectl version):
Environment:
uname -a: Linux 5.4.209-116.367.amzn2.x86_64 #1 SMP Wed Aug 31 00:09:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux (stig-build-linux-high)
ingress-nginx ingress-nginx 1 2022-10-27 23:40:51.1602531 +0000 UTC deployed ingress-nginx-4.3.0 1.4.0
Ingress Controller Values File
```
controller:
  config:
    allow-snippet-annotations: "true"
    http-snippet: |
      server {
        listen 2443;
        return 308 https://$host$request_uri;
      }
    proxy-real-ip-cidr: 10.0.0.0/16
    use-forwarded-headers: "true"
  kind: DaemonSet
  containerPort:
    http: 80
    https: 80
    tohttps: 2443
  service:
    targetPorts:
      http: tohttps
      https: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws-us-gov:acm:[region-removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
tcp:
  "32443": default/app:9443
```
Describe IngressClass
```
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.4.0
              helm.sh/chart=ingress-nginx-4.3.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:
```
Describe All
```
Name:             ingress-nginx-controller-58cmx
Namespace:        ingress-nginx
Priority:         0
Node:             ip-10-0-5-135.[region removed].compute.internal/10.0.5.135
Start Time:       Thu, 27 Oct 2022 23:40:56 +0000
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  controller-revision-hash=6ffbc58587
                  pod-template-generation=1
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               10.0.5.85
IPs:
  IP:             10.0.5.85
Controlled By:    DaemonSet/ingress-nginx-controller
Containers:
  controller:
    Container ID:  docker://63fa2d76ba1f47d4ea09c9a393bb33a09dc28ea43e7a8d49af94c7bc19f89bad
    Image:         registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Image ID:      docker-pullable://registry.k8s.io/ingress-nginx/controller@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Ports:         80/TCP, 80/TCP, 2443/TCP, 8443/TCP, 32443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Oct 2022 20:55:18 +0000
      Finished:     Fri, 28 Oct 2022 20:56:27 +0000
    Ready:          False
    Restart Count:  349
    Requests:
      cpu:        100m
      memory:     90Mi
    Liveness:     http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:    http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-58cmx (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6hpb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-g6hpb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
```
Describe Svc
```
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.4.0
                          helm.sh/chart=ingress-nginx-4.3.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 3600
                          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws-us-gov:acm:[region removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.157.170
IPs:                      172.20.157.170
LoadBalancer Ingress:     a3ebaf5eb07b24c08b1118a33ca6ebfd-1a8a5b6538b5f2a3.elb.[region removed].amazonaws.com
Port:                     http  80/TCP
TargetPort:               tohttps/TCP
NodePort:                 http  32057/TCP
Endpoints:
Port:                     https  443/TCP
TargetPort:               http/TCP
NodePort:                 https  30263/TCP
Endpoints:
Port:                     32443-tcp  32443/TCP
TargetPort:               32443-tcp/TCP
NodePort:                 32443-tcp  30514/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
```
How to reproduce this issue:
This is only happening in 1 of 3 environments that, as far as I can tell, are the same. The environment itself is somehow causing this issue. This happens with every helm install I've done in this environment.
Additional Information:
The chart is successfully creating all of the NLB pieces; the only piece that's failing is the IC pods. The permissions on /etc/resolv.conf are different in this pod than they are in the pods in the other environments where this has worked correctly:
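The permission difference can be captured with stat. The snippet below is only an illustration on a scratch file (the 600/644 modes are example values, not the actual modes from these nodes); the real comparison is the same stat run against /etc/resolv.conf via kubectl exec in each environment:

```shell
# Illustration: contrast a restrictive mode on a resolv.conf-style file
# with the world-readable mode a working pod would see.
scratch=$(mktemp)
echo "nameserver 172.20.0.10" > "$scratch"   # example cluster-DNS entry

chmod 600 "$scratch"                  # owner-only: a non-root nginx worker gets EACCES
mode_broken=$(stat -c '%a' "$scratch")
echo "broken-style mode:  $mode_broken"

chmod 644 "$scratch"                  # world-readable: what the working pods see
mode_working=$(stat -c '%a' "$scratch")
echo "working-style mode: $mode_working"

rm -f "$scratch"
```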