Closed PhilipBehrenberg closed 2 months ago
@PhilipBehrenberg: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-kind bug /kind support
- Does this work in its default install? https://kubernetes.github.io/ingress-nginx/deploy/#aws
The default install has the exact same issue. By "default install" I assume you mean just running the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/aws/deploy.yaml
- Do you have the required ports open and allowing connections?
I have a bunch of other pods running without any issue. They are able to communicate with each other, even across separate nodes in different AZs. DNS is also working as expected.
Then please post the error messages and other logs. Thanks.
All of my logs and everything are in the original post. Running the default install failed in the exact same way, with the same error and everything.
/assign @strongjz
I have the same issue. Any idea what could cause this?
For me, it starts failing on appVersion: 1.2.1 and chart version: 4.1.3.
In case this is helpful to someone else: I was having this exact issue with DOKS. In my case the cause was using hostNetwork: true.
This was causing the health checks to fail due to a missing node address, specifically the value of controller.healthCheckHost:
> Address to bind the health check endpoint. It is better to set this option to the internal node address if the ingress nginx controller is running in the hostNetwork: true mode.

Simply turning it off with hostNetwork: false (the default value) solved the issue for me.
app version = 1.5.1
chart = ingress-nginx-4.4.0
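For anyone hitting the same thing, the relevant chart values are sketched below (key names are from the ingress-nginx Helm chart; the healthCheckHost value is a placeholder you would replace with your node's internal IP):

```yaml
controller:
  # false is the chart default; setting it to true is what broke the
  # health checks in my case.
  hostNetwork: false
  # If you genuinely need hostNetwork: true, also bind the health check
  # endpoint to the internal node address (placeholder value below):
  # healthCheckHost: "10.0.5.135"
```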
This is 2 years old and lots of users are currently using the controller on EKS without this failure. The error message posted was related to networking, specifically port 10246, which is an unfamiliar port number.
In any case, the version of the controller reported is not supported anymore, and AWS now requires installing the AWS Load Balancer Controller along with AWS-specific annotations set during the install.
This issue is adding to the open-issues count without an action item, so I will close it for now. Please use the latest release of the controller as documented in the Deployment docs, ensure the required ports are open, and match the standard OS config required by K8S. Then post all the info asked for in the issue description by editing out the old info and pasting the new test info, and reopen the issue if you are still tracking this. Otherwise it can remain closed. This will help us reduce the count of real issues being tracked with action items. Thanks.
/close
@longwuyuan: Closing this issue.
What happened:
Ingress Controller pods error on startup and enter CrashLoopBackOff. This system is running on EKS on customized versions of the official AWS EKS nodes.
Ingress Controller Pod Logs
```
W1028 20:10:58.192748       6 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1028 20:10:58.192851       6 main.go:209] "Creating API client" host="https://172.20.0.1:443"
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.4.0
  Build:         50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
I1028 20:10:58.216752       6 main.go:253] "Running in Kubernetes cluster" major="1" minor="23+" git="v1.23.10-eks-15b7512" state="clean" commit="cd6399691d9b1fed9ec20c9c5e82f5993c3f42cb" platform="linux/amd64"
I1028 20:10:58.435662       6 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
W1028 20:10:58.459181       6 nginx.go:83] Error reading system nameservers: open /etc/resolv.conf: permission denied
I1028 20:10:58.460332       6 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I1028 20:10:58.482546       6 nginx.go:260] "Starting NGINX Ingress controller"
I1028 20:10:58.497973       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"6e89c60a-a27d-4d88-af8c-eec4c5b583f9", APIVersion:"v1", ResourceVersion:"221955", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I1028 20:10:58.498017       6 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-tcp", UID:"d00ca974-f5b1-4ac4-a69b-d20a9d615dbe", APIVersion:"v1", ResourceVersion:"221956", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-tcp
I1028 20:10:59.684612       6 nginx.go:303] "Starting NGINX process"
I1028 20:10:59.686089       6 leaderelection.go:248] attempting to acquire leader lease ingress-nginx/ingress-controller-leader...
I1028 20:10:59.688208       6 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
W1028 20:10:59.691719       6 controller.go:424] Error getting Service "default/connection-manager": no object matching key "default/connection-manager" in local store
I1028 20:10:59.692016       6 controller.go:168] "Configuration changes detected, backend reload required"
I1028 20:10:59.703401       6 leaderelection.go:258] successfully acquired lease ingress-nginx/ingress-controller-leader
I1028 20:10:59.703539       6 status.go:84] "New leader elected" identity="ingress-nginx-controller-58cmx"
I1028 20:10:59.762241       6 controller.go:185] "Backend successfully reloaded"
I1028 20:10:59.762473       6 controller.go:196] "Initial sync, sleeping for 1 second"
I1028 20:10:59.762799       6 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-58cmx", UID:"fd0cbd96-d926-402a-a572-3e6a6761419d", APIVersion:"v1", ResourceVersion:"413390", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/10/28 20:10:59 [error] 25#25: init_by_lua error: init_by_lua:9: require failed: /etc/nginx/lua/util/resolv_conf.lua:70: could not open /etc/resolv.conf: /etc/resolv.conf: Permission denied
stack traceback:
	[C]: in function 'error'
	init_by_lua:9: in main chunk
W1028 20:11:00.763534       6 controller.go:216] Dynamic reconfiguration failed (retrying; 15 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:01.778427       6 controller.go:216] Dynamic reconfiguration failed (retrying; 14 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:03.203239       6 controller.go:216] Dynamic reconfiguration failed (retrying; 13 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:05.011439       6 controller.go:216] Dynamic reconfiguration failed (retrying; 12 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:07.240442       6 controller.go:216] Dynamic reconfiguration failed (retrying; 11 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:10.355045       6 controller.go:216] Dynamic reconfiguration failed (retrying; 10 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:14.279282       6 controller.go:216] Dynamic reconfiguration failed (retrying; 9 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:19.265449       6 controller.go:216] Dynamic reconfiguration failed (retrying; 8 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:26.012199       6 controller.go:216] Dynamic reconfiguration failed (retrying; 7 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:34.984880       6 controller.go:216] Dynamic reconfiguration failed (retrying; 6 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
W1028 20:11:45.920391       6 controller.go:216] Dynamic reconfiguration failed (retrying; 5 retries left): Post "http://127.0.0.1:10246/configuration/backends": dial tcp 127.0.0.1:10246: connect: connection refused
I1028 20:11:56.847909       6 sigterm.go:36] "Received SIGTERM, shutting down"
I1028 20:11:56.847935       6 nginx.go:379] "Shutting down controller queues"
I1028 20:11:56.872811       6 nginx.go:387] "Stopping admission controller"
E1028 20:11:56.872857       6 nginx.go:326] "Error listening for TLS connections" err="http: Server closed"
I1028 20:11:56.872864       6 nginx.go:395] "Stopping NGINX process"
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:150
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /etc/nginx/nginx.conf:151
2022/10/28 20:11:56 [warn] 41#41: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /etc/nginx/nginx.conf:152
2022/10/28 20:11:56 [notice] 41#41: signal process started
I1028 20:11:57.902547       6 nginx.go:408] "NGINX process has stopped"
I1028 20:11:57.902570       6 sigterm.go:44] Handled quit, delaying controller exit for 10 seconds
E1028 20:11:59.703761       6 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:
```
What you expected to happen:
The ingress controller pods should start up and enable incoming connections from the NLB that was created by ingress-nginx.
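A note on reading the log above: the repeated "Dynamic reconfiguration failed ... connection refused" entries are a downstream symptom. The init_by_lua error shows nginx aborting because /etc/resolv.conf was unreadable, so nothing ever listened on the controller's internal endpoint at 127.0.0.1:10246, and every POST to it was refused. A minimal sketch of the same check (using bash's /dev/tcp pseudo-device; inside the controller pod it distinguishes "nginx never started" from a separate networking problem):

```shell
# Probe the controller's internal port that the failing POSTs target.
# "closed" means nothing accepts TCP connections on 127.0.0.1:10246 --
# consistent with nginx having aborted during init_by_lua.
if (exec 3<>/dev/tcp/127.0.0.1/10246) 2>/dev/null; then
    status="open"
else
    status="closed"
fi
echo "port 10246 is $status"
```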
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):
Kubernetes version (use kubectl version):
Environment:
uname -a: Linux 5.4.209-116.367.amzn2.x86_64 #1 SMP Wed Aug 31 00:09:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux (stig-build-linux-high)
ingress-nginx ingress-nginx 1 2022-10-27 23:40:51.1602531 +0000 UTC deployed ingress-nginx-4.3.0 1.4.0
Ingress Controller Values File
```
controller:
  config:
    allow-snippet-annotations: "true"
    http-snippet: |
      server {
        listen 2443;
        return 308 https://$host$request_uri;
      }
    proxy-real-ip-cidr: 10.0.0.0/16
    use-forwarded-headers: "true"
  kind: DaemonSet
  containerPort:
    http: 80
    https: 80
    tohttps: 2443
  service:
    targetPorts:
      http: tohttps
      https: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws-us-gov:acm:[region-removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
tcp:
  "32443": default/app:9443
```
Describe IngressClass
```
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.4.0
              helm.sh/chart=ingress-nginx-4.3.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:
```
Describe All
```
Name:             ingress-nginx-controller-58cmx
Namespace:        ingress-nginx
Priority:         0
Node:             ip-10-0-5-135.[region removed].compute.internal/10.0.5.135
Start Time:       Thu, 27 Oct 2022 23:40:56 +0000
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  controller-revision-hash=6ffbc58587
                  pod-template-generation=1
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               10.0.5.85
IPs:
  IP:             10.0.5.85
Controlled By:    DaemonSet/ingress-nginx-controller
Containers:
  controller:
    Container ID:  docker://63fa2d76ba1f47d4ea09c9a393bb33a09dc28ea43e7a8d49af94c7bc19f89bad
    Image:         registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Image ID:      docker-pullable://registry.k8s.io/ingress-nginx/controller@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143
    Ports:         80/TCP, 80/TCP, 2443/TCP, 8443/TCP, 32443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --tcp-services-configmap=$(POD_NAMESPACE)/ingress-nginx-tcp
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Oct 2022 20:55:18 +0000
      Finished:     Fri, 28 Oct 2022 20:56:27 +0000
    Ready:          False
    Restart Count:  349
    Requests:
      cpu:        100m
      memory:     90Mi
    Liveness:     http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:    http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-58cmx (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6hpb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-g6hpb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
```
Describe Svc
```
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.4.0
                          helm.sh/chart=ingress-nginx-4.3.0
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
                          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 3600
                          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws-us-gov:acm:[region removed]:xxxxxxxxxxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
                          service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx, subnet-xxxxxxxxxxxxxxxxx
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.157.170
IPs:                      172.20.157.170
LoadBalancer Ingress:     a3ebaf5eb07b24c08b1118a33ca6ebfd-1a8a5b6538b5f2a3.elb.[region removed].amazonaws.com
Port:                     http  80/TCP
TargetPort:               tohttps/TCP
NodePort:                 http  32057/TCP
Endpoints:
Port:                     https  443/TCP
TargetPort:               http/TCP
NodePort:                 https  30263/TCP
Endpoints:
Port:                     32443-tcp  32443/TCP
TargetPort:               32443-tcp/TCP
NodePort:                 32443-tcp  30514/TCP
Endpoints:
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
```
How to reproduce this issue:
This is only happening in 1 of 3 environments that, as far as I can tell, are the same. The environment itself is somehow causing this issue. This happens with every helm install I've done in this environment.
Additional Information:
The chart is successfully creating all of the NLB pieces; the only piece that's failing is the IC pods. The permissions on /etc/resolv.conf are different in this pod than they are in the pods in the other environments where this has worked correctly:
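The permission difference can be captured with stat. The snippet below is only an illustration on a scratch file (the 600/644 modes are example values, not the actual modes from these nodes); the real comparison is the same stat run against /etc/resolv.conf via kubectl exec in each environment:

```shell
# Illustration: contrast a restrictive mode on a resolv.conf-style file
# with the world-readable mode a working pod would see.
scratch=$(mktemp)
echo "nameserver 172.20.0.10" > "$scratch"   # example cluster-DNS entry

chmod 600 "$scratch"                  # owner-only: a non-root nginx worker gets EACCES
mode_broken=$(stat -c '%a' "$scratch")
echo "broken-style mode:  $mode_broken"

chmod 644 "$scratch"                  # world-readable: what the working pods see
mode_working=$(stat -c '%a' "$scratch")
echo "working-style mode: $mode_working"

rm -f "$scratch"
```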