markjgardner closed this issue 1 year ago.
Hi @markjgardner thanks for reporting!
Be sure to check out the docs while you wait for a human to take a look at this :slightly_smiling_face:
Cheers!
hi @markjgardner, "Version of the Ingress Controller == 4.4.2" looks like https://github.com/kubernetes/ingress-nginx/releases/tag/helm-chart-4.4.2 to me; the latest release for this project is 3.0.1.
Whoops, sorry, you are correct @vepatel. I was running 3.0.1 of nginx-ingress when I replicated the failure scenario. I just grabbed the version off the wrong ingress when reporting the issue. Apologies.
Can you please provide the pod logs, the output from kubectl get pods -n <namespace>, and the describe output of your daemonset/deployment?
Here you go @vepatel
$> k get po
NAME READY STATUS RESTARTS AGE
nginx-ingress-nginx-ingress-76d5956b79-p8gf2 1/1 Running 0 4m36s
simple-app-6d58c497f5-qg87w 1/1 Running 0 74m
simple-app-6d58c497f5-r5bm5 1/1 Running 0 74m
simple-app-6d58c497f5-ztbbk 0/1 Running 0 74m
$> k describe deploy simple-app
Name: simple-app
Namespace: default
CreationTimestamp: Wed, 08 Feb 2023 13:37:28 +0000
Labels: app=simple-app
Annotations: deployment.kubernetes.io/revision: 4
Selector: app=simple-app
Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=simple-app
Containers:
aspnet:
Image: mcr.microsoft.com/dotnet/samples:aspnetapp
Port: 80/TCP
Host Port: 0/TCP
Readiness: exec [cat /tmp/healthy] delay=3s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: simple-app-6d58c497f5 (3/3 replicas created)
Events: <none>
Sorry, I meant your Ingress Controller pod logs and the describe output of the Ingress Controller deployment.
@vepatel
$> k describe deploy nginx-ingress-nginx-ingress
Name: nginx-ingress-nginx-ingress
Namespace: default
CreationTimestamp: Wed, 08 Feb 2023 17:48:15 +0000
Labels: app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=nginx-ingress-nginx-ingress
helm.sh/chart=nginx-ingress-0.16.1
Annotations: deployment.kubernetes.io/revision: 1
meta.helm.sh/release-name: nginx-ingress
meta.helm.sh/release-namespace: default
Selector: app=nginx-ingress-nginx-ingress
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx-ingress-nginx-ingress
Annotations: prometheus.io/port: 9113
prometheus.io/scheme: http
prometheus.io/scrape: true
Service Account: nginx-ingress-nginx-ingress
Containers:
nginx-ingress-nginx-ingress:
Image: nginx/nginx-ingress:3.0.1
Ports: 80/TCP, 443/TCP, 9113/TCP, 8081/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
-nginx-plus=false
-nginx-reload-timeout=60000
-enable-app-protect=false
-enable-app-protect-dos=false
-nginx-configmaps=$(POD_NAMESPACE)/nginx-ingress-nginx-ingress
-default-server-tls-secret=$(POD_NAMESPACE)/nginx-ingress-nginx-ingress-default-server-tls
-ingress-class=nginx
-health-status=false
-health-status-uri=/nginx-health
-nginx-debug=false
-v=1
-nginx-status=true
-nginx-status-port=8080
-nginx-status-allow-cidrs=127.0.0.1
-report-ingress-status
-external-service=nginx-ingress-nginx-ingress
-enable-leader-election=true
-leader-election-lock-name=nginx-ingress-nginx-ingress-leader-election
-enable-prometheus-metrics=true
-prometheus-metrics-listen-port=9113
-prometheus-tls-secret=
-enable-service-insight=false
-service-insight-listen-port=9114
-service-insight-tls-secret=
-enable-custom-resources=true
-enable-snippets=false
-include-year=false
-disable-ipv6=false
-enable-tls-passthrough=false
-enable-preview-policies=false
-enable-cert-manager=false
-enable-oidc=false
-enable-external-dns=false
-ready-status=true
-ready-status-port=8081
-enable-latency-metrics=false
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:readiness-port/nginx-ready delay=0s timeout=1s period=1s #success=1 #failure=3
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-ingress-nginx-ingress-76d5956b79 (1/1 replicas created)
Events: <none>
$> k logs nginx-ingress-nginx-ingress-76d5956b79-p8gf2
NGINX Ingress Controller Version=3.0.1 Commit=051928835aaa724a9718e22ce2f7acc8a323317f Date=2023-01-25T23:32:26Z DirtyState=false Arch=linux/amd64 Go=go1.19.5
I0208 17:48:19.694052 1 flags.go:294] Starting with flags: ["-nginx-plus=false" "-nginx-reload-timeout=60000" "-enable-app-protect=false" "-enable-app-protect-dos=false" "-nginx-configmaps=default/nginx-ingress-nginx-ingress" "-default-server-tls-secret=default/nginx-ingress-nginx-ingress-default-server-tls" "-ingress-class=nginx" "-health-status=false" "-health-status-uri=/nginx-health" "-nginx-debug=false" "-v=1" "-nginx-status=true" "-nginx-status-port=8080" "-nginx-status-allow-cidrs=127.0.0.1" "-report-ingress-status" "-external-service=nginx-ingress-nginx-ingress" "-enable-leader-election=true" "-leader-election-lock-name=nginx-ingress-nginx-ingress-leader-election" "-enable-prometheus-metrics=true" "-prometheus-metrics-listen-port=9113" "-prometheus-tls-secret=" "-enable-service-insight=false" "-service-insight-listen-port=9114" "-service-insight-tls-secret=" "-enable-custom-resources=true" "-enable-snippets=false" "-include-year=false" "-disable-ipv6=false" "-enable-tls-passthrough=false" "-enable-preview-policies=false" "-enable-cert-manager=false" "-enable-oidc=false" "-enable-external-dns=false" "-ready-status=true" "-ready-status-port=8081" "-enable-latency-metrics=false"]
I0208 17:48:19.731261 1 main.go:227] Kubernetes version: 1.25.5
I0208 17:48:19.742156 1 main.go:373] Using nginx version: nginx/1.23.3
2023/02/08 17:48:19 [notice] 17#17: using the "epoll" event method
2023/02/08 17:48:19 [notice] 17#17: nginx/1.23.3
2023/02/08 17:48:19 [notice] 17#17: built by gcc 10.2.1 20210110 (Debian 10.2.1-6)
2023/02/08 17:48:19 [notice] 17#17: OS: Linux 5.4.0-1101-azure
2023/02/08 17:48:19 [notice] 17#17: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/02/08 17:48:19 [notice] 17#17: start worker processes
2023/02/08 17:48:19 [notice] 17#17: start worker process 18
2023/02/08 17:48:19 [notice] 17#17: start worker process 19
2023/02/08 17:48:19 [notice] 17#17: start worker process 20
2023/02/08 17:48:19 [notice] 17#17: start worker process 21
I0208 17:48:19.769860 1 listener.go:54] Starting Prometheus listener on: :9113/metrics
I0208 17:48:19.771245 1 leaderelection.go:248] attempting to acquire leader lease default/nginx-ingress-nginx-ingress-leader-election...
I0208 17:48:19.828558 1 leaderelection.go:258] successfully acquired lease default/nginx-ingress-nginx-ingress-leader-election
I0208 17:48:20.272070 1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"simpleapp-ingress", UID:"fdec2bf0-cdee-40a5-9272-0db88890b5fd", APIVersion:"networking.k8s.io/v1", ResourceVersion:"63733244", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for default/simpleapp-ingress was added or updated
I0208 17:48:20.284291 1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"simpleapp-ingress", UID:"fdec2bf0-cdee-40a5-9272-0db88890b5fd", APIVersion:"networking.k8s.io/v1", ResourceVersion:"63733244", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for default/simpleapp-ingress was added or updated
I0208 17:48:20.293621 1 event.go:285] Event(v1.ObjectReference{Kind:"Secret", Namespace:"default", Name:"nginx-ingress-nginx-ingress-default-server-tls", UID:"fb6d1479-2057-4418-a061-343bc28370bf", APIVersion:"v1", ResourceVersion:"63761508", FieldPath:""}): type: 'Normal' reason: 'Updated' the special Secret default/nginx-ingress-nginx-ingress-default-server-tls was updated
2023/02/08 17:48:20 [notice] 17#17: signal 1 (SIGHUP) received from 26, reconfiguring
2023/02/08 17:48:20 [notice] 17#17: reconfiguring
2023/02/08 17:48:20 [notice] 17#17: using the "epoll" event method
2023/02/08 17:48:20 [notice] 17#17: start worker processes
2023/02/08 17:48:20 [notice] 17#17: start worker process 27
2023/02/08 17:48:20 [notice] 17#17: start worker process 28
2023/02/08 17:48:20 [notice] 17#17: start worker process 29
2023/02/08 17:48:20 [notice] 17#17: start worker process 30
2023/02/08 17:48:20 [notice] 18#18: gracefully shutting down
2023/02/08 17:48:20 [notice] 19#19: gracefully shutting down
2023/02/08 17:48:20 [notice] 18#18: exiting
2023/02/08 17:48:20 [notice] 19#19: exiting
2023/02/08 17:48:20 [notice] 19#19: exit
2023/02/08 17:48:20 [notice] 18#18: exit
2023/02/08 17:48:20 [notice] 20#20: gracefully shutting down
2023/02/08 17:48:20 [notice] 20#20: exiting
2023/02/08 17:48:20 [notice] 20#20: exit
2023/02/08 17:48:20 [notice] 21#21: gracefully shutting down
2023/02/08 17:48:20 [notice] 21#21: exiting
2023/02/08 17:48:20 [notice] 21#21: exit
I0208 17:48:20.436166 1 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"default", Name:"nginx-ingress-nginx-ingress", UID:"ab301236-a36d-438c-8204-db6cb8f8d03e", APIVersion:"v1", ResourceVersion:"63761510", FieldPath:""}): type: 'Normal' reason: 'Updated' Configuration from default/nginx-ingress-nginx-ingress was updated
I0208 17:48:20.436196 1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"simpleapp-ingress", UID:"fdec2bf0-cdee-40a5-9272-0db88890b5fd", APIVersion:"networking.k8s.io/v1", ResourceVersion:"63761585", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for default/simpleapp-ingress was added or updated
2023/02/08 17:48:20 [notice] 17#17: signal 17 (SIGCHLD) received from 18
2023/02/08 17:48:20 [notice] 17#17: worker process 18 exited with code 0
2023/02/08 17:48:20 [notice] 17#17: signal 29 (SIGIO) received
2023/02/08 17:48:20 [notice] 17#17: signal 17 (SIGCHLD) received from 19
2023/02/08 17:48:20 [notice] 17#17: worker process 19 exited with code 0
2023/02/08 17:48:20 [notice] 17#17: worker process 20 exited with code 0
2023/02/08 17:48:20 [notice] 17#17: worker process 21 exited with code 0
2023/02/08 17:48:20 [notice] 17#17: signal 29 (SIGIO) received
2023/02/08 17:48:20 [notice] 17#17: signal 17 (SIGCHLD) received from 20
2023/02/08 17:49:01 [notice] 17#17: signal 1 (SIGHUP) received from 32, reconfiguring
2023/02/08 17:49:01 [notice] 17#17: reconfiguring
2023/02/08 17:49:01 [notice] 17#17: using the "epoll" event method
2023/02/08 17:49:01 [notice] 17#17: start worker processes
2023/02/08 17:49:01 [notice] 17#17: start worker process 33
2023/02/08 17:49:01 [notice] 17#17: start worker process 34
2023/02/08 17:49:01 [notice] 17#17: start worker process 35
2023/02/08 17:49:01 [notice] 17#17: start worker process 36
2023/02/08 17:49:01 [notice] 27#27: gracefully shutting down
2023/02/08 17:49:01 [notice] 28#28: gracefully shutting down
2023/02/08 17:49:01 [notice] 30#30: gracefully shutting down
2023/02/08 17:49:01 [notice] 30#30: exiting
2023/02/08 17:49:01 [notice] 27#27: exiting
2023/02/08 17:49:01 [notice] 29#29: gracefully shutting down
2023/02/08 17:49:01 [notice] 29#29: exiting
2023/02/08 17:49:01 [notice] 30#30: exit
2023/02/08 17:49:01 [notice] 27#27: exit
2023/02/08 17:49:01 [notice] 29#29: exit
2023/02/08 17:49:01 [notice] 28#28: exiting
2023/02/08 17:49:01 [notice] 28#28: exit
I0208 17:49:01.490268 1 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"simpleapp-ingress", UID:"fdec2bf0-cdee-40a5-9272-0db88890b5fd", APIVersion:"networking.k8s.io/v1", ResourceVersion:"63761898", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for default/simpleapp-ingress was added or updated
2023/02/08 17:49:01 [notice] 17#17: signal 17 (SIGCHLD) received from 30
2023/02/08 17:49:01 [notice] 17#17: worker process 27 exited with code 0
2023/02/08 17:49:01 [notice] 17#17: worker process 28 exited with code 0
2023/02/08 17:49:01 [notice] 17#17: worker process 30 exited with code 0
2023/02/08 17:49:01 [notice] 17#17: signal 29 (SIGIO) received
2023/02/08 17:49:01 [notice] 17#17: signal 17 (SIGCHLD) received from 28
2023/02/08 17:49:01 [notice] 17#17: signal 17 (SIGCHLD) received from 29
2023/02/08 17:49:01 [notice] 17#17: worker process 29 exited with code 0
2023/02/08 17:49:01 [notice] 17#17: signal 29 (SIGIO) received
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET / HTTP/1.1" 200 3819 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
2023/02/08 17:50:43 [warn] 33#33: *5 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/1/00/0000000001 while reading upstream, client: 166.198.116.168, server: 20.121.81.115.nip.io, request: "GET /lib/bootstrap/dist/css/bootstrap.min.css HTTP/1.1", upstream: "http://10.1.0.104:80/lib/bootstrap/dist/css/bootstrap.min.css", host: "20.121.81.115.nip.io", referrer: "http://20.121.81.115.nip.io/"
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET /css/site.css?v=AKvNjO3dCPPS0eSU1Ez8T2wI280i08yGycV9ndytL-c HTTP/1.1" 200 194 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET /aspnetapp.styles.css?v=dmaWIJMtYHjABWevZ_2Q8P4v1xrVPOBMkiL86DlKmX8 HTTP/1.1" 200 1077 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET /js/site.js?v=4q1jwFhaPaZgr8WAUSrux6hAuh0XDg9kPS3xIVq36I0 HTTP/1.1" 200 230 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET /lib/bootstrap/dist/css/bootstrap.min.css HTTP/1.1" 200 162720 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:43 +0000] "GET /lib/bootstrap/dist/js/bootstrap.bundle.min.js HTTP/1.1" 200 78468 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:44 +0000] "GET /lib/jquery/dist/jquery.min.js HTTP/1.1" 200 89476 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:50:44 +0000] "GET /favicon.ico HTTP/1.1" 200 5430 "http://20.121.81.115.nip.io/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:28 +0000] "GET / HTTP/1.1" 200 3818 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:29 +0000] "GET / HTTP/1.1" 200 3820 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:30 +0000] "GET / HTTP/1.1" 200 3818 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:31 +0000] "GET / HTTP/1.1" 200 3819 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:31 +0000] "GET / HTTP/1.1" 200 3819 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:32 +0000] "GET / HTTP/1.1" 200 3820 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
166.198.116.168 - - [08/Feb/2023:17:52:32 +0000] "GET / HTTP/1.1" 200 3819 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.78" "-"
@vepatel I did some more forensics on this and it sure looks like nginx-ingress is bypassing the ClusterIP Service and proxying directly to the pod backends. This would explain why those pods aren't being removed when the readiness probe fails. My question is: why is it doing this?
$> cat /etc/nginx/conf.d/default-simpleapp-ingress.conf
upstream default-simpleapp-ingress-<someip>.nip.io-simple-app-80 {
    zone default-simpleapp-ingress-<someip>.nip.io-simple-app-80 256k;
    random two least_conn;
    server 10.1.0.26:80 max_fails=1 fail_timeout=10s max_conns=0;
    server 10.1.0.10:80 max_fails=1 fail_timeout=10s max_conns=0;
    server 10.1.0.31:80 max_fails=1 fail_timeout=10s max_conns=0;
}

server {
    listen 80;
    listen [::]:80;
    server_tokens on;
    server_name <someip>.nip.io;
    set $resource_type "ingress";
    set $resource_name "simpleapp-ingress";
    set $resource_namespace "default";

    location / {
        set $service "simple-app";
        proxy_http_version 1.1;
        proxy_connect_timeout 60s;
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
        client_max_body_size 1m;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering on;
        proxy_pass http://default-simpleapp-ingress-<someip>.nip.io-simple-app-80;
    }
}
@markjgardner Thanks for reporting this to us. We are going to take a look and review this on our side.
it sure looks like nginx-ingress is bypassing the ClusterIP Service and proxying directly to the pod backends
You are correct. We talk directly to the endpoint IPs, which we gather from the Service. This allows us to take advantage of NGINX Plus features such as active health checking, giving the NGINX Ingress Controller configurable options for customers. You can see some of the supported options in the documentation.
We will review and update this issue when we have finished our analysis. Thank you again.
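For readers unfamiliar with the active health checking mentioned above: it is an NGINX Plus feature that can be configured per upstream in the VirtualServer resource. A minimal sketch, assuming a hypothetical /healthz endpoint and illustrative host and resource names (field names taken from the VirtualServer upstream docs):

apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: simple-app
spec:
  host: app.example.com          # placeholder host
  upstreams:
    - name: simple-app
      service: simple-app
      port: 80
      healthCheck:               # NGINX Plus only: actively probes each pod endpoint
        enable: true
        path: /healthz           # hypothetical health endpoint
        interval: 5s
        fails: 1
        passes: 1
  routes:
    - path: /
      action:
        pass: simple-app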
The Service endpoint is bypassed to provide true load balancing across the backend service pods, as well as to support behavior such as sticky sessions, cookie persistence, and other capabilities that are necessary for non-stateless services and many TCP (non-HTTP) services (there are a lot of non-stateless and TCP/UDP services out there).
Individual pods can react and behave differently, and forwarding traffic directly to the individual pods allows the system to respond in a more natural way. Across a cluster, some pods on some nodes may respond faster and can therefore handle additional load. If the Service endpoint IP is used, kube-proxy and its random-ish distribution behavior take over, and you rely on the infrastructure to wait for a pod to begin to fail before anything happens across the system.
NGINX is constantly monitoring the responsiveness of the backends to determine whether they are still healthy and should be receiving traffic. This is actually more granular behavior awareness than a readiness probe provides. It also allows NGINX to inform you (the user) that particular pods are returning 500s or other response codes, and which particular pod it is. This is in the Prometheus output of the NGINX Plus version of our implementation.
In the end, it's all about adding additional value to the system to give the end customer the best experience possible.
OK, I totally understand that this is intentional behavior. But if I'm understanding you correctly, there is still a bug, as your health checks are failing to detect not-ready pods and remove them from the backend. Or there is some non-intuitive, non-default configuration that I am missing.
As for bypassing the k8s service...
I don't know much about NGINX Plus, so forgive my ignorance: if you don't actually need the load balancing provided by a typical k8s Service, why not put a validation requirement on your ingress controller to require clusterIP: None on backing services (see the sketch below)? Seems like it would give you everything you need (IPs for the backing pods) without getting in the way of your smarter ingress model. It would also act as a pretty clear flag to the uninitiated that there are non-conventional ingress semantics at play here.
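For illustration, the headless Service suggested here would look something like this (a sketch, assuming the simple-app Deployment from earlier in the thread):

apiVersion: v1
kind: Service
metadata:
  name: simple-app
spec:
  clusterIP: None        # headless: no virtual IP; DNS and endpoints resolve to pod IPs
  selector:
    app: simple-app
  ports:
    - port: 80
      targetPort: 80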
if you don't actually need the load balancing provided by a typical k8s Service, why not put a validation requirement on your ingress controller to require clusterIP: None on backing services?
We do this the other way around. We assume that you want the extra capabilities of load balancing controls, sticky sessions, smarter traffic distribution, etc. that any proxy brings to the table. We give you the ability, through our VirtualServer CRD, to use the ClusterIP for an upstream service on a service-by-service basis, knowing that not all services are the same: https://docs.nginx.com/nginx-ingress-controller/configuration/virtualserver-and-virtualserverroute-resources/#upstream
This is also how this project is compatible with Linkerd, Istio, and Open Service Mesh. It is not necessary for NGINX Service Mesh as we function a bit differently under the hood.
This is available in both the free and paid editions - the resources are identical.
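For illustration, a minimal sketch of the per-upstream use-cluster-ip option described in the linked docs; the host and resource names are placeholders:

apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: simple-app
spec:
  host: app.example.com          # placeholder host
  upstreams:
    - name: simple-app
      service: simple-app
      port: 80
      use-cluster-ip: true       # proxy to the Service's cluster IP instead of the pod IPs
  routes:
    - path: /
      action:
        pass: simple-app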
Regarding the behavior originally described in this ticket: this is a gap in our implementation of the EndpointSlices API that we are taking care of. The Endpoints API behaved a bit differently, and the ready status of the new API was missed. A fix is coming.
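For context, the ready status in question is carried per endpoint on the EndpointSlice object. An abbreviated, illustrative example (IPs borrowed from the generated config above; the slice name is hypothetical):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: simple-app-abc12                    # hypothetical slice name
  labels:
    kubernetes.io/service-name: simple-app
addressType: IPv4
ports:
  - port: 80
    protocol: TCP
endpoints:
  - addresses: ["10.1.0.26"]
    conditions:
      ready: true                           # passing its readiness probe
  - addresses: ["10.1.0.31"]
    conditions:
      ready: false                          # failing its readiness probe; should not receive traffic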
Just want to say that we are observing this problem as well, but it sounds like you guys already reproduced this and don't need any more confirmation.
hi @wc-s, yes, we were able to reproduce the issue and a fix is in the works
Patch is forthcoming. Thank you all! Sorry for any problems.
@wc-s @markjgardner the fix for this issue is available in main now; there will be a 3.0.2 release later.
@vepatel @brianehlert
Thanks! We already downgraded to 2.4.2, will try out the fix when the corresponding helm chart is released.
@wc-s @markjgardner the new patch release 3.0.2 with the fix is now live, see: https://github.com/nginxinc/kubernetes-ingress/releases/tag/v3.0.2
Tested and verified. Thanks for the quick turnaround.
Describe the bug
Ingress continues to route traffic to backend pods that are marked as not ready due to a failing readinessProbe.

To Reproduce
1. helm install ingress nginx-ingress
2. touch /tmp/healthy to pass the readinessProbe.
3. rm /tmp/healthy and observe that the pod becomes not ready. (A sketch of a matching Deployment is included at the end of this issue.)

Expected behavior
Requests should never route to the not-ready pod.
Your environment
Additional context
This behavior is not present in https://kubernetes.github.io/ingress-nginx (requests are never routed to not-ready pods).
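For reference, a minimal sketch of the reproduction Deployment, reconstructed from the kubectl describe deploy simple-app output earlier in the thread (image and probe settings match that output; other fields are assumed):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app
  labels:
    app: simple-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: simple-app
  template:
    metadata:
      labels:
        app: simple-app
    spec:
      containers:
        - name: aspnet
          image: mcr.microsoft.com/dotnet/samples:aspnetapp
          ports:
            - containerPort: 80
          readinessProbe:
            exec:
              command: ["cat", "/tmp/healthy"]   # fails once /tmp/healthy is removed
            initialDelaySeconds: 3
            periodSeconds: 3
            timeoutSeconds: 1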