kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

`Warning Event: Readiness probe failed: HTTP probe failed with statuscode: 500` occurred during upgrade #12401

Open dbbDylan opened 3 days ago

dbbDylan commented 3 days ago

What happened:

The system operates smoothly if no requests are sent during the upgrade period. However, when Grafana k6 is used to send a continuous stream of HTTPS requests, an error occurs just as the new controller pod becomes fully initialized and the old pod begins to terminate. The issue lasts only a brief moment, yet it can be reproduced consistently.

Here is the warning event: readiness-probe-failed

And here is the K6 test log:

$ sh run.sh

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/

     execution: local
        script: script.js
        output: -

     scenarios: (100.00%) 1 scenario, 1024 max VUs, 2m30s max duration (incl. graceful stop):
              * default: 1024 looping VUs for 2m0s (gracefulStop: 30s)

WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"                                                                                                                          
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59064->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."        
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"                                                                                                                                    
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": read tcp 10.59.89.82:59082->10.47.104.129:80: wsarecv: An existing connection was forcibly closed by the remote host."                                                                                                                          
WARN[0087] Request Failed                                error="Post \"http://my-hostname/v1/tests/post\": EOF"                                                                                                                          

     data_received..................: 37 MB 295 kB/s
     data_sent......................: 12 MB 93 kB/s
     http_req_blocked...............: avg=23.5ms   min=0s       med=0s    max=731.46ms p(90)=0s    p(95)=510.49µs
     http_req_connecting............: avg=14.79ms  min=0s       med=0s    max=343.54ms p(90)=0s    p(95)=0s
     http_req_duration..............: avg=2.81s    min=3.12ms   med=2.8s  max=10.18s   p(90)=4.82s p(95)=5.07s
       { expected_response:true }...: avg=2.81s    min=313.71ms med=2.81s max=10.18s   p(90)=4.83s p(95)=5.07s
     http_req_failed................: 0.26% 117 out of 43956
     http_req_receiving.............: avg=468.21µs min=0s       med=0s    max=14.93ms  p(90)=987µs p(95)=2.21ms
     http_req_sending...............: avg=21.26µs  min=0s       med=0s    max=8.52ms   p(90)=0s    p(95)=0s
     http_req_tls_handshaking.......: avg=0s       min=0s       med=0s    max=0s       p(90)=0s    p(95)=0s
     http_req_waiting...............: avg=2.81s    min=3.12ms   med=2.8s  max=10.18s   p(90)=4.82s p(95)=5.07s
     http_reqs......................: 43956 350.979203/s
     iteration_duration.............: avg=2.83s    min=13.56ms  med=2.82s max=10.18s   p(90)=4.85s p(95)=5.09s
     iterations.....................: 43956 350.979203/s
     vus............................: 10    min=10           max=1024
     vus_max........................: 1024  min=1024         max=1024

running (2m05.2s), 0000/1024 VUs, 43956 complete and 0 interrupted iterations                                                                                                                                                                         
default ✓ [======================================] 1024 VUs  2m0s

During this period I receive numerous empty responses, yet there are no error logs in the ingress-nginx-controller pod. However, any TCP connection that was established before this point remains uninterrupted (tested with `telnet ${my-tcp-service} ${port}`).

So I want to confirm whether the upgrade causes a short-lived service interruption of the ingress-nginx-controller.

What you expected to happen:

No warnings should occur throughout the upgrade process, and every request should be handled, regardless of whether the returned status code is 200.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): v1.11.2 & v1.11.3

Kubernetes version (use kubectl version):

Client Version: v1.30.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.30.5

Environment:

$ GUID=1
$ DATETIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
$ curl -X POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{
    \"id\": \"$GUID\", 
    \"create_time\": \"$DATETIME\",
    \"sleep_time_ms\": 10
}"
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T09:58:52.425334756Z","finish_time":"2024-11-22T09:58:52.435498011Z","consume_sec":0.010163236}
$ curl -vX POST "http://my-host/v1/tests/post" -H "Content-Type: application/json" -d "{   
    \"id\": \"$GUID\",
    \"create_time\": \"$DATETIME\",
    \"sleep_time_ms\": 10
}"
Note: Unnecessary use of -X or --request, POST is already inferred.
* Host my-host:80 was resolved.
* IPv6: (none)
* IPv4: 10.47.104.129
*   Trying 10.47.104.129:80...
* Connected to dylan-test.gtlc.only.sap (10.47.104.129) port 80
* using HTTP/1.x
> POST /v1/tests/post HTTP/1.1
> Host: dylan-test.gtlc.only.sap
> User-Agent: curl/8.10.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 87
>
* upload completely sent off: 87 bytes
< HTTP/1.1 200 OK
< Date: Fri, 22 Nov 2024 10:00:18 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 236
< Connection: keep-alive
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Authorization, accept, origin, Cache-Control, X-Requested-With
< Access-Control-Allow-Methods: POST, OPTIONS, GET, PUT, DELETE
< Access-Control-Allow-Origin: *
< X-Ingress-Pod-Name-From: ingress-nginx-controller-67fbb67c7b-tpfpt
< X-Ingress-Pod-Name: ingress-nginx-controller-67fbb67c7b-tpfpt
<
{"id":"1","ingress_pod_name_form":"ingress-nginx-controller-67fbb67c7b-tpfpt","create_time":"2024-11-22T09:58:05Z","receive_time":"2024-11-22T10:00:18.300485598Z","finish_time":"2024-11-22T10:00:18.310760665Z","consume_sec":0.010275065}* Connection #0 to host my-host left intact

How to reproduce this issue:

To reproduce it, you just need one web service (any pod that can receive HTTP requests is fine). Then you can use this k6 script:

import http from 'k6/http';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

export const options = {
  vus: 1024,
  duration: '120s',
};

function getFormattedDateTimeNow() {
  const now = new Date();
  const isoString = now.toISOString();

  return isoString;
}

function formattedResponseOutput(res) {
  const status = res.status;
  const statusText = res.status_text;
  const to = res.headers['X-Ingress-Pod-Name'];
  const from = res.headers['X-Ingress-Pod-Name-From'];

  if (res.status != 200) {
    console.log(`[${from}] --> [${to}] : { Status: ${status}, Status Text: ${statusText} }`);
  } else {
    console.log(`[${from}] --> [${to}] : { Status: ${status}, ResponseBody: ${res.body} }`);
  }
}

export default function () {
  const url = 'http://my-host/v1/tests/post';
  const sleep_upper_limit_ms = 5000;

  const payload = JSON.stringify({
    "id": uuidv4(),
    "create_time": getFormattedDateTimeNow(),
    "sleep_time_ms": Math.floor(Math.random() * (sleep_upper_limit_ms + 1)),
  });

  const params = {
    headers: {
      'Content-Type': 'application/json',
    },
  };

  const res = http.post(url, payload, params);
  formattedResponseOutput(res);
}

Anything else we need to know:

You can use my test image, implemented in Go: image: doublebiao/web-service-gin:v1.0-beta
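
For reference, a minimal sketch of backend manifests that would work for this setup (the names, the host, and the container port 8080 are assumptions, not taken from the real environment):

# Hypothetical manifests for the test backend; names, host, and
# containerPort 8080 are assumptions and may need to be adjusted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service-gin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-service-gin
  template:
    metadata:
      labels:
        app: web-service-gin
    spec:
      containers:
        - name: web-service-gin
          image: doublebiao/web-service-gin:v1.0-beta
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-service-gin
spec:
  selector:
    app: web-service-gin
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-service-gin
spec:
  ingressClassName: nginx
  rules:
    - host: my-host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service-gin
                port:
                  number: 80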

k8s-ci-robot commented 3 days ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 3 days ago

/remove-kind bug
/kind support

You have only one pod of the controller, so yes, you will get a brief disruption during the upgrade.

You can experiment with more than one replica and with values such as minAvailable.
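
For example, something like this in the Helm chart values (a sketch; the exact numbers are assumptions that have to be tuned per use case):

controller:
  replicaCount: 3
  # minAvailable feeds the PodDisruptionBudget created by the chart,
  # so at least one controller pod keeps serving while the others roll.
  minAvailable: 1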

dbbDylan commented 20 hours ago

Thanks for your support! @longwuyuan

Following your suggestion, I tried changing my value-specific.yaml:

+    replicaCount: 2

But the same error still occurred when the old pod switched to Terminating:

image

I also tried adding `sleep 15` before executing `wait-shutdown`, but that did not work either (roughly as sketched below).
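
For reference, this is approximately how the delay was injected, via the controller's preStop hook in the values file (a sketch; the chart's default hook just runs /wait-shutdown, and 15 seconds is an arbitrary value chosen for this experiment):

controller:
  lifecycle:
    preStop:
      exec:
        # delay the shutdown so endpoints can be removed before nginx exits;
        # the 15s value is arbitrary and only used for this experiment
        command: ["/bin/sh", "-c", "sleep 15 && /wait-shutdown"]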

longwuyuan commented 20 hours ago

Those are not the only values. Please explore the others; each use case is specific. I made several suggestions, but your response says you tried only one of them. For example, increase replicas to maybe 3 and set minAvailable to 1 (https://kubernetes.io/docs/tasks/run-application/configure-pdb/). This is for having at least 1 pod available for new connections.

If it's about graceful draining of established connections, then please look at the other related config options, such as the various timers. There is no well-documented use case for this with the controller; each user finds their most suitable config by trial and error.
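
A sketch of the kind of knobs meant here, assuming a Helm-based install (the concrete numbers are examples, not recommendations):

controller:
  replicaCount: 3
  minAvailable: 1
  # time the pod is given to drain before it is killed
  terminationGracePeriodSeconds: 300
  config:
    # how long nginx workers keep serving in-flight requests after a reload/shutdown
    worker-shutdown-timeout: "240s"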

dbbDylan commented 18 hours ago

I've tried a number of approaches:

None of them worked.

However, I have found that all of the errors come from the old pod while it is executing the “wait-shutdown” script. The old pod still receives requests after the controller starts shutting down, before nginx terminates, which is not what I expected:

image

So I don't think it's a configuration issue, but rather a brief service interruption during graceful termination. In my opinion, the expected process would look like this:

  1. Graceful termination starts.
  2. Network traffic is switched away from the old pod.
  3. (The old pod no longer receives requests.) The nginx service stops.
  4. The old pod is deleted.

But the current behaviour can't guarantee that step 2 happens before step 3. Could you double-check this?

Thanks for your strong support again.

dbbDylan commented 18 hours ago

// From Go's net/http: once Shutdown has been called, ListenAndServe
// refuses to start serving again and immediately returns ErrServerClosed.
func (srv *Server) ListenAndServe() error {
    if srv.shuttingDown() {
        return ErrServerClosed // the fatal error
    }
    addr := srv.Addr
    if addr == "" {
        addr = ":http"
    }
    ln, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    return srv.Serve(ln)
}
dbbDylan commented 17 hours ago

More information:

Once a pod transitions from Running to Terminating, the Endpoints object associated with the ingress-nginx-controller Service should already have had that pod's IP removed. Therefore, I suspect that the issue might not be with the ingress-nginx-controller itself, but rather with the way the k6 load-testing tool handles connections. Could you help me confirm this hypothesis?