Open · dbbDylan opened 3 days ago
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-kind bug /kind support
You have only one pod of the controller, so yes, you will get a brief disruption during the upgrade.
You can experiment with more than one replica and with the values for minAvailable etc.
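A minimal sketch of those two values, assuming the standard ingress-nginx Helm chart layout (key names can differ between chart versions, so treat this as illustrative rather than a verified configuration):

controller:
  replicaCount: 2   # run more than one controller pod
  minAvailable: 1   # chart value backing the controller's PodDisruptionBudget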
Thanks for your support! @longwuyuan
Following your suggestion, I tried changing my value-specific.yaml:
+ replicaCount: 2
But the same error still occurred when the old pod switched to Terminating.
I also tried adding a sleep 15 before executing wait-shutdown, but that did not work either.
Those are not the only values. Please explore others; each use case is specific. For example, I made several suggestions, but your response says you tried only one of them. Try increasing replicas to maybe 3 and setting minAvailable to 1 (https://kubernetes.io/docs/tasks/run-application/configure-pdb/). This is for having at least 1 pod available for new connections.
If it's about graceful draining of established connections, then please look at other such config options for timers etc. There is no well-documented use case with the controller for this. Each user finds their most suitable config by trial and error.
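For reference, a standalone PodDisruptionBudget of the kind the linked page describes might look roughly like this; the name and selector labels are assumptions and must match your actual controller pods (the Helm chart can also create an equivalent PDB when minAvailable is set and replicaCount is greater than 1):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-controller-pdb   # hypothetical name
spec:
  minAvailable: 1                      # keep at least one controller pod during disruptions
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller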
I've tried a lot of combinations:
- replicaCount: 2 and minAvailable: 1
- replicaCount: 3 and minAvailable: 1
- replicaCount: 3 and minAvailable: 2
- replicaCount: 1, minAvailable: 1, and preStop: ["/bin/sh", "-c", "sleep 15s && /wait-shutdown"]
None of them works (a values sketch of the last combination is shown below).
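A sketch of how that last combination could be expressed in the chart values, assuming the usual controller.lifecycle key is where the preStop hook goes (adjust to your chart version):

controller:
  replicaCount: 1
  minAvailable: 1
  lifecycle:
    preStop:
      exec:
        # delay shutdown so in-flight traffic can drain before wait-shutdown runs
        command: ["/bin/sh", "-c", "sleep 15s && /wait-shutdown"]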
However, I have found that all the errors come from the old pod while it executes the "wait-shutdown" script. The old pod still receives requests while the controller is shutting down and before nginx terminates, which is not what I expected.
So I don't think it's a configuration issue, but rather a brief service interruption during graceful termination. In my opinion, the expected process should be something like:
1. The new controller pod becomes Ready and starts serving requests.
2. The old pod is removed from the Service endpoints, so it no longer receives new requests.
3. The old pod's controller and nginx shut down gracefully.
But the current behavior can't guarantee that the second step happens before the third step. Could you double-check it?
Thanks for your strong support again.
For reference, the snippet below is Go's (*http.Server).ListenAndServe, where ErrServerClosed (the fatal error seen here) is returned once shutdown has already begun:
func (srv *Server) ListenAndServe() error {
	if srv.shuttingDown() {
		return ErrServerClosed // the fatal error
	}
	addr := srv.Addr
	if addr == "" {
		addr = ":http"
	}
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	return srv.Serve(ln)
}
More information updated:
Once a pod transitions from Running to Terminating, the Endpoint associated with the ingress-nginx-controller Service should have completed its IP change. Therefore, I suspect that the issue might not be with the ingress-nginx-controller itself, but rather with the way the k6 load testing tool is handling connections. Could you help me confirm this hypothesis?
What happened:
- ingress-nginx-controller zero-downtime upgrade investigation.
- Used the helm upgrade --reuse-values command to complete the upgrade.
- The system operates smoothly if no requests are sent during the upgrade period. However, when using Grafana K6 to monitor the frequency of HTTPS requests, an error occurs as the new controller pod becomes fully initialized and the old pod begins to terminate. This issue only lasts for a brief moment, yet it can be consistently reproduced.

Here is the warning event:

And here is the K6 test log:

During this period, I encounter numerous empty responses, and there are no error logs in the ingress-nginx-controller pod. However, if a TCP connection was established prior to this, it remains uninterrupted (tested with the telnet ${my-tcp-service} ${port} command). So I want to confirm whether the upgrade caused a short-lived service interruption of the ingress-nginx-controller.

What you expected to happen:
No warnings should occur throughout the upgrade process, and all requests should be handled, whether or not the returned status code is 200.

NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version): v1.11.2 & v1.11.3

Kubernetes version (use kubectl version):
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5
Environment:
Cloud provider or hardware configuration: I used Gardener to control all clusters, so I have no permissions to check it.
OS (e.g. from /etc/os-release): linux-amd64
Kernel (e.g. uname -a):
Install tools:
Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
Basic cluster related info:
kubectl get nodes -o wide
How was the ingress-nginx-controller installed:
helm ls -A | grep -i ingress
helm -n <ingresscontrollernamespace> get values <helmreleasename>
Current State of the controller:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
kubectl describe ... of any custom configmap(s) created and in use
How to reproduce this issue:
To reproduce it, you just need one web service (any pod that can receive HTTP requests is fine). Then you can use this K6 script:
Anything else we need to know:
You can use my test image implemented by Go:
image: doublebiao/web-service-gin:v1.0-beta
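For a self-contained reproduction, a minimal Deployment and Service around that image could look like the sketch below; the container port 8080 and the resource names are assumptions, not taken from the issue, so adjust them to whatever the image actually exposes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service-gin            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-service-gin
  template:
    metadata:
      labels:
        app: web-service-gin
    spec:
      containers:
        - name: web
          image: doublebiao/web-service-gin:v1.0-beta
          ports:
            - containerPort: 8080  # assumed port
---
apiVersion: v1
kind: Service
metadata:
  name: web-service-gin
spec:
  selector:
    app: web-service-gin
  ports:
    - port: 80
      targetPort: 8080             # assumed, must match the container port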