kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Server scaling causes ingress TCP retransmissions, and the client cannot access the server properly. #11508

Open ZhengXinwei-F opened 5 days ago

ZhengXinwei-F commented 5 days ago

What happened: I have a server providing list-watch capabilities exposed to the outside via an ingress-controller. When there are only 1-2 instances of the server, scaling operations cause client requests to fail but they recover quickly. However, when there are three instances, the client (using client-go) does not break its TCP connection with the ingress (this is issue 1). Additionally, the ingress continues to send TCP SYN packets to the already dead pod, resulting in continuous TCP retransmissions. As a result, the client cannot properly access the ingress.

For example:

[screenshot]

Ingress-controller logs:

2024/06/26 07:44:16 [error] 4441#4441: *4049959 upstream prematurely closed connection while reading upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.170:80/?watch", host: "local-server.com:80"
2024/06/26 07:44:16 [error] 4617#4617: *4049960 upstream prematurely closed connection while reading upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.170:80/?watch", host: "local-server.com:80"
172.18.0.1 - - [26/Jun/2024:07:44:16 +0000] "GET /?watch HTTP/1.1" 200 170 "-" "Go-http-client/1.1" 106 5.168 [default-server-80] [] 10.244.1.170:80 170 5.169 200 6072e3f8859cfc330e1b6742700e1205
172.18.0.1 - - [26/Jun/2024:07:44:16 +0000] "GET /?watch HTTP/1.1" 200 170 "-" "Go-http-client/1.1" 106 5.168 [default-server-80] [] 10.244.1.170:80 170 5.169 200 ac5c49504b6751d2574355d5c4fd554c
2024/06/26 07:44:16 [error] 4666#4666: *4049958 upstream prematurely closed connection while reading upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.169:80/?watch", host: "local-server.com:80"
172.18.0.1 - - [26/Jun/2024:07:44:16 +0000] "GET /?watch HTTP/1.1" 200 170 "-" "Go-http-client/1.1" 106 5.173 [default-server-80] [] 10.244.1.169:80 170 5.174 200 abaa63361000599d0fbc60bac0961194
W0626 07:44:20.334066      10 controller.go:1163] Service "default/server" does not have any active Endpoint.
2024/06/26 07:45:17 [error] 4568#4568: *4050262 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.171:80/?watch", host: "local-server.com:80"
2024/06/26 07:45:17 [error] 4443#4443: *4050261 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.170:80/?watch", host: "local-server.com:80"
2024/06/26 07:45:17 [error] 4443#4443: *4050265 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.169:80/?watch", host: "local-server.com:80"
2024/06/26 07:46:17 [error] 4568#4568: *4050262 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.169:80/?watch", host: "local-server.com:80"
2024/06/26 07:46:17 [error] 4443#4443: *4050261 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.171:80/?watch", host: "local-server.com:80"
2024/06/26 07:46:17 [error] 4443#4443: *4050265 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.170:80/?watch", host: "local-server.com:80"
2024/06/26 07:47:17 [error] 4568#4568: *4050262 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.170:80/?watch", host: "local-server.com:80"
172.18.0.1 - - [26/Jun/2024:07:47:17 +0000] "GET /?watch HTTP/1.1" 504 160 "-" "Go-http-client/1.1" 106 180.002 [default-server-80] [] 10.244.1.171:80, 10.244.1.169:80, 10.244.1.170:80 0, 0, 0 60.001, 60.000, 60.000 504, 504, 504 e19af9f4f40a060271392f1c90d24247
2024/06/26 07:47:17 [error] 4443#4443: *4050261 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.169:80/?watch", host: "local-server.com:80"
172.18.0.1 - - [26/Jun/2024:07:47:17 +0000] "GET /?watch HTTP/1.1" 504 160 "-" "Go-http-client/1.1" 106 180.002 [default-server-80] [] 10.244.1.170:80, 10.244.1.171:80, 10.244.1.169:80 0, 0, 0 60.001, 60.001, 60.000 504, 504, 504 53c8f7c1fd45dd30187dd24824eeb1ec
2024/06/26 07:47:17 [error] 4443#4443: *4050265 upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.18.0.1, server: local-server.com, request: "GET /?watch HTTP/1.1", upstream: "http://10.244.1.171:80/?watch", host: "local-server.com:80"
172.18.0.1 - - [26/Jun/2024:07:47:17 +0000] "GET /?watch HTTP/1.1" 504 160 "-" "Go-http-client/1.1" 106 180.002 [default-server-80] [] 10.244.1.169:80, 10.244.1.170:80, 10.244.1.171:80 0, 0, 0 60.001, 60.000, 60.000 504, 504, 504 b1e69853960f121cf045c910664e4956
172.18.0.1 - - [26/Jun/2024:07:50:06 +0000] "GET /?watch HTTP/1.1" 200 5678 "-" "Go-http-client/1.1" 106 167.703 [default-server-80] [] 10.244.1.175:80 5678 167.703 200 9eb6ce390a6ec009b46e668aba44f181
172.18.0.1 - - [26/Jun/2024:07:50:06 +0000] "GET /?watch HTTP/1.1" 200 5678 "-" "Go-http-client/1.1" 106 167.708 [default-server-80] [] 10.244.1.175:80 5678 167.708 200 fd3243efecdc005c0fdbee007772a1b1
172.18.0.1 - - [26/Jun/2024:07:50:06 +0000] "GET /?watch HTTP/1.1" 200 5678 "-" "Go-http-client/1.1" 106 167.708 [default-server-80] [] 10.244.1.173:80 5678 167.708 200 7a388eef75f9d99a22a63798782f65e8

Service "default/server" does not have any active Endpoint.

This indicates that the ingress recognizes there are no active endpoints after a failed request. However, it still attempts to reach the already-deleted pod. A packet capture shows TCP retransmissions, which trigger backoff delays.

[packet capture screenshot]
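For reference, a capture like this could presumably be reproduced from the controller's network namespace along these lines (a sketch only; it assumes ephemeral debug containers are available in the cluster, which may not be the case on the 1.21 server shown below, and it uses the upstream pod IPs from the logs above):

# Attach a debug container sharing the ingress-nginx controller pod's network namespace,
# then capture SYNs (and their retransmissions) toward the old upstream pod IPs.
kubectl debug -n ingress-nginx -it ingress-nginx-controller-67d59d7bc-rt9wm \
  --image=nicolaka/netshoot --target=controller -- \
  tcpdump -ni any 'tcp[tcpflags] & tcp-syn != 0 and (host 10.244.1.169 or host 10.244.1.170 or host 10.244.1.171)'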

What you expected to happen:

The client can properly access the ingress.

This issue seems to be caused by some form of endpoint caching in the ingress. It does not occur when there are only 1-2 server instances, but it does occur when there are 3 or more. Could you please help me analyze this issue? Thank you very much.
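One way to check the endpoint-caching suspicion would be to compare what the controller's balancer currently holds against what Kubernetes reports (a sketch, not verified here; it assumes the /dbg helper shipped in the controller image, and the backend name default-server-80 is taken from the access logs above):

# Backends as currently held by the controller (Lua balancer)
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- /dbg backends list
kubectl exec -n ingress-nginx deploy/ingress-nginx-controller -- /dbg backends get default-server-80

# Endpoints as reported by Kubernetes for the Service
kubectl get endpointslices -l kubernetes.io/service-name=server -o wide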

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):


NGINX Ingress controller
  Release:    v1.6.3
  Build:      7ae9ca2f1d04ccdc818fd0d3fb0193266b739d68
  Repository: https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

&

NGINX Ingress controller
  Release:    v0.49.3
  Build:      git-0a2ec01eb
  Repository: https://github.com/kubernetes/ingress-nginx.git
  nginx version: nginx/1.20.1


Kubernetes version (use kubectl version):

Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.21.5
WARNING: version difference between client (1.30) and server (1.21) exceeds the supported minor version skew of +/-1

Environment:

NAME                                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/ingress-nginx-controller             NodePort    10.96.252.74   <none>        80:31852/TCP,443:32612/TCP   27h
service/ingress-nginx-controller-admission   ClusterIP   10.96.82.89    <none>        443/TCP                      27h

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ingress-nginx-controller   1/1     1            1           27h

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/ingress-nginx-controller-67d59d7bc   1         1         1       27h

NAME                                       COMPLETIONS   DURATION   AGE
job.batch/ingress-nginx-admission-create   1/1           24s        27h
job.batch/ingress-nginx-admission-patch    1/1           26s        27h

  - `kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>`

➜ ~ kubectl describe pods -n ingress-nginx ingress-nginx-controller-67d59d7bc-rt9wm
Name:             ingress-nginx-controller-67d59d7bc-rt9wm
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             local-control-plane/172.18.0.2
Start Time:       Tue, 25 Jun 2024 17:01:51 +0800
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  pod-template-hash=67d59d7bc
Annotations:
Status:           Running
IP:               10.244.0.10
IPs:
  IP:  10.244.0.10
Controlled By:  ReplicaSet/ingress-nginx-controller-67d59d7bc
Containers:
  controller:
    Container ID:  containerd://e47ac4872bbca44f589f2c1aef386139cda86b616618b8342871f177c89ad96b
    Image:         registry.k8s.io/ingress-nginx/controller:v1.6.3@sha256:b92667e0afde1103b736e6a3f00dd75ae66eec4e71827d19f19f471699e909d2
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:b92667e0afde1103b736e6a3f00dd75ae66eec4e71827d19f19f471699e909d2
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    80/TCP, 443/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --watch-ingress-without-class=true
      --publish-status-address=localhost
    State:          Running
      Started:      Tue, 25 Jun 2024 17:01:52 +0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-67d59d7bc-rt9wm (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zljhb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-zljhb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  ingress-ready=true
                 kubernetes.io/os=linux
Tolerations:     node-role.kubernetes.io/control-plane:NoSchedule
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:

  - `kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>`

➜ ~ kubectl describe svc -n ingress-nginx ingress-nginx-controller
Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.6.3
Annotations:
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.96.252.74
IPs:                      10.96.252.74
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31852/TCP
Endpoints:                10.244.0.10:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32612/TCP
Endpoints:                10.244.0.10:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:

k8s-ci-robot commented 5 days ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
ZhengXinwei-F commented 5 days ago

As a workaround, I'm currently using nginx.ingress.kubernetes.io/service-upstream: "true", which avoids this issue by letting kube-proxy forward traffic to the pods.
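For readers following along, the workaround could be applied roughly like this (a sketch; server-ingress is the Ingress name from the manifest posted later in this thread):

# Proxy to the Service's ClusterIP (kube-proxy) instead of individual pod IPs
kubectl annotate ingress server-ingress \
  nginx.ingress.kubernetes.io/service-upstream="true" --overwrite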

longwuyuan commented 5 days ago

@ZhengXinwei-F the information you have provided is useful for awareness, but not enough to do much analysis or debugging.

ZhengXinwei-F commented 5 days ago

Here is my simple demo code:

server code:

package main

import (
    "fmt"
    "net/http"
    "strings"
    "time"

    "github.com/gin-gonic/gin"
)

func watchHandler(c *gin.Context) {
    if strings.Contains(c.Request.URL.RawQuery, "watch") {
        c.Header("Transfer-Encoding", "chunked")
        c.Header("Content-Type", "text/plain")

        w := c.Writer
        ctx := c.Request.Context()

        ticker := time.NewTicker(1 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                // Write the current timestamp with chunk-style framing and flush it
                // immediately so the watch stream keeps delivering data to the client.
                now := time.Now().Format(time.RFC3339)
                fmt.Fprintf(w, "%x\r\n%s\r\n\r\n", len(now), now)
                if flusher, ok := w.(http.Flusher); ok {
                    flusher.Flush()
                }

            case <-ctx.Done():
                fmt.Println("Client disconnected")
                return
            }
        }
    } else {
        c.String(http.StatusOK, "Not watching...")
    }
}

func main() {
    r := gin.Default()

    r.GET("/", watchHandler)

    fmt.Println("Listening on http://127.0.0.1:8080/")
    err := r.Run(":80")
    if err != nil {
        panic(err)
    }
}
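A quick local check of the stream this handler produces might look like the following (assuming the file is saved as main.go; binding :80 usually needs elevated privileges, or change r.Run(":80") to a high port first):

go run main.go &
# -N disables curl's buffering so each chunk is printed as it arrives
curl -N "http://127.0.0.1/?watch"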

client:

package main

import (
    "bufio"
    "fmt"
    "io"
    "net/http"
    "strings"
    "time"
)

func main() {
    url := "http://local-server.com:80/?watch"
    for i := 0; i < 3; i++ {
        go func() {
            for {
                err := watch(url)
                if err != nil {
                    fmt.Printf("Error watching %s: %v\n", url, err)
                }
                time.Sleep(1 * time.Second)
            }
        }()
    }
    time.Sleep(1 * time.Hour)
}

func watch(url string) error {
    fmt.Println("=== start watching at ",time.Now())
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("[%s] unexpected status code: %d", time.Now(), resp.StatusCode)
    }

    reader := bufio.NewReader(resp.Body)
    for {
        chunk, err := reader.ReadBytes('\n')
        if err != nil {
            if err == io.EOF {
                return nil
            }
            fmt.Println(err.Error())
            return err
        }

        fmt.Println(strings.TrimSpace(string(chunk)))
    }
}

server.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: server-container
        image: server:1.0.5 
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: server
spec:
  selector:
    app: server
  ports:
    - protocol: TCP
      port: 80  
      targetPort: 80 
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/cors-allow-origin: '*'
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 1024m
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "2592000"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "2592000"
    #nginx.ingress.kubernetes.io/websocket-services: "server"
  name: server-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: local-server.com 
      http:
        paths:
          - path: /  
            pathType: Prefix
            backend:
              service:
                name: server
                port:
                  number: 80

/etc/hosts entry: 172.18.0.2 local-server.com (the kind control-plane node)
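With the manifests above applied and the watch clients running, the failure described in this issue would presumably be triggered by a scaling event along these lines (a sketch; the exact replica counts are an assumption):

# Scale the backend down and back up while the watch connections stay open
kubectl scale deploy server --replicas 1
kubectl scale deploy server --replicas 3

# Watch the endpoints Kubernetes reports for the Service during the change
kubectl get endpointslices -l kubernetes.io/service-name=server -w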

ZhengXinwei-F commented 5 days ago

@longwuyuan

  • If your app needs affinity and you have not configured affinity to a pod, then the controller is not going to automatically send all traffic to the same backend pod. Load balancing is built into the controller, and backend pods are expected to be idempotent by default. This matters only if your app depends on a persistent connection to the same pod (and hence the same process in that pod).
  • There are users who run many, many replicas of backend pods and don't have problems, so there is no bug as such with the controller and multiple backend replicas.

Thank you so much for your response. I will pay attention to my information format in future issues. However, I'm particularly curious about why the ingress continues to send TCP requests to the removed pod even after all pods have been deleted.

longwuyuan commented 5 days ago

At a glance you did NOT test with a NON-golang HTTP library like ;

  • kubectl create deploy test0 --image nginx:alpine --port 80 --replicas 3
  • kubectl expose deploy test0 --port 80
  • kubectl create ing test0 --class nginx --rule test0.mydomain.com/"*"=test0:80
  • curl test0.mydomain.com --resolve test0.mydomain.com:80:$EXTERNAL_IP

So there is no data about established UN-Terminated connections to a backend.

Secondly, I can't find this annotation "websocket-services" anywhere on this page https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/ .

This means that the data & information you are presenting here is not really as reliable as you desire it to be.

I think you are experiencing some genuine problem so it's fair to assume that you want to ask about the persistent connection and continued requests, here. But you are not putting together the impact of the factors like ;

  • the application profile [the http.Get(url)]
  • the pod termination specs for scaling down or up
  • the timeout configurations
  • the invalid annotation "websocket-services"

in a way that other readers here would need for analysis.

Endpointslice is used to maintain the list of backend pods. So a lot of intricate debugging steps are involved here to answer your question.

Please wait for comments from others also. I have no data to comment on your question

longwuyuan commented 5 days ago

/remove-kind bug
/kind support
/triage needs-information

ZhengXinwei-F commented 5 days ago

At a glance you did NOT test with a NON-golang HTTP library like ;

  • kubectl create deploy test0 --image nginx:alpine --port 80 --replicas 3
  • kubectl expose deploy test0 --port 80
  • kubectl create ing test0 --class nginx --rule test0.mydomain.com/"*"=test0:80
  • curl test0.mydomain.com --resolve test0.mydomain.com:80:$EXTERNAL_IP

So there is no data about established UN-Terminated connections to a backend.

Secondly, I can't find this annotation "websocket-services" anywhere on this page https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/ .

This means that the data & information you are presenting here is not really as reliable as you desire it to be.

I think you are experiencing some genuine problem so it's fair to assume that you want to ask about the persistent connection and continued requests, here. But you are not putting together the impact of the factors like ;

  • the application profile [the http.Get(url)]
  • the pod termination specs for scaling down or up
  • the timeout configurations
  • the invalid annotation "websocket-services"

in a way that other readers here would need for analysis.

Endpointslice is used to maintain the list of backend pods. So a lot of intricate debugging steps are involved here to answer your question.

Please wait for comments from others also. I have no data to comment on your question

I think your response is hasty. The websocket-services annotation does not affect functionality. From the logs, it is clear that the ingress attempted to access an already deleted pod. I believe this is a bug, not a support issue.

/remove-kind support
/kind bug

ZhengXinwei-F commented 5 days ago

What happened: I have a server providing list-watch capabilities exposed to the outside via an ingress-controller. When there are only 1-2 instances of the server, scaling operations cause client requests to fail but they recover quickly. However, when there are three instances, the client (using client-go) does not break its TCP connection with the ingress (this is issue 1). Additionally, the ingress continues to send TCP SYN packets to the already dead pod, resulting in continuous TCP retransmissions. As a result, the client cannot properly access the ingress.

It's important to note that both the connections between the ingress and the client, as well as between the ingress and the server, are HTTP long connections using HTTP chunked transfer to continuously transmit information.
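For what it's worth, the same long-lived chunked stream can also be observed with a non-Go client (a sketch using the kind node IP from the hosts entry above):

# -N disables buffering; --resolve pins local-server.com to the kind node IP
curl -N --resolve local-server.com:80:172.18.0.2 "http://local-server.com/?watch"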

ZhengXinwei-F commented 5 days ago

The core issues are as follows:

  • With 1-2 instances, the ingress does not attempt to access deleted pods.
  • With 3 or more instances, the client successfully establishes a TCP connection with the ingress, but the ingress attempts to access deleted pods.

longwuyuan commented 5 days ago

ok sorry. wait for comments from others who can reproduce the problem.

ZhengXinwei-F commented 5 days ago

ok sorry. wait for comments from others who can reproduce the problem.

Thank you again for your prompt response.

longwuyuan commented 5 days ago

I tried to reproduce with the image nginx:alpine and I am not able to reproduce that error message, so I don't think this can be classified as a bug because there is no reproduction procedure.

ZhengXinwei-F commented 4 days ago

I tried to reproduce with the image nginx:alpine and I am not able to reproduce that error message, so I don't think this can be classified as a bug because there is no reproduction procedure.

I apologize for the incomplete reproduction information. I will provide a more detailed reproduction scenario shortly. Currently, I suspect it might be related to endpointSlice informer delays. I am also attempting to read through the ingress-controller code, but it is not a simple task.

longwuyuan commented 4 days ago

Tagging this issue as a bug is invalid because the basic requirements of a bug report are unmet ;

  • The versions of the controller you posted are not supported anymore, simply because there have been too many changes to the code. How can it be a bug when the version of the controller is not supported?
  • Too many users are scaling up & down 24/7/365, but there are hardly any other reports of scaling being broken. How can scaling be broken only for you? The thinking is that the cause is an impact of your use-case and your config. Until scaling can be seen as broken with a simple test like ;
kubectl create deploy test0 --image nginx:alpine --port 80 --replicas 5
kubectl scale deploy test0 --replicas 1

there is no data for a developer to accept the triage as scaling broken in the controller

ZhengXinwei-F commented 4 days ago

Tagging this issue as a bug is invalid because the basic requirements of a bug report are unmet ;

  • The versions of the controller you posted are not supported anymore, simply because there have been too many changes to the code. How can it be a bug when the version of the controller is not supported?
  • Too many users are scaling up & down 24/7/365, but there are hardly any other reports of scaling being broken. How can scaling be broken only for you? The thinking is that the cause is an impact of your use-case and your config. Until scaling can be seen as broken with a simple test like ;
kubectl create deploy test0 --image nginx:alpine --port 80 --replicas 5
kubectl scale deploy test0 --replicas 1

there is no data for a developer to accept the triage as scaling broken in the controller

I will upgrade the ingress-controller to the latest version to see if this resolves the issue. Additionally, since I am using HTTP chunked transfer over TCP, the scenario is a little more complex.

longwuyuan commented 4 days ago

I wanted to get to it later, if the data showed any relevance. But since you repeatedly mentioned chunking, I have to make it clear that setting chunking in app code is a big problem.

That is because v1.10.1 of the controller sets chunking on out of the box, as per the latest nginx design. Another user removed the code that set the chunking header from his app: https://github.com/kubernetes/ingress-nginx/issues/11162

But let's not go there aimlessly. Just try to scale down from 5 to 1, with a vanilla nginx:alpine image, and gather the same debugging data, from controller v1.10.1. Do it on a kind cluster or a minikube cluster so others can replicate your test and reproduce the problem you are reporting. The controller code will work the same regardless of whether it's your golang app or a simple curl client.
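A self-contained version of that suggested test might look like this (a sketch; the manifest URL assumes the documented kind provider deploy for controller-v1.10.1):

# On a kind cluster prepared per the ingress-nginx kind guide, install controller v1.10.1
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.1/deploy/static/provider/kind/deploy.yaml

# Vanilla backend, then scale down while clients hold connections open
kubectl create deploy test0 --image nginx:alpine --port 80 --replicas 5
kubectl expose deploy test0 --port 80
kubectl create ing test0 --class nginx --rule 'test0.mydomain.com/*=test0:80'
kubectl scale deploy test0 --replicas 1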