kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Retry loop on keep-alive connection to upstream servers #9332

Open · zedge-it opened this issue 1 year ago

zedge-it commented 1 year ago

What happened:

Ingress-controller is handling a bad request from a client, gets an empty reply from an upstream server, and ends up in a retry loop until the client disconnects.

This only happened when the ingress-controller had open keep-alive connections to the upstream servers.
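To confirm whether upstream keep-alive is in play, the generated configuration inside the controller pod can be inspected directly. A sketch, where <namespace> and <controller-pod> are placeholders for your actual controller pod; the exact generated directives can vary between controller versions:

    kubectl exec -n <namespace> <controller-pod> -- \
      grep -nE 'keepalive|proxy_next_upstream' /etc/nginx/nginx.conf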

What you expected to happen:

The ingress-controller should do 3 (proxy-next-upstream-tries) retries and then return a 502 to the client.
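For context, this retry behavior maps to the controller ConfigMap keys below. A sketch of the relevant keys with what I understand to be the defaults; worth double-checking against the ConfigMap documentation for your controller version:

proxy-next-upstream: "error timeout"
proxy-next-upstream-tries: "3"
proxy-next-upstream-timeout: "0"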

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.5.1
  Build:         d003aae913cc25f375deb74f898c7f3c65c06f05
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.12-gke.1600", GitCommit:"5b5d8c9b3bf9c7e4b50c276f4d165d176e310dfe", GitTreeState:"clean", BuildDate:"2022-10-13T09:30:22Z", GoVersion:"go1.17.13b7", Compiler:"gc", Platform:"linux/amd64"}

How to reproduce this issue:

  1. Use minikube

    minikube start
  2. Install ingress-nginx

    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx --version 4.4.0 -f values-ingress-nginx.yaml
    • values-ingress-nginx.yaml:

      controller:
        config:
          enable-vts-status: "true"
          vts-status-zone-size: "32m"
          hsts: "false"
          hsts-include-subdomains: "false"
          use-geoip: "false"
          use-geoip2: "false"
          use-forwarded-headers: "true"
          map-hash-bucket-size: "128"
          use-gzip: "true"
        electionID: nginx-ingress-controller-leader
        ingressClass: nginx
        publishService:
          enabled: true
        podAnnotations:
          cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
        service:
          enabled: true
          type: NodePort
          externalTrafficPolicy: Local
        revisionHistoryLimit: 5
  3. Install upstream service

    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm upgrade -i nginx-empty-replyserver bitnami/nginx --version 13.2.13 -f values-nginx-empty-replyserver.yaml
    • values-nginx-empty-replyserver.yaml:

      replicaCount: 3
      service:
        type: ClusterIP
      ingress:
        enabled: true
        hostname: foo.bar
        annotations:
          kubernetes.io/ingress.class: nginx
      serverBlock: |-
        server {
          listen  8080;

          location /die { return 444; }

          location /stub_status { stub_status on; }
        }
    
    This service has an endpoint (/) that returns 200 OK, and an endpoint (/die) that returns an empty reply.
  4. Create minikube service tunnel

    minikube service ingress-nginx-controller
    |-----------|--------------------------|-------------|------------------------|
    | NAMESPACE |           NAME           | TARGET PORT |          URL           |
    |-----------|--------------------------|-------------|------------------------|
    | default   | ingress-nginx-controller |             | http://127.0.0.1:53482 |
    |           |                          |             | http://127.0.0.1:53483 |
    |-----------|--------------------------|-------------|------------------------|
  5. Warm up the keep-alive connections with valid 200 OK requests

    wrk -c 5 -d 5 -t 2 -H 'Host: foo.bar' http://127.0.0.1:53482/
  6. Before the idle keep-alive connections are closed (60s), send a request that returns an empty reply

    curl -H 'Host: foo.bar' http://127.0.0.1:53482/die

    This will hang until you stop it.
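
    While the curl from step 6 hangs, the retry loop should also be visible in the controller logs, with the same /die request repeatedly hitting the upstream endpoints. A sketch, assuming the chart release name used above and the default namespace:

    kubectl logs -f deploy/ingress-nginx-controller | grep '/die'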

Anything else we need to know:

If you disable keep-alive connections in the ingress-controller config, it will retry 3 times and return "502 Bad Gateway" as expected:

upstream-keepalive-requests: 0
upstream-keepalive-timeout: 0
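
With the helm chart used in the reproduction, these two keys can be set under controller.config. A minimal sketch of the values change, expressing the same workaround as chart values:

    controller:
      config:
        upstream-keepalive-requests: "0"
        upstream-keepalive-timeout: "0"
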
k8s-ci-robot commented 1 year ago

@zedge-it: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
strongjz commented 1 year ago

/assign @strongjz

Can you post the ingress object as well?

If ingress-nginx gets a request for a service it doesn't have, it will send it to the default backend.

strongjz commented 1 year ago

/triage needs-information

zedge-it commented 1 year ago
$ kubectl get ingress nginx-empty-replyserver -o yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    meta.helm.sh/release-name: nginx-empty-replyserver
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2022-11-22T11:49:13Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: nginx-empty-replyserver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: nginx
    helm.sh/chart: nginx-13.2.13
  name: nginx-empty-replyserver
  namespace: default
  resourceVersion: "11762"
  uid: e0ca2334-82f7-4d99-968a-6098006cc094
spec:
  rules:
  - host: foo.bar
    http:
      paths:
      - backend:
          service:
            name: nginx-empty-replyserver
            port:
              name: http
        path: /
        pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
    - ip: 10.98.231.89
roorkrn commented 7 months ago

You can disable the multiple retries to the upstream (default is 3) with the annotation/configuration below:

nginx.ingress.kubernetes.io/proxy-next-upstream-tries: '1'
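
For reference, applied to the Ingress object posted earlier in this issue it would look roughly like this (a sketch; the added annotation is the only change, the rest mirrors the existing object):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-empty-replyserver
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "1"
spec:
  rules:
  - host: foo.bar
    http:
      paths:
      - backend:
          service:
            name: nginx-empty-replyserver
            port:
              name: http
        path: /
        pathType: ImplementationSpecific

The same limit can also be set cluster-wide through the proxy-next-upstream-tries ConfigMap key instead of per-Ingress.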

zengyuxing007 commented 2 months ago

I ran into the same problem and would like to know the root cause of this. @strongjz

thanks~