kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

under any kind of usage, controller success rate drops < 100% #8842

Closed: evanrich closed this issue 2 years ago

evanrich commented 2 years ago

What happened:

I'm not sure whether this is a bug, but with ingress-nginx deployed, the success rate drops below 100% any time I hit a service behind it. This happens even under extremely mild usage; see screenshot: [image]

If you look at the graph of non-4xx/5xx responses (so 200s), it drops as low as 85%: [image]

What you expected to happen: under low load (<2 rps), the success rate should not drop below 100%.
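For readers following along, the percentage in the graph can be read as the share of responses whose status is outside the 4xx/5xx range. Below is a minimal sketch of that arithmetic. The counts are hypothetical and chosen only for illustration; they are not taken from the screenshots, and the function name `success_rate` is an assumption, not part of ingress-nginx.

```python
from collections import Counter


def success_rate(status_counts):
    """Return the percentage of responses that are not 4xx/5xx."""
    total = sum(status_counts.values())
    if total == 0:
        return None  # no traffic in the window, so the rate is undefined
    failures = sum(n for code, n in status_counts.items() if code >= 400)
    return 100.0 * (total - failures) / total


# Hypothetical one-minute window at under 2 rps (100 requests in total).
window = Counter({200: 82, 301: 3, 404: 15})
print(f"{success_rate(window):.1f}%")  # prints "85.0%"
```

At this request volume the denominator is small, so a handful of 4xx responses in a window is enough to move the percentage by several points.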

Not sure?

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

/ $ ./nginx-ingress-controller version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.3.0
  Build:         2b7b74854d90ad9b4b96a5011b9e8b67d20bfb8f
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

Kubernetes version (use kubectl version): 1.24.1

Environment:

 kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.3.0
              helm.sh/chart=ingress-nginx-4.2.0
              helm.toolkit.fluxcd.io/name=ingress-nginx
              helm.toolkit.fluxcd.io/namespace=ingress-nginx
Annotations:  meta.helm.sh/release-name: ingress-nginx-ingress-nginx
              meta.helm.sh/release-namespace: ingress-nginx
Controller:   k8s.io/ingress-nginx
Events:       <none>
NAME                                                          READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
pod/ingress-nginx-ingress-nginx-controller-5bcf584cc8-9586c   1/1     Running   0          6d19h   172.17.0.55   homelab-a   <none>           <none>

NAME                                                       TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                      AGE   SELECTOR
service/ingress-nginx-ingress-nginx-controller             LoadBalancer   10.107.238.76    192.168.50.4   80:32480/TCP,443:32145/TCP   36d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-ingress-nginx-controller-admission   ClusterIP      10.102.206.188   <none>         443/TCP                      36d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-ingress-nginx-controller-metrics     ClusterIP      10.96.219.192    <none>         10254/TCP                    36d   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                     READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                           SELECTOR
deployment.apps/ingress-nginx-ingress-nginx-controller   1/1     1            1           36d   controller   registry.k8s.io/ingress-nginx/controller-chroot:v1.3.0@sha256:0fcb91216a22aae43b374fc2e6a03b8afe9e8c78cbf07a09d75636dc4ea3c191   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                                DESIRED   CURRENT   READY   AGE     CONTAINERS   IMAGES                                                                                                                           SELECTOR
replicaset.apps/ingress-nginx-ingress-nginx-controller-5bcf584cc8   1         1         1       6d19h   controller   registry.k8s.io/ingress-nginx/controller-chroot:v1.3.0@sha256:0fcb91216a22aae43b374fc2e6a03b8afe9e8c78cbf07a09d75636dc4ea3c191   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=5bcf584cc8
replicaset.apps/ingress-nginx-ingress-nginx-controller-7fdb787787   0         0         0       36d     controller   registry.k8s.io/ingress-nginx/controller-chroot:v1.2.1@sha256:d301551cf62bc3fb75c69fa56f7aa1d9e87b5079333adaf38afe84d9b7439355   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7fdb787787
 kubectl -n ingress-nginx describe po ingress-nginx-ingress-nginx-controller-5bcf584cc8-9586c
Name:         ingress-nginx-ingress-nginx-controller-5bcf584cc8-9586c
Namespace:    ingress-nginx
Priority:     0
Node:         homelab-a/192.168.4.2
Start Time:   Tue, 12 Jul 2022 20:47:03 -0700
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              pod-template-hash=5bcf584cc8
Annotations:  <none>
Status:       Running
IP:           172.17.0.55
IPs:
  IP:           172.17.0.55
Controlled By:  ReplicaSet/ingress-nginx-ingress-nginx-controller-5bcf584cc8
Containers:
  controller:
    Container ID:  docker://b2f85c16304f68f3dbf3bea2d7b7f6bb28cbb1d7305e5a4758a6283035f98963
    Image:         registry.k8s.io/ingress-nginx/controller-chroot:v1.3.0@sha256:0fcb91216a22aae43b374fc2e6a03b8afe9e8c78cbf07a09d75636dc4ea3c191
    Image ID:      docker-pullable://registry.k8s.io/ingress-nginx/controller-chroot@sha256:0fcb91216a22aae43b374fc2e6a03b8afe9e8c78cbf07a09d75636dc4ea3c191
    Ports:         80/TCP, 443/TCP, 10254/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-ingress-nginx-controller
      --election-id=ingress-controller-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Tue, 12 Jul 2022 20:47:11 -0700
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-ingress-nginx-controller-5bcf584cc8-9586c (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lkng8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-ingress-nginx-admission
    Optional:    false
  kube-api-access-lkng8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
kubectl describe svc ingress-nginx-ingress-nginx-controller -n ingress-nginx
Name:                     ingress-nginx-ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx-ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.3.0
                          helm.sh/chart=ingress-nginx-4.2.0
                          helm.toolkit.fluxcd.io/name=ingress-nginx
                          helm.toolkit.fluxcd.io/namespace=ingress-nginx
Annotations:              meta.helm.sh/release-name: ingress-nginx-ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.107.238.76
IPs:                      10.107.238.76
LoadBalancer Ingress:     192.168.50.4
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32480/TCP
Endpoints:                172.17.0.55:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32145/TCP
Endpoints:                172.17.0.55:443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31978
Events:                   <none>

ingress:

kubectl describe ing nextcloud
Name:             nextcloud
Labels:           <none>
Namespace:        default
Address:          192.168.50.4
Ingress Class:    <none>
Default backend:  <default>
TLS:
  nextcloud-certificate-crt terminates host.server.com
Rules:
  Host                                 Path  Backends
  ----                                 ----  --------
  host.server.com:
                                       /   nextcloud:443 (172.17.0.53:443)
Annotations:                           cert-manager.io/cluster-issuer: letsencrypt-prod
                                       kubernetes.io/ingress.class: nginx
                                       nginx.ingress.kubernetes.io/backend-protocol: HTTPS
                                       nginx.ingress.kubernetes.io/cors-allow-credentials: true
                                       nginx.ingress.kubernetes.io/cors-allow-methods: OPTIONS, GET, HEAD, POST, DELETE, TRACE, LOCK, UNLOCK, MOVE, COPY, PROPPATCH, PROPFIND
                                       nginx.ingress.kubernetes.io/enable-cors: true
                                       nginx.ingress.kubernetes.io/proxy-body-size: 16G
                                       nginx.ingress.kubernetes.io/proxy-read-timeout: 3600
                                       nginx.ingress.kubernetes.io/proxy-request-buffering: off
                                       nginx.ingress.kubernetes.io/proxy-send-timeout: 3600
                                       nginx.ingress.kubernetes.io/secure-backends: true
                                       nginx.ingress.kubernetes.io/server-snippet:

                                         # Enable gzip but do not remove ETag headers
                                         gzip on;
                                         gzip_vary on;
                                         gzip_comp_level 6;
                                         gzip_min_length 256;
                                         gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
                                         gzip_types application/atom+xml application/javascript application/json application/ld+json application/manifest+json application/rss+xml ...

                                         # Pagespeed is not supported by Nextcloud, so if your server is built
                                         # with the `ngx_pagespeed` module, uncomment this line to disable it.
                                         #pagespeed off;

                                         # HTTP response headers borrowed from Nextcloud `.htaccess`
                                         add_header Referrer-Policy                      "no-referrer"   always;

                                         # Remove X-Powered-By, which is an information leak
                                         fastcgi_hide_header X-Powered-By;

                                         server_tokens off;
                                         proxy_hide_header X-Powered-By;

                                         # Rule borrowed from `.htaccess` to handle Microsoft DAV clients
                                         location = / {
                                           if ( $http_user_agent ~ ^DavClnt ) {
                                               return 302 /remote.php/webdav/$is_args$args;
                                           }
                                         }

                                         location = /robots.txt {
                                           allow all;
                                           log_not_found off;
                                           access_log off;
                                         }

                                         # Make a regex exception for `/.well-known` so that clients can still
                                         # access it despite the existence of the regex rule
                                         # `location ~ /(\.|autotest|...)` which would otherwise handle requests
                                         # for `/.well-known`.
                                         location ^~ /.well-known {
                                           # The following 6 rules are borrowed from `.htaccess`

                                           location = /.well-known/carddav     { return 301 /remote.php/dav/; }
                                           location = /.well-known/caldav      { return 301 /remote.php/dav/; }
                                           # Anything else is dynamically handled by Nextcloud
                                           location ^~ /.well-known            { return 301 /index.php$uri; }

                                           try_files $uri $uri/ =404;
                                         }

                                         # Rules borrowed from `.htaccess` to hide certain paths from clients
                                         location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)(?:$|/)  { return 404; }
                                         location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console)              { return 404; }
Events:                                <none>

How to reproduce this issue:

Install Kubernetes, install Flux CD, then deploy this HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: nextcloud
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: nextcloud
      version: '1.8.3'
      sourceRef:
        kind: HelmRepository
        name: nextcloud
        namespace: flux-system
      interval: 1m
  values:
    image:
      tag: php8-version-23.0.4
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: "nginx"
        nginx.ingress.kubernetes.io/secure-backends: "true"
        nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        nginx.ingress.kubernetes.io/proxy-body-size: "16G"
        nginx.ingress.kubernetes.io/enable-cors: "true"
        nginx.ingress.kubernetes.io/cors-allow-credentials: "true"
        nginx.ingress.kubernetes.io/cors-allow-methods: "OPTIONS, GET, HEAD, POST, DELETE, TRACE, LOCK, UNLOCK, MOVE, COPY, PROPPATCH, PROPFIND"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
        nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
        nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
        nginx.ingress.kubernetes.io/server-snippet: |-

          # Enable gzip but do not remove ETag headers
          gzip on;
          gzip_vary on;
          gzip_comp_level 6;
          gzip_min_length 256;
          gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
          gzip_types application/atom+xml application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;
          # Pagespeed is not supported by Nextcloud, so if your server is built
          # with the `ngx_pagespeed` module, uncomment this line to disable it.
          #pagespeed off;
          # HTTP response headers borrowed from Nextcloud `.htaccess`
          add_header Referrer-Policy                      "no-referrer"   always;

          # Remove X-Powered-By, which is an information leak
          fastcgi_hide_header X-Powered-By;
          server_tokens off;
          proxy_hide_header X-Powered-By;

          # Rule borrowed from `.htaccess` to handle Microsoft DAV clients
          location = / {
            if ( $http_user_agent ~ ^DavClnt ) {
                return 302 /remote.php/webdav/$is_args$args;
            }
          }
          location = /robots.txt {
            allow all;
            log_not_found off;
            access_log off;
          }
          # Make a regex exception for `/.well-known` so that clients can still
          # access it despite the existence of the regex rule
          # `location ~ /(\.|autotest|...)` which would otherwise handle requests
          # for `/.well-known`.
          location ^~ /.well-known {
            # The following 6 rules are borrowed from `.htaccess`
            location = /.well-known/carddav     { return 301 /remote.php/dav/; }
            location = /.well-known/caldav      { return 301 /remote.php/dav/; }
            # Anything else is dynamically handled by Nextcloud
            location ^~ /.well-known            { return 301 /index.php$uri; }
            try_files $uri $uri/ =404;
          }
          # Rules borrowed from `.htaccess` to hide certain paths from clients
          location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)(?:$|/)  { return 404; }
          location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console)              { return 404; }

        cert-manager.io/cluster-issuer: letsencrypt-prod
      tls:
        - secretName: nextcloud-certificate-crt
          hosts:
            - host.server.com
    nextcloud:
      host: host.server.com
      username: admin
      password: changeme
      existingSecret:
        enabled: true
        secretName: secretname
        usernameKey: username
        passwordKey: password
        tokenKey: token
        smtpUsernameKey: smtp_username
        smtpPasswordKey: smtp_password
      phpConfigs:
      defaultConfigs:
        apache-pretty-urls.config.php: false
        ### finish from here
    externalDatabase:
      enabled: true
      type: postgresql
      host:
    metrics:
      enabled: true

    extraVolumes:
      - name: dockerdata
        hostPath:
        # directory location on host

Wait for Nextcloud to deploy with its ingress, then try hitting the endpoint. It doesn't have to be Nextcloud; any service I have behind nginx causes the success rate to drop below 100%. The services themselves don't seem to be impacted, but the <100% bothers me.
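One way to see which status codes are behind the drop is to send a small batch of requests and tally the responses on the client side. The following is a rough sketch using only the Python standard library. The helper names `fetch_status` and `tally` are hypothetical, and the host is the placeholder from the report.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10):
    """Return the HTTP status code for one GET request to url.

    urllib follows redirects, so a 3xx shows up as the status of the
    final response. Connection errors (URLError) are not caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises HTTPError for 4xx/5xx responses


def tally(url, n, fetch=fetch_status):
    """Send n sequential requests and count responses per status code."""
    return Counter(fetch(url) for _ in range(n))
```

Calling `tally("https://host.server.com/", 50)` returns a `Counter` keyed by status code, which can be compared against the controller's own graphs. The `fetch` parameter exists only so the tallying logic can be exercised without a live endpoint.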

k8s-ci-robot commented 2 years ago

@evanrich: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

evanrich commented 2 years ago

nvm

longwuyuan commented 2 years ago

/remove-kind bug