kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Ingress controller processing ingresses with different ingressClassName when multiple ingress controllers are deployed in the same namespace - AWS EKS 1.27 #10907

Open mdellavedova opened 8 months ago

mdellavedova commented 8 months ago

What happened:

I have two ingress controllers deployed in the same namespace, set up following the instructions in these documents: https://kubernetes.github.io/ingress-nginx/user-guide/k8s-122-migration/#i-cant-use-multiple-namespaces-what-should-i-do and https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/#multiple-ingress-controllers. The ingresses work as expected, but when I look at the logs of one ingress controller I can see many errors like:

I0123 10:02:35.684672       7 store.go:436] "Ignoring ingress because of error while validating ingress class" ingress="omega/cs-05c36933-076c-490f-a23b-d6d5019d1cb2-api-gw" error="no object matching key \"ingress-controller-internal-nginx\" in local store"

suggesting that each ingress controller is evaluating ingresses that belong to the other controller, and vice versa. This creates a high load on (one of) the ingress controller's pods, causing it to restart.

What you expected to happen:

I would expect both ingress controllers to ignore ingresses that don't have their associated ingressClassName.
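To illustrate the expected class-based filtering, here is a minimal sketch (the class names and controller values are taken from the IngressClass output further down; the example Ingress, host, and backend service names are hypothetical): each controller should only act on Ingresses whose ingressClassName references its own IngressClass and ignore the rest.

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: ingress-controller-internal-nginx
spec:
  controller: k8s.io/ingress-nginx-internal            # watched by the internal controller
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx-public-nlb-tls
spec:
  controller: k8s.io/ingress-nginx-public-nlb-tls      # watched by the public controller
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-api-gw          # hypothetical name for illustration
  namespace: omega
spec:
  ingressClassName: ingress-controller-internal-nginx  # only the internal controller should process this
  rules:
    - host: example.internal    # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-svc   # hypothetical backend service
                port:
                  number: 80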

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):


NGINX Ingress controller
  Release:       v1.8.1
  Build:         dc88dce9ea5e700f3301d16f971fa17c6cfe757d
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6


I have also tried the latest available Helm chart, which didn't help:

NGINX Ingress controller
  Release:       v1.9.5
  Build:         f503c4bb5fa7d857ad29e94970eb550c2bc00b7c
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6


Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.8-eks-8cb36c9", GitCommit:"fca3a8722c88c4dba573a903712a6feaf3c40a51", GitTreeState:"clean", BuildDate:"2023-11-22T21:52:13Z", GoVersion:"go1.20.11", Compiler:"gc", Platform:"linux/amd64"}

Environment:

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
        controller:
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
            limits:
              cpu: 2
              memory: 2000Mi
          hostNetwork: true
          ingressClass: ingress-controller-internal-nginx

          ingressClassResource:
            controllerValue: "k8s.io/ingress-nginx-internal"
            name: ingress-controller-internal-nginx
          electionID: "ingress-controller-internal-leader"
          {{- if .Values.ingressControllerInternal.metrics.enabled }}
          metrics:
            enabled: true
            service:
              annotations:
                prometheus.io/port: "10254"
                prometheus.io/scrape: "true"
          {{- end }}
          service:
            targetPorts:
              http: http
              https: http
            annotations:
              nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
              service.beta.kubernetes.io/aws-load-balancer-type: nlb
              service.beta.kubernetes.io/aws-load-balancer-internal: "true"
              service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
              service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
              service.beta.kubernetes.io/aws-load-balancer-backend-protocol: TLS
              service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
              service.beta.kubernetes.io/aws-load-balancer-ssl-cert: {{ .Values.ingressControllerInternal.acm_arn }}
              service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "Monitoring=enabled"
              # it doesn't work, aws-load-balancer-type must be changed to "external"
              # service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=false"
          podAnnotations:
            co.elastic.logs/processors.0.decode_json_fields.fields: message
            co.elastic.logs/processors.0.decode_json_fields.target: lb
          config:
            log-format-escape-json: true
            log-format-upstream: '{"@timestamp":"$msec", "date":"$time_iso8601", "upstreamIp":"$realip_remote_addr", "traceId": "$http_x_nexmo_trace_id",
              "clientIpAddress":"$remote_addr", "xForwardedFor":"$http_x_forwarded_for", "hdrContentType":"$http_content_type",
              "hdrSentContentType": "$sent_http_content_type", "remoteUser": "$remote_user", "uri": "$request_uri", 
              "method":"$request_method","serverProto":"$server_protocol", "httpStatus":"$status",
              "reqTime":"$request_time", "reqLength":"$request_length", "size":"$body_bytes_sent",
              "referer":"$http_referer", "userAgent":"$http_user_agent", "upsAddr":"$upstream_addr",
              "upsStatus":"$upstream_status",  "upsConnectTime":"$upstream_connect_time", "upsHeaderTime":"$upstream_header_time",  "upsResponseTime":"$upstream_response_time",
              "upsStatus_all":"$upstream_status",  "upsConnectTime_all":"$upstream_connect_time",
              "upsHeaderTime_all":"$upstream_header_time",  "upsResponseTime_all":"$upstream_response_time",
              "hostname":"$host",  "serverPort":"$server_port",  "scheme":"$scheme", "sslCipher":"$ssl_cipher",
              "sslProtocol":"$ssl_protocol"}'
            http-snippet: >-
              log_format bodyinfo escape=json '{"@timestamp":"$msec", "date":"$time_iso8601", "upstreamIp":"$realip_remote_addr", "traceId": "$http_x_nexmo_trace_id",
              "clientIpAddress":"$remote_addr", "xForwardedFor":"$http_x_forwarded_for", "hdrContentType":"$http_content_type",
              "hdrSentContentType": "$sent_http_content_type", "remoteUser": "$remote_user", "uri": "$request_uri", 
              "method":"$request_method","serverProto":"$server_protocol", "httpStatus":"$status",
              "reqTime":"$request_time", "reqLength":"$request_length", "size":"$body_bytes_sent",
              "referer":"$http_referer", "userAgent":"$http_user_agent", "upsAddr":"$upstream_addr",
              "upsStatus":"$upstream_status",  "upsConnectTime":"$upstream_connect_time", "upsHeaderTime":"$upstream_header_time",  "upsResponseTime":"$upstream_response_time",
              "upsStatus_all":"$upstream_status",  "upsConnectTime_all":"$upstream_connect_time",
              "upsHeaderTime_all":"$upstream_header_time",  "upsResponseTime_all":"$upstream_response_time",
              "hostname":"$host",  "serverPort":"$server_port",  "scheme":"$scheme", "sslCipher":"$ssl_cipher",
              "sslProtocol":"$ssl_protocol","requestBody":"[$request_body]"}';
          admissionWebhooks:
            timeoutSeconds: 30
          replicaCount: {{ .Values.ingressControllerInternal.replicaCount }}
          minAvailable: {{ max 1 ( sub .Values.ingressControllerInternal.replicaCount 1 ) }}
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                    - ingress-nginx
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                    - ingress-nginx-internal
                  - key: app.kubernetes.io/component
                    operator: In
                    values:
                    - controller
                topologyKey: "kubernetes.io/hostname"
          topologySpreadConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
              labelSelector:
                matchLabels:
                  app.kubernetes.io/instance: ingress-nginx-internal
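The values above only cover the internal controller. For comparison, a rough sketch of the class-related values for the second (public) controller, reconstructed from the pod arguments shown further down (--ingress-class, --controller-class, --election-id, --ingress-class-by-name) and assuming the chart's controller.ingressClassByName value, would be:

controller:
  ingressClass: nginx-public-nlb-tls
  ingressClassByName: true                  # renders --ingress-class-by-name=true
  ingressClassResource:
    name: nginx-public-nlb-tls
    controllerValue: "k8s.io/ingress-nginx-public-nlb-tls"
  electionID: "ingress-nginx-public-nlb-tls-leader"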
Name:         ingress-controller-internal-nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-internal
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.8.1
              argocd.argoproj.io/instance=ingress-nginx-internal-euw1-1
              helm.sh/chart=ingress-nginx-4.7.1
Annotations:  <none>
Controller:   k8s.io/ingress-nginx-internal
Events:       <none>

Name:         nginx-public-nlb-tls
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx-public-nlb-tls
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.5
              argocd.argoproj.io/instance=ingress-nginx-public-nlb-tls-euw1-1
              helm.sh/chart=ingress-nginx-4.9.0
Annotations:  <none>
Controller:   k8s.io/ingress-nginx-public-nlb-tls
Events:       <none>

Name:             ingress-nginx-public-nlb-tls-controller-6fbb668d64-prgkp
Namespace:        cluster
Priority:         0
Service Account:  ingress-nginx-public-nlb-tls
Node:             ip-10-229-145-39.eu-west-1.compute.internal/10.229.145.39
Start Time:       Tue, 23 Jan 2024 10:02:33 +0000
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx-public-nlb-tls
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.9.5
                  helm.sh/chart=ingress-nginx-4.9.0
                  pod-template-hash=6fbb668d64
Annotations:      co.elastic.logs/processors.0.decode_json_fields.fields: message
                  co.elastic.logs/processors.0.decode_json_fields.target: lb
                  kubectl.kubernetes.io/restartedAt: 2024-01-23T10:02:32Z
Status:           Running
IP:               10.229.145.39
IPs:
  IP:           10.229.145.39
Controlled By:  ReplicaSet/ingress-nginx-public-nlb-tls-controller-6fbb668d64
Containers:
  controller:
    Container ID:    containerd://3c0d0d081c8986c9bea84aa03e8c944848f35c415aca7d9d3e7dbc046eb3b346
    Image:           registry.k8s.io/ingress-nginx/controller:v1.9.5@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:b3aba22b1da80e7acfc52b115cae1d4c687172cbf2b742d5b502419c25ff340e
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      80/TCP, 443/TCP, 8443/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-public-nlb-tls-controller
      --election-id=ingress-nginx-public-nlb-tls-leader
      --controller-class=k8s.io/ingress-nginx-public-nlb-tls
      --ingress-class=nginx-public-nlb-tls
      --configmap=$(POD_NAMESPACE)/ingress-nginx-public-nlb-tls-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --ingress-class-by-name=true
    State:          Running
      Started:      Tue, 23 Jan 2024 10:02:34 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   500Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-public-nlb-tls-controller-6fbb668d64-prgkp (v1:metadata.name)
      POD_NAMESPACE:  cluster (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h9dd4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-public-nlb-tls-admission
    Optional:    false
  kube-api-access-h9dd4:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/instance=ingress-nginx-public-nlb-tls
Events:                       <none>

Name:                     ingress-nginx-public-nlb-tls-controller
Namespace:                cluster
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx-public-nlb-tls
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.9.5
                          argocd.argoproj.io/instance=ingress-nginx-public-nlb-tls-euw1-1
                          helm.sh/chart=ingress-nginx-4.9.0
Annotations:              nginx.ingress.kubernetes.io/force-ssl-redirect: true
                          service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: Monitoring=enabled
                          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: 60
                          service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-public-nlb-tls,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.60.210
IPs:                      172.20.60.210
LoadBalancer Ingress:     <redacted>
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30881/TCP
Endpoints:                10.229.145.39:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31447/TCP
Endpoints:                10.229.145.39:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Name:             neru-59e69cd7-go-neru-queue-scheduler-dev-com
Labels:           <none>
Namespace:        omega
Address:          <redacted>.elb.eu-west-1.amazonaws.com
Ingress Class:    nginx-public-nlb-tls
Default backend:  <default>
TLS:
  default-ingress-ssl terminates <redacted>
Rules:
  Host                                                                       Path  Backends
  ----                                                                       ----  --------
  <redacted>
                                                                             /   envoy:80 (172.16.90.147:5000)
Annotations:                                                                 nginx.ingress.kubernetes.io/backend-protocol: HTTP
                                                                             nginx.ingress.kubernetes.io/upstream-vhost: <redacted>
Events:                                                                      <none>

How to reproduce this issue:

Anything else we need to know:

k8s-ci-robot commented 8 months ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 8 months ago

/remove-kind bug

longwuyuan commented 8 months ago

/triage needs-information

mdellavedova commented 8 months ago

Sorry, I posted by mistake before completing the form; please let me know if there's anything else I need to add.

mdellavedova commented 8 months ago

Hi, I can see the triage/needs-information label is still there after I updated the form last week. Could you please let me know if anything is missing?

longwuyuan commented 8 months ago
mdellavedova commented 7 months ago

Thanks for your reply

  • "Ignoring ingress" does not indicate that the ingress rules were used for routing

I'm sure the rules aren't used for routing, but I have a large number of ingresses that get pointlessly evaluated, causing an increase in load for one of the three pods in the deployment, which leads to restarts (every time there is a batch of "Ignoring ingress" errors in the logs, one of the pods restarts).

  • The most important aspect here is to confirm that you installed as per the link I pasted here earlier

I have followed that guide and double-checked the configuration multiple times.

  • The proof needed is that appropriate controller instance processes appropriate ingress rule routing

That's confirmed: the two ingress controllers only process their own ingress rules. The issue is the "Ignoring ingress" errors and the associated pod restarts.

longwuyuan commented 7 months ago
mdellavedova commented 7 months ago

Thanks for your effort. I believe the restarts are due to the number of ingress resources being evaluated; I have a similar setup in three separate regions:

region 1: 1962 ingresses managed by both controllers; controller 1: 33 restarts over 20 days (1 of 3 pods only); controller 2: 0 restarts over 19 days

region 2: 426 ingresses managed by both controllers; controller 1: 0 restarts over 20 days; controller 2: 123 restarts over 19 days (1 of 3 pods only)

region 3: 192 ingresses managed by both controllers; controller 1: 0 restarts over 20 days; controller 2: 0 restarts over 19 days (1 of 3 pods only)

Could you please re-run your test with a higher number of ingresses? I'm not sure why there is no correlation between the number of ingress resources and the number of restarts; I will try to compare the traffic in region 2 vs region 1.

longwuyuan commented 7 months ago
github-actions[bot] commented 6 months ago

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any questions or want to request that this be prioritized, please reach out in #ingress-nginx-dev on Kubernetes Slack.