kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

ingress-nginx controller in a high concurrency environment #10032

Closed hsuchan closed 1 year ago

hsuchan commented 1 year ago

Background

We are in the process of migrating from self-deployed Kubernetes clusters (v1.21) on AWS to EKS (v1.24). One of the prerequisites for this migration is to make sure that the ingress-nginx controller can sustain high concurrency. For context, we have approximately 9000+ Kubernetes services, and we currently use an in-house solution based on NodePorts to direct external traffic to those services. Since the NodePort range is not user-configurable on EKS (the default range is 30000-32767, which caps the number of available NodePorts), we are considering replacing that custom solution with ingress-nginx.

Environment

How we are configuring ingress-nginx

To deal with high concurrency, here are some of the steps we've taken:

The problem

During load tests, the ingress-nginx pods had no issues sustaining high traffic/concurrency. Problems started when we introduced artificial latencies by picking an ingress with significant traffic and rewriting its target to point to an httpbin service that adds an artificial 10-second delay:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/rewrite-target: /delay/10
  name: myservice
  namespace: mynamespace
spec:
  rules:
  - host: myservice.environment.domain
    http:
      paths:
      - backend:
          service:
            name: httpbin
            port:
              number: 8000
        pathType: ImplementationSpecific

What we observed is that the error rate increased not only for that specific ingress, but also for other ingresses in that cluster. The average response time increased significantly, and the ingress-nginx pods went into a non-ready state and restarted in a loop.

How can we configure ingress-nginx so that, in a high-concurrency situation, latencies experienced by one or a few services don't cause ingress-nginx to tip over and impact the rest of the ingresses on that cluster?

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 1 year ago

Hi @hsuchan ,

Between you configuring custom values in the ConfigMap and then saying "error rates increased", there is no other data posted here for someone to analyze and comment on with any solid ground. All comments will be based on guesswork.

One way to approach performance tuning is to first do a default install and run real load at scale with monitoring, then use the monitoring data to make changes to the config. If you have already done this, then you would have the data that was used to decide on all those ConfigMap values.

The other factor is to define and explain concurrency in reverse-proxy terms: number of requests, number of connections, number of sockets on the hosts, CPU and memory used, and so on. This is to understand the requests and their payload better.

My point is that more data needs to be posted here so that readers can understand the details of the increased error rate.
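
As a sketch of the kind of data that helps, assuming a standard Prometheus setup where the controller metrics and cAdvisor are scraped (the label values are illustrative and depend on your install):

# request rate per ingress and controller pod
sum(rate(nginx_ingress_controller_requests[1m])) by (ingress, pod)

# CPU and memory of the controller pods
sum(rate(container_cpu_usage_seconds_total{pod=~"ingress-nginx-controller.*"}[5m])) by (pod)
sum(container_memory_working_set_bytes{pod=~"ingress-nginx-controller.*"}) by (pod)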

toredash commented 1 year ago

Most likely the issue here is that you don't have enough connections for nginx to handle the amount of concurrency in your environment.

Consider this: you have max-worker-connections set to 65536; with a deployment of 1 replica and 12 worker processes (assuming 12 CPUs on your node), that amounts to 786432 open connections. In proxy mode this effectively halves, as nginx maintains one client connection and one upstream connection per request. If you send 393216 requests to the httpbin service, nginx is blocked from handling any new requests for 10 seconds.
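
A rough back-of-the-envelope version of that calculation, assuming 12 worker processes and the max-worker-connections value of 65536 mentioned above:

65536 connections per worker x 12 workers = 786432 open connections
786432 / 2 (one client plus one upstream connection per proxied request) = 393216 concurrent requests
393216 concurrent requests x 10 s backend delay => workers saturated for the full 10 s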

Have you checked your sysctl settings? Are they correct? Do they match the nginx settings?

I notice you have set upstream-keepalive-connections to 2000; with 9000 services, that amounts to nginx trying to keep 18 million upstream connections alive if you're able to hit all backends within a relatively short time with a lot of requests.

Have you checked your logs when you're having latency issues?

I would drop all custom configuration, run a new scale test, and alter one parameter at a time to see if there are any improvements.

My suggestions would be:

worker-processes => set to 12 (a c6i.8xlarge has 32 CPUs; are all CPUs reserved for nginx?)
upstream-keepalive-connections => remove and use the default value
sysctl settings => remove all of them
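
Expressed as controller ConfigMap keys, that suggestion would look roughly like this (a sketch only; the ConfigMap name and namespace assume a default Helm install):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # one worker per CPU actually reserved for the controller pod
  worker-processes: "12"
  # max-worker-connections, upstream-keepalive-connections and the sysctl
  # overrides are intentionally omitted so the controller falls back to its
  # defaults; re-introduce them one at a time based on monitoring data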

It is very hard to debug this issue as the full stack is not described. How are the tests performed? Is there an NLB or ALB in front of the nginx pods? How many clients are running the test? What is the average latency before, during, and after the httpbin latency is introduced? Do the logs say anything?

hsuchan commented 1 year ago

@longwuyuan and @toredash thanks for the pointers and apologies for the delayed response.

In hindsight, it would have been more appropriate to submit this issue directly to the nginx GitHub project, as what I have described isn't an issue with the ingress-nginx controller, but a common-sense consequence of using nginx in a shared environment, where any bad neighbor can directly degrade the performance of the nginx server.

In our environment, we need to coordinate with a dedicated load-test team, which has to spin up over a hundred EC2 nodes to generate enough load to simulate peak traffic patterns, so it's really difficult and time-consuming to follow a trial-and-error approach of turning one knob, running a load test, observing the result, then rinsing and repeating. We often need to resort to educated guesses to shorten that process, as running a smaller load test with fewer nodes doesn't always give us the full picture.

In any case, for those experiencing the same kind of issues I have described, our solution was to use ngx_http_limit_conn_module (http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html), which allows defining a maximum number of connections per nginx virtual server. This way, any service behind the ingress-nginx controller that experiences timeout issues exhausts only the limit_conn budget defined for its own server block, instead of extending the blast radius. We have implemented this solution and were able to pass our load tests successfully.

Thanks again.

toredash commented 1 year ago

@hsuchan Would you mind providing the new and old configuration? I find it interesting to see what configuration worked for you.

Did you test the EWMA load-balance (https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#load-balance) configuration at all?
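
For reference, switching the algorithm is a single ConfigMap key (value as documented at the link above; shown here as a fragment of the controller ConfigMap):

data:
  load-balance: "ewma"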

Please close the issue if it is resolved.

hsuchan commented 1 year ago

@toredash The first step was to run a load test simulating peak traffic to determine the maximum number of requests each ingress-nginx virtual server/pod receives, using Prometheus:

sum(rate(nginx_ingress_controller_requests{app_kubernetes_io_instance="ingress-nginx"}[1m])) by (ingress,pod_name)

We took that number, added a percentage of headroom, and added this to our ConfigMap:

  http-snippet: |
    limit_conn_zone $server_name zone=perserver:10m;
  server-snippet: |
    limit_conn perserver 1000;
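
For context, here is how that snippet sits in the full controller ConfigMap (a sketch; the ConfigMap name and namespace assume a default Helm install, and 1000 stands in for whatever per-server peak plus headroom your load test produces):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # one shared 10 MB zone keyed on $server_name, i.e. one counter per virtual server
  http-snippet: |
    limit_conn_zone $server_name zone=perserver:10m;
  # server-snippet is rendered into every server block, so each ingress host is
  # capped independently; connections over the cap are rejected with 503 by default
  server-snippet: |
    limit_conn perserver 1000;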

We didn't modify the load balancing algorithm, but might look into this in the future.