nginxinc / kubernetes-ingress

NGINX and NGINX Plus Ingress Controllers for Kubernetes
https://docs.nginx.com/nginx-ingress-controller
Apache License 2.0

Benchmarking NGINX Ingress Controller using ab #1275

Closed: dakshinai closed this issue 3 years ago

dakshinai commented 3 years ago

Describe the bug

We are trying to benchmark an NGINX Ingress Controller setup exposed via NodePort on a single controller/worker node running the "complete-example".

The ab numbers for the individual coffee/tea services exposed via ClusterIP are far higher than when accessing them through the NGINX Ingress NodePort.

Does the Ingress Controller config require any optimization to get better results?

To Reproduce

kubectl get svc --all-namespaces -o wide
NAMESPACE       NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
default         coffee-svc      ClusterIP   10.106.39.175   <none>        80/TCP                       15d   app=coffee
default         tea-svc         ClusterIP   10.105.165.64   <none>        80/TCP                       15d   app=tea
nginx-ingress   nginx-ingress   NodePort    10.99.39.131    <none>        80:30082/TCP,443:32149/TCP   8d    app=nginx-ingress

Expected behavior

Current ab results from the controller (columns repeat for the two result sets):

Requests | Concurrency | Requests per Second | Time Taken (s) | Latency | Failed Requests | Throughput (Kbps) | Throughput (Gbps) | Requests per Second | Time Taken (s) | Latency | Failed Requests | Throughput (Kbps) | Throughput (Gbps)
10000 | 100 | 2256.28 | 4.432 | 44.32 | 0 | 802.04 | 0.000764885 | 2220.05 | 4.504 | 45.04 | 0 | 802.17 | 0.000765009
10000 | 200 | 2218.88 | 4.507 | 22.535 | 0 | 788.74 | 0.000752201 | 2208.86 | 4.527 | 22.635 | 0 | 798.12 | 0.000761147
Current ab results from the coffee/tea service (columns repeat for the two result sets):

Requests | Concurrency | Requests per Second | Time Taken (s) | Latency | Failed Requests | Throughput (Kbps) | Throughput (Gbps) | Requests per Second | Time Taken (s) | Latency | Failed Requests | Throughput (Kbps) | Throughput (Gbps)
10000 | 100 | 17049.02 | 0.587 | 5.87 | 0 | 6110.34 | 0.005827274 | 16575.12 | 0.603 | 6.03 | 0 | 5989.06 | 0.005711613
10000 | 200 | 17359.18 | 0.576 | 2.88 | 0 | 6221.5 | 0.005933285 | 17164.02 | 0.583 | 2.915 | 0 | 6201.84 | 0.005914536

Your environment

pleshakov commented 3 years ago

Hi @dakshinai

I suggest trying the following performance-related optimizations in the ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "10240"
  keepalive: "100"
  keepalive-requests: "100000000"

These settings are from this blog post: https://www.nginx.com/blog/performance-testing-nginx-ingress-controllers-dynamic-kubernetes-cloud-environment/ . The optimizations are also described here: https://www.nginx.com/blog/tuning-nginx/

Additionally, it makes sense to add worker-cpu-affinity: "auto" so that worker processes are pinned to CPU cores. This is described in the blog post you referenced in https://github.com/nginxinc/kubernetes-ingress/issues/1276
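For reference, a sketch of the combined ConfigMap with that key added, plus commands to apply it and confirm the values made it into the generated config (the pod name is a placeholder; the /etc/nginx/nginx.conf path and directive names are assumptions based on the default image layout):

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "10240"
  worker-cpu-affinity: "auto"
  keepalive: "100"
  keepalive-requests: "100000000"

# apply and verify
kubectl apply -f nginx-config.yaml
kubectl get pods -n nginx-ingress
kubectl exec -n nginx-ingress <nginx-ingress-pod> -- grep -E "worker_connections|worker_rlimit_nofile|worker_cpu_affinity|keepalive_requests" /etc/nginx/nginx.conf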

dakshinai commented 3 years ago

This did not change the numbers yet. The comparison is inline below.

ab -n 10000 -c 100 -H "Host: cafe.example.com" https://$IC_IP:$IC_HTTPS_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 143.182.136.163 (be patient)
Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            31946
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        cafe.example.com

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   4.403 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      3640000 bytes
HTML transferred:       1550000 bytes
Requests per second:    2271.41 [#/sec] (mean)
Time per request:       44.025 [ms] (mean)
Time per request:       0.440 [ms] (mean, across all concurrent requests)
Transfer rate:          807.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   37   21.7     33     110
Processing:     0    7    3.3      6      47
Waiting:        0    4    3.2      3      47
Total:          2   43   24.0     39     130

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     50
  75%     59
  80%     65
  90%     80
  95%     88
  98%     92
  99%    121
 100%    130 (longest request)

ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.105.165.64 (be patient)
Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.106 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9917
Total transferred:      3689585 bytes
HTML transferred:       1550000 bytes
Requests per second:    94520.64 [#/sec] (mean)
Time per request:       1.058 [ms] (mean)
Time per request:       0.011 [ms] (mean, across all concurrent requests)
Transfer rate:          34056.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0       4
Processing:     0    1    0.3      1       4
Waiting:        0    1    0.3      1       4
Total:          0    1    0.5      1       5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      3
  99%      4
 100%      5 (longest request)

pleshakov commented 3 years ago

Hi @dakshinai

Because the second test is done without TLS termination, I suggest removing TLS termination from the Ingress resource, as it affects RPS.
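For example, with the complete-example manifests, the cafe Ingress without TLS might look roughly like this (a sketch assuming the networking.k8s.io/v1 API and an ingress class named nginx; adjust to match your actual resource):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cafe-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: cafe.example.com
    http:
      paths:
      - path: /tea
        pathType: Prefix
        backend:
          service:
            name: tea-svc
            port:
              number: 80
      - path: /coffee
        pathType: Prefix
        backend:
          service:
            name: coffee-svc
            port:
              number: 80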

dakshinai commented 3 years ago

ab -n 10000 -c 100 -k -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 143.182.136.163 (be patient)
Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            30804

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.363 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9928
Total transferred:      3689640 bytes
HTML transferred:       1550000 bytes
Requests per second:    27564.92 [#/sec] (mean)
Time per request:       3.628 [ms] (mean)
Time per request:       0.036 [ms] (mean, across all concurrent requests)
Transfer rate:          9932.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0       4
Processing:     0    4    0.3      4       6
Waiting:        0    4    0.3      4       6
Total:          0    4    0.3      4       6

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      4
  99%      5
 100%      6 (longest request)

ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.105.165.64 (be patient)
Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests
Finished 10000 requests

Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.137 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9901
Total transferred:      3689505 bytes
HTML transferred:       1550000 bytes
Requests per second:    72878.33 [#/sec] (mean)
Time per request:       1.372 [ms] (mean)
Time per request:       0.014 [ms] (mean, across all concurrent requests)
Transfer rate:          26258.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0       5
Processing:     0    1    0.2      1       5
Waiting:        0    1    0.1      1       5
Total:          0    1    0.4      1       5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      2
  99%      4
 100%      5 (longest request)

dakshinai commented 3 years ago

cat nginx-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  keepalive: "100"
  keepalive_requests: "100000000"
  worker-cpu-affinity: "auto"

dakshinai commented 3 years ago

Hi @pleshakov, Removing TLS termination does not help either, though without keep-alive the numbers are similar. Can you clarify how keep-alive affects things here?

pleshakov commented 3 years ago

Hi @dakshinai

Looking at your latest results, it looks like the difference has improved, no?

Removing TLS termination does not help either, though without keep-alive the numbers are similar. Can you clarify how keep-alive affects things here?

At a higher number of available cores, this is consistent with our performance testing results for NGINX: https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ (section "RPS for HTTPS Requests").

Can you clarify how keep-alive affects things here?

Keepalives allow NGINX to reuse connections to backends for subsequent requests. Without keepalives, NGINX will try to establish a new connection for every request.
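Conceptually, the keepalive ConfigMap key maps to NGINX's upstream keepalive directive, roughly like this in the generated config (an illustrative sketch with made-up names and addresses, not the controller's exact output):

upstream tea-upstream {              # name is illustrative
    server 10.244.0.12:80;           # a tea-svc pod endpoint (example address)
    keepalive 100;                   # idle connections kept open to the backend
}

location /tea {
    proxy_http_version 1.1;          # connection reuse requires HTTP/1.1 ...
    proxy_set_header Connection "";  # ... and an empty Connection header
    proxy_pass http://tea-upstream;
}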

dakshinai commented 3 years ago

Hi @pleshakov,

  1. The results improved and match up without client keepalives, but my point was that keepalives drastically improve the standalone results, so the ingress results still do not match the standalone results with keepalives turned on.

  2. We tried another wrk test with the modified config below, focusing on getting NGINX numbers similar to the blogs below: https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-processes: "auto"
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  worker-cpu-affinity: "auto"
  keepalive-timeout: "120"
  keepalive-requests: "10000"

wrk -t 44 -c 1000 -d 180s -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea
Running 3m test @ http://143.182.136.163:31432/tea
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.26ms  116.53ms   1.47s   96.73%
    Req/Sec     0.89k     1.22k    9.72k   89.35%
  6978810 requests in 3.00m, 2.40GB read
  Socket errors: connect 0, read 0, write 0, timeout 44
Requests/sec:  38749.87
Transfer/sec:     13.64MB

This is over a 40GbE link. The blog's ingress controller results were 36,647 RPS for 1 CPU and 342,785 RPS for 24 CPUs. We have 40 cores on the system running the ingress controller and the app, but our results only reached 38,749.87 RPS.

  1. Is there a way to check the number of CPU cores used by the ingress controller? In our case, anywhere from 1 to 40?
  2. Is there a place where I can download a container image for the NGINX web server used in the test blogs? Our current tests work with response sizes of ~150 bytes, but we would like to test responses of multiple file sizes.
pleshakov commented 3 years ago

Hi @dakshinai

The networking stack could be a bottleneck; I suggest investigating that. You can also tweak the number of worker processes in NGINX (e.g. worker-processes: "4") to see at what point increasing the number of processes no longer helps.

Is there a way to check the number of CPU cores used by ingress controller? In our case 1 to 40?

Perhaps run top on the node during the test and see which NGINX worker processes are utilized and which are not?
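For example (assuming the usual procps/sysstat tools are available on the node):

# interactive view; press 1 inside top to toggle the per-CPU panel
top
# per-core utilization, one sample per second (sysstat package)
mpstat -P ALL 1
# CPU usage and last-used core for each NGINX worker
ps -C nginx -o pid,psr,pcpu,comm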

Is there a place where I can download a container image for the NGINX web server used in the test blogs? Current tests work with response sizes ~150bytes but we would like to test for multiple file size responses.

The image is just NGINX configured with a ConfigMap. Please see https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ , specifically the section "Backend DaemonSet Deployment".

rawdata123 commented 3 years ago

Hi @dakshinai,

I would suggest starting with 1 CPU or worker process and testing the performance. Use the worker-processes: "1" key in the ConfigMap and test to see if you can get roughly 36K RPS. The performance should double as you double the cores, but only if the CPUs are in the same NUMA node. You will not get a linear increase in performance if you use cores from different NUMA nodes. Additionally, there is a container networking bottleneck, so the performance may flatten out after 16 cores (in the case of our testing).
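A quick way to set that baseline (assuming the nginx-config ConfigMap shown earlier in this thread):

kubectl patch configmap nginx-config -n nginx-ingress --type merge -p '{"data":{"worker-processes":"1"}}'

The Ingress Controller watches the ConfigMap and reloads NGINX on changes, so a pod restart should not be needed.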

I use htop to view the CPU utilization of the system. Also note that we used Flannel as the networking stack in Kubernetes, with host-gw enabled. That increases performance, but the nodes in the cluster need to be on the same LAN. More details about this can be found in the blog.
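For reference, host-gw is selected in kube-flannel's net-conf.json, roughly like this (the Network CIDR below is the common default; adjust to your cluster):

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "host-gw"
    }
  }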

And finally, you can get the NGINX web server image from here: https://hub.docker.com/r/rawdata1234/nginx-webserver-payloads
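If you also want to test multiple response sizes from a host-volume-backed NGINX pod, fixed-size payload files can be generated with dd (the host path here is a placeholder):

dd if=/dev/urandom of=/mnt/payloads/1kb.bin bs=1K count=1
dd if=/dev/urandom of=/mnt/payloads/10kb.bin bs=1K count=10
dd if=/dev/urandom of=/mnt/payloads/100kb.bin bs=1K count=100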

dakshinai commented 3 years ago

Thanks @pleshakov and @rawdata123

So we did experiment with increasing the number of cores assigned to the NGINX Ingress Controller linearly. While the scaling was linear, the numbers still do not match up. The setup is similar to https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ except that we host the ingress controller and the NGINX pod services on the same node. Also, the NGINX pod serves files from a host volume mounted into the NGINX server container. It is a 2-NUMA-node system with 20 cores each, so the NGINX Ingress Controller can use the 20 cores on the first NUMA node and the NGINX pod services the 20 cores on the second. The numbers below were captured with 20 cores assigned to the NGINX pod services while varying the number of cores given to the ingress controller. The only bottleneck we suspect is cross-NUMA communication between the ingress controller and the NGINX pod service. htop CPU utilization was consistent with the CPU assignment. These numbers are over Calico.

CPU cores | File Size | Duration | Concurrency | Threads | Avg Thread Latency (ms) | Stdev Thread Latency | Max Thread Latency | Avg Thread RPS | Requests per Second | Transfer per Second (MB) | Total Requests
1 | 1KB | 180s | 1000 | 44 | 104.29 | 141.34 | 5780 | 229 | 9983.92 | 12.04 | 1798097
2 | 1KB | 180s | 1000 | 44 | 54.19 | 38.97 | 1100 | 434 | 18882.99 | 22.76 | 3400819
4 | 1KB | 180s | 1000 | 44 | 26.52 | 34.71 | 1090 | 900 | 39162.55 | 47.21 | 7052924
8 | 1KB | 180s | 1000 | 44 | 66.38 | 331.14 | 3350 | 1790 | 75870.94 | 91.46 | 13664321
16 | 1KB | 180s | 1000 | 44 | 73.52 | 390.31 | 5080 | 2080 | 88849.59 | 107.1 | 16001658
20 | 1KB | 180s | 1000 | 44 | 97.78 | 444.23 | 4920 | 2070 | 86301.41 | 104.03 | 15542852

Per-worker CPU utilization (W1-W20; N/A = no worker assigned):
1 core:  50.4
2 cores: 50.2, 46.5
4 cores: 50.5, 47.8, 48.4, 51.1
8 cores: 50.6, 47.9, 48.6, 51.2, 47.3, 50.6, 51.2, 51.2
16 cores: 28.6, 29.8, 29.7, 27.7, 27.7, 29, 27.2, 30.6, 31.1, 31, 15.8, 44.4, 31.6, 41.5, 39.3, 27.2
20 cores: 22.3, 24.8, 23.7, 23.4, 23.4, 24.7, 18.9, 17.9, 16.3, 25.8, 24.8, 22.3, 37.2, 43.6, 26.7, 50.6, 38.6, 60.4, 45.9, 25.9

wrk was run with --timeout 10s.

We also switched out Calico for Flannel with host-gw. The performance remained similar for 1 core on the NGINX Ingress Controller. We would like your feedback on what else might be missing before we repeat the tests with more CPU cores.

./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1K
Running 3m test @ http://143.182.136.163:31744/1K
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    98.64ms   30.60ms   1.99s   95.42%
    Req/Sec    222.89     49.29     4.15k   95.64%
  1736476 requests in 3.00m, 2.04GB read
  Socket errors: connect 0, read 0, write 0, timeout 518
Requests/sec:   9641.72
Transfer/sec:     11.62MB

Also, if you are interested, here are the numbers from the NGINX pod that returns a 1KB binary, as used in the reference blog.

./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1kb.bin
Running 3m test @ http://143.182.136.163:31744/1kb.bin
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   105.54ms   67.20ms   1.50s   96.55%
    Req/Sec    221.86     30.25     1.82k   96.98%
  1734931 requests in 3.00m, 2.00GB read
  Socket errors: connect 0, read 0, write 0, timeout 158
Requests/sec:   9633.10
Transfer/sec:     11.39MB
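One way to sanity-check the cross-NUMA suspicion on the node (assuming numactl and sysstat are installed):

# NUMA layout: which cores belong to which node
lscpu | grep -i numa
numactl --hardware
# per-node allocation counters; growing numa_miss/other_node suggests cross-node traffic
numastat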

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 3 years ago

This issue was closed because it has been stalled for 7 days with no activity.