Hi @dakshinai
I suggest applying the following performance-related optimizations in the ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "10240"
  keepalive: "100"
  keepalive-requests: "100000000"
Those settings are from this blog post: https://www.nginx.com/blog/performance-testing-nginx-ingress-controllers-dynamic-kubernetes-cloud-environment/ . The optimizations are also described here -- https://www.nginx.com/blog/tuning-nginx/
Additionally, it makes sense to add worker-cpu-affinity: "auto", so that worker processes are pinned to CPU cores. This is described in the blog post you referenced here: https://github.com/nginxinc/kubernetes-ingress/issues/1276
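After applying the ConfigMap, one way to confirm the settings were picked up is to dump the generated configuration from inside the Ingress Controller pod (a quick check, assuming the ConfigMap is saved as nginx-config.yaml; the pod name below is a placeholder):

kubectl apply -f nginx-config.yaml
kubectl exec -n nginx-ingress <nginx-ingress-pod> -- nginx -T | grep -E "worker_connections|worker_rlimit_nofile|worker_cpu_affinity|keepalive_requests"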
This does not affect the numbers yet. The comparison is inline below.
ab -n 10000 -c 100 -H "Host: cafe.example.com" https://$IC_IP:$IC_HTTPS_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 143.182.136.163 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            31946
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        cafe.example.com

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   4.403 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      3640000 bytes
HTML transferred:       1550000 bytes
Requests per second:    2271.41 [#/sec] (mean)
Time per request:       44.025 [ms] (mean)
Time per request:       0.440 [ms] (mean, across all concurrent requests)
Transfer rate:          807.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   37   21.7     33    110
Processing:     0    7    3.3      6     47
Waiting:        0    4    3.2      3     47
Total:          2   43   24.0     39    130

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     50
  75%     59
  80%     65
  90%     80
  95%     88
  98%     92
  99%    121
 100%    130 (longest request)
ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.105.165.64 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.106 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9917
Total transferred:      3689585 bytes
HTML transferred:       1550000 bytes
Requests per second:    94520.64 [#/sec] (mean)
Time per request:       1.058 [ms] (mean)
Time per request:       0.011 [ms] (mean, across all concurrent requests)
Transfer rate:          34056.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      4
Processing:     0    1    0.3      1      4
Waiting:        0    1    0.3      1      4
Total:          0    1    0.5      1      5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      3
  99%      4
 100%      5 (longest request)
Hi @dakshinai
Because the second test is done without TLS termination, I suggest removing TLS termination from the Ingress resource, as it affects RPS.
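For reference, a plain-HTTP variant of the complete-example cafe-ingress could look roughly like this (a sketch assuming the networking.k8s.io/v1 Ingress schema and the tea-svc/coffee-svc services from the example, with the tls section simply omitted):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cafe-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: cafe.example.com
    http:
      paths:
      - path: /tea
        pathType: Prefix
        backend:
          service:
            name: tea-svc
            port:
              number: 80
      - path: /coffee
        pathType: Prefix
        backend:
          service:
            name: coffee-svc
            port:
              number: 80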
ab -n 10000 -c 100 -k -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 143.182.136.163 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            30804

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.363 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9928
Total transferred:      3689640 bytes
HTML transferred:       1550000 bytes
Requests per second:    27564.92 [#/sec] (mean)
Time per request:       3.628 [ms] (mean)
Time per request:       0.036 [ms] (mean, across all concurrent requests)
Transfer rate:          9932.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      4
Processing:     0    4    0.3      4      6
Waiting:        0    4    0.3      4      6
Total:          0    4    0.3      4      6

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      4
  99%      5
 100%      6 (longest request)
ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.105.165.64 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.137 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9901
Total transferred:      3689505 bytes
HTML transferred:       1550000 bytes
Requests per second:    72878.33 [#/sec] (mean)
Time per request:       1.372 [ms] (mean)
Time per request:       0.014 [ms] (mean, across all concurrent requests)
Transfer rate:          26258.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      5
Processing:     0    1    0.2      1      5
Waiting:        0    1    0.1      1      5
Total:          0    1    0.4      1      5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      2
  99%      4
 100%      5 (longest request)
cat nginx-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  keepalive: "100"
  keepalive_requests: "100000000"
  worker-cpu-affinity: "auto"
Hi @pleshakov,
Removing TLS termination does not help either, though without keepalive the numbers are similar. Can you help clarify how keepalive affects the results here?
Hi @dakshinai
Looking at your latest results, looks like the difference has improved, no?
Removing TLS termination does not help either, though without keepalive the numbers are similar. Can you help clarify how keepalive affects the results here?
At a higher number of available cores, this is consistent with our performance testing results for NGINX -- https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ (section "RPS for HTTPS Requests").
Can you help clarify how keepalive affects the results here?
Keepalives allow NGINX to reuse connections to backends for subsequent requests. Without keepalives, NGINX will try to establish a new connection for every request.
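For context, the keepalive ConfigMap key roughly translates into an upstream connection cache in the generated nginx.conf, along these lines (a simplified sketch, not the exact config the Ingress Controller generates; the upstream name and endpoint IP are illustrative):

upstream default-cafe-ingress-cafe.example.com-tea-svc-80 {
    server 10.244.0.12:80;           # backend endpoint IP (illustrative)
    keepalive 100;                   # keep up to 100 idle connections to backends per worker
}

server {
    location /tea {
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # clear the Connection header so connections can be reused
        proxy_pass http://default-cafe-ingress-cafe.example.com-tea-svc-80;
    }
}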
Hi @pleshakov,
The results improved and matched up without client keepalives, but my point was that keepalives drastically improved the standalone results, so the ingress results do not match up to standalone when keepalives are turned on.
We tried another wrk test with the modified config below, focusing just on getting NGINX numbers similar to these blogs: https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-processes: "auto"
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  worker-cpu-affinity: "auto"
  keepalive-timeout: "120"
  keepalive-requests: "10000"
wrk -t 44 -c 1000 -d 180s -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea

Running 3m test @ http://143.182.136.163:31432/tea
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.26ms  116.53ms   1.47s    96.73%
    Req/Sec     0.89k     1.22k    9.72k    89.35%
  6978810 requests in 3.00m, 2.40GB read
  Socket errors: connect 0, read 0, write 0, timeout 44
Requests/sec:  38749.87
Transfer/sec:     13.64MB
This is over a 40GbE link. The blog's ingress controller results were 36,647 RPS for 1 CPU and 342,785 RPS for 24 CPUs. We have 40 cores on the system running the ingress controller and the app, but our results only reached 38,749.87 RPS.
Hi @dakshinai
The networking stack could be a bottleneck; I suggest investigating that. You can also tweak the number of worker processes in NGINX (e.g. worker-processes: "4") to see at what point it no longer makes sense to increase the number of processes.
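A quick way to see how many worker processes are actually running after such a change (the pod name is a placeholder, and this assumes ps is available in the image):

kubectl exec -n nginx-ingress <nginx-ingress-pod> -- ps aux | grep -c "nginx: worker"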
Is there a way to check the number of CPU cores used by the ingress controller? In our case, 1 to 40?
Perhaps run top on the node during the test and see which NGINX worker processes are utilized and which ones are not?
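For example, on the node during a test run (standard Linux tools; exact flags may vary by distro):

# batch-mode top, showing full command lines so the worker processes are visible
top -b -c -n 1 | grep "nginx: worker"

# or a rolling per-process CPU view with pidstat (from the sysstat package)
pidstat -u -p $(pgrep -d, -f "nginx: worker") 1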
Is there a place where I can download a container image for the NGINX web server used in the test blogs? Our current tests work with response sizes of ~150 bytes, but we would like to test with multiple response file sizes.
The image is just NGINX configured with a ConfigMap. Please see https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ , the section "Backend DaemonSet Deployment".
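A rough idea of what such a backend looks like (an illustrative sketch, not the exact manifest from the blog; the ConfigMap name and payload directory are made up):

kind: ConfigMap
apiVersion: v1
metadata:
  name: backend-nginx-conf
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 80;
        root /data;   # directory with pre-generated payload files, e.g. 1kb.bin
      }
    }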
Hi @dakshinai,
I would suggest starting with 1 CPU or worker process and testing the performance. Use the worker-processes: "1" key in the ConfigMap, and test to see if you can get roughly 36K RPS. The performance should double as you double the cores, but only if the CPUs are in the same NUMA node. You will not get a linear increase in performance if you use cores from different NUMA nodes. Additionally, there is a container networking bottleneck, so the performance may flatten out after 16 cores (in the case of our testing).
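To check which cores belong to which NUMA node before deciding how far to scale the worker processes (standard Linux tools, nothing ingress-specific):

lscpu | grep -i numa
# or, for a more detailed view
numactl --hardware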
I use htop to check the CPU utilization of the system. Also note that we used Flannel as the networking stack in Kubernetes, with the host-gw backend enabled. That increases performance, but the nodes in the cluster need to be in the same LAN. More details about this can be found in the blog.
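For reference, host-gw is typically enabled in Flannel's net-conf.json, roughly like this (the Network subnet below is the common default and may differ in your cluster):

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}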
And finally you can get access to the nginx web server image from here: https://hub.docker.com/r/rawdata1234/nginx-webserver-payloads
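A quick local sanity check of that image might look like this (the exposed port and payload path are assumptions; check the Docker Hub page for the actual details):

docker run --rm -d -p 8080:80 --name payloads rawdata1234/nginx-webserver-payloads
curl -s -o /dev/null -w "%{size_download}\n" http://localhost:8080/1kb.bin
docker stop payloads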
Thanks @pleshakov and @rawdata123
So we did experiment with increasing the number of cores for the NGINX Ingress Controller linearly. While the scaling was linear, the numbers do not match up. The setup is similar to https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ , except that we host the ingress controller and the NGINX pod services on the same node. Also, the NGINX pod fetches files from a host volume mounted into the NGINX server container. It's a 2-NUMA-node system with 20 cores each, so the NGINX Ingress Controller can use the 20 cores on the first NUMA node and the NGINX pod services the 20 cores on the second. The numbers below were captured with 20 cores assigned to the NGINX pod services and by varying the number of cores given to the ingress controller. The only bottleneck we suspect is cross-NUMA communication between the ingress controller and the NGINX pod service. htop CPU utilization was consistent with the CPU assignment. These are numbers over Calico.
CPU cores | File Size | Duration | Concurrency | Threads | Average Thread Latency (ms) | Stdev Thread Latency (ms) | Max Thread Latency (ms) | Average Thread RPS | Requests per Second | Transfer per Second (MB) | Total #requests | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1KB | 180s | 1000 | 44 | 104.29 | 141.34 | 5780 | 229 | 9983.92 | 12.04 | 1798097 | 50.4 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
2 | 1KB | 180s | 1000 | 44 | 54.19 | 38.97 | 1100 | 434 | 18882.99 | 22.76 | 3400819 | 50.2 | 46.5 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
4 | 1KB | 180s | 1000 | 44 | 26.52 | 34.71 | 1090 | 900 | 39162.55 | 47.21 | 7052924 | 50.5 | 47.8 | 48.4 | 51.1 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
8 | 1KB | 180s | 1000 | 44 | 66.38 | 331.14 | 3350 | 1790 | 75870.94 | 91.46 | 13664321 | 50.6 | 47.9 | 48.6 | 51.2 | 47.3 | 50.6 | 51.2 | 51.2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
16 | 1KB | 180s | 1000 | 44 | 73.52 | 390.31 | 5080 | 2080 | 88849.59 | 107.1 | 16001658 | 28.6 | 29.8 | 29.7 | 27.7 | 27.7 | 29 | 27.2 | 30.6 | 31.1 | 31 | 15.8 | 44.4 | 31.6 | 41.5 | 39.3 | 27.2 | N/A | N/A | N/A | N/A |
20 | 1KB | 180s | 1000 | 44 | 97.78 | 444.23 | 4920 | 2070 | 86301.41 | 104.03 | 15542852 | 22.3 | 24.8 | 23.7 | 23.4 | 23.4 | 24.7 | 18.9 | 17.9 | 16.3 | 25.8 | 24.8 | 22.3 | 37.2 | 43.6 | 26.7 | 50.6 | 38.6 | 60.4 | 45.9 | 25.9 |
--timeout 10s
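For the core assignments described above, one common approach (an assumption about the setup, not necessarily how these numbers were produced) is the kubelet static CPU manager policy plus a Guaranteed QoS pod spec for the Ingress Controller container, e.g.:

resources:
  requests:
    cpu: "4"        # integer CPU count => exclusive cores under the static CPU manager policy
    memory: "1Gi"
  limits:
    cpu: "4"        # requests must equal limits for Guaranteed QoS
    memory: "1Gi"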
We also switched from Calico to Flannel with host-gw. The performance remained similar for 1 core on the NGINX Ingress Controller. We would like your feedback on what else might be missing before we repeat the tests with more CPU cores.
./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1K

Running 3m test @ http://143.182.136.163:31744/1K
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    98.64ms   30.60ms   1.99s    95.42%
    Req/Sec    222.89     49.29     4.15k    95.64%
  1736476 requests in 3.00m, 2.04GB read
  Socket errors: connect 0, read 0, write 0, timeout 518
Requests/sec:   9641.72
Transfer/sec:     11.62MB
Also, if you are interested, here are the numbers from the NGINX pod that returns a 1 KB binary, as used in the reference blog.

./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1kb.bin

Running 3m test @ http://143.182.136.163:31744/1kb.bin
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    105.54ms   67.20ms   1.50s    96.55%
    Req/Sec    221.86     30.25     1.82k    96.98%
  1734931 requests in 3.00m, 2.00GB read
  Socket errors: connect 0, read 0, write 0, timeout 158
Requests/sec:   9633.10
Transfer/sec:     11.39MB
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Describe the bug
We are trying to benchmark an NGINX Ingress Controller setup exposed via NodePort on a single-node (controller/worker) setup running the "complete-example".
The ab numbers for the individual coffee/tea services exposed via ClusterIP are far higher than when accessing them through the NGINX Ingress NodePort.
Does ingress controller config require any optimization to get better results?
To Reproduce

kubectl get svc --all-namespaces -o wide
NAMESPACE       NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
default         coffee-svc      ClusterIP   10.106.39.175   <none>        80/TCP                       15d   app=coffee
default         tea-svc         ClusterIP   10.105.165.64   <none>        80/TCP                       15d   app=tea
nginx-ingress   nginx-ingress   NodePort    10.99.39.131    <none>        80:30082/TCP,443:32149/TCP   8d    app=nginx-ingress
Expected behavior
Current ab results from controller
Your environment