Hi @dakshinai
I suggest applying the following performance-related optimizations in the ConfigMap:
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "10240"
  keepalive: "100"
  keepalive-requests: "100000000"
Those settings are from this blog post: https://www.nginx.com/blog/performance-testing-nginx-ingress-controllers-dynamic-kubernetes-cloud-environment/ . The optimizations are also described here -- https://www.nginx.com/blog/tuning-nginx/
Additionally, it makes sense to add worker-cpu-affinity: "auto", so that worker processes are pinned to CPU cores. This is described in the blog post you referenced here: https://github.com/nginxinc/kubernetes-ingress/issues/1276
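After applying the ConfigMap, one way to confirm the settings were picked up is to dump the generated configuration from inside the Ingress Controller pod (a quick check, assuming the ConfigMap is saved as nginx-config.yaml; the pod name below is a placeholder):

kubectl apply -f nginx-config.yaml
kubectl exec -n nginx-ingress <nginx-ingress-pod> -- nginx -T | grep -E "worker_connections|worker_rlimit_nofile|worker_cpu_affinity|keepalive_requests"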
This does not affect the numbers yet. The comparison is inline below.
ab -n 10000 -c 100 -H "Host: cafe.example.com" https://$IC_IP:$IC_HTTPS_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 143.182.136.163 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            31946
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        cafe.example.com

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   4.403 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      3640000 bytes
HTML transferred:       1550000 bytes
Requests per second:    2271.41 [#/sec] (mean)
Time per request:       44.025 [ms] (mean)
Time per request:       0.440 [ms] (mean, across all concurrent requests)
Transfer rate:          807.42 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   37   21.7     33    110
Processing:     0    7    3.3      6     47
Waiting:        0    4    3.2      3     47
Total:          2   43   24.0     39    130

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     50
  75%     59
  80%     65
  90%     80
  95%     88
  98%     92
  99%    121
 100%    130 (longest request)
ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.105.165.64 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.106 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9917
Total transferred:      3689585 bytes
HTML transferred:       1550000 bytes
Requests per second:    94520.64 [#/sec] (mean)
Time per request:       1.058 [ms] (mean)
Time per request:       0.011 [ms] (mean, across all concurrent requests)
Transfer rate:          34056.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      4
Processing:     0    1    0.3      1      4
Waiting:        0    1    0.3      1      4
Total:          0    1    0.5      1      5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      3
  99%      4
 100%      5 (longest request)
Hi @dakshinai
Because the second test is done without TLS termination, I suggest removing TLS termination from the Ingress resource, as it affects RPS.
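For reference, a plain-HTTP variant of the complete-example cafe-ingress could look roughly like this (a sketch assuming the networking.k8s.io/v1 Ingress schema and the tea-svc/coffee-svc services from the example, with the tls section simply omitted):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cafe-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: cafe.example.com
    http:
      paths:
      - path: /tea
        pathType: Prefix
        backend:
          service:
            name: tea-svc
            port:
              number: 80
      - path: /coffee
        pathType: Prefix
        backend:
          service:
            name: coffee-svc
            port:
              number: 80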
ab -n 10000 -c 100 -k -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 143.182.136.163 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.19.3
Server Hostname:        143.182.136.163
Server Port:            30804

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.363 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9928
Total transferred:      3689640 bytes
HTML transferred:       1550000 bytes
Requests per second:    27564.92 [#/sec] (mean)
Time per request:       3.628 [ms] (mean)
Time per request:       0.036 [ms] (mean, across all concurrent requests)
Transfer rate:          9932.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      4
Processing:     0    4    0.3      4      6
Waiting:        0    4    0.3      4      6
Total:          0    4    0.3      4      6

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      4
  99%      5
 100%      6 (longest request)
ab -n 10000 -c 100 -k http://10.105.165.64/tea

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.105.165.64 (be patient) Completed 1000 requests Completed 2000 requests Completed 3000 requests Completed 4000 requests Completed 5000 requests Completed 6000 requests Completed 7000 requests Completed 8000 requests Completed 9000 requests Completed 10000 requests Finished 10000 requests
Server Software:        nginx/1.16.1
Server Hostname:        10.105.165.64
Server Port:            80

Document Path:          /tea
Document Length:        155 bytes

Concurrency Level:      100
Time taken for tests:   0.137 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    9901
Total transferred:      3689505 bytes
HTML transferred:       1550000 bytes
Requests per second:    72878.33 [#/sec] (mean)
Time per request:       1.372 [ms] (mean)
Time per request:       0.014 [ms] (mean, across all concurrent requests)
Transfer rate:          26258.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.3      0      5
Processing:     0    1    0.2      1      5
Waiting:        0    1    0.1      1      5
Total:          0    1    0.4      1      5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      1
  95%      1
  98%      2
  99%      4
 100%      5 (longest request)
cat nginx-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  keepalive: "100"
  keepalive_requests: "100000000"
  worker-cpu-affinity: "auto"
Hi @pleshakov,
Removing TLS termination does not help either, though without keepalive the numbers are similar. Can you help clarify how keepalive affects the results here?
Hi @dakshinai
Looking at your latest results, looks like the difference has improved, no?
Removing TLS termination does not help either, though without keepalive the numbers are similar. Can you help clarify how keepalive affects the results here?
At a higher number of available cores, this is consistent with our performance testing results for NGINX -- https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ (section "RPS for HTTPS Requests").
Can you help clarify how keepalive affects the results here?
Keepalives allow NGINX to reuse connections to backends for subsequent requests. Without keepalives, NGINX will try to establish a new connection for every request.
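For context, the keepalive ConfigMap key roughly translates into an upstream connection cache in the generated nginx.conf, along these lines (a simplified sketch, not the exact config the Ingress Controller generates; the upstream name and endpoint IP are illustrative):

upstream default-cafe-ingress-cafe.example.com-tea-svc-80 {
    server 10.244.0.12:80;           # backend endpoint IP (illustrative)
    keepalive 100;                   # keep up to 100 idle connections to backends per worker
}

server {
    location /tea {
        proxy_http_version 1.1;          # required for upstream keepalive
        proxy_set_header Connection "";  # clear the Connection header so connections can be reused
        proxy_pass http://default-cafe-ingress-cafe.example.com-tea-svc-80;
    }
}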
Hi @pleshakov,
The results improved and matched up without client keepalives, but my point was that keepalives drastically improved the standalone results, so the ingress results do not match up to standalone when keepalives are turned on.
We tried another wrk test with the modified config below, focusing just on getting NGINX numbers similar to these blogs: https://www.nginx.com/blog/testing-the-performance-of-nginx-and-nginx-plus-web-servers/ https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-config
  namespace: nginx-ingress
data:
  worker-processes: "auto"
  worker-connections: "10000"
  worker-rlimit-nofile: "102400"
  worker-cpu-affinity: "auto"
  keepalive-timeout: "120"
  keepalive-requests: "10000"
wrk -t 44 -c 1000 -d 180s -H "Host: cafe.example.com" http://$IC_IP:$IC_HTTP_PORT/tea

Running 3m test @ http://143.182.136.163:31432/tea
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.26ms  116.53ms   1.47s    96.73%
    Req/Sec     0.89k     1.22k    9.72k    89.35%
  6978810 requests in 3.00m, 2.40GB read
  Socket errors: connect 0, read 0, write 0, timeout 44
Requests/sec:  38749.87
Transfer/sec:     13.64MB
This is over a 40GbE link. The blog's ingress controller results were 36,647 RPS for 1 CPU and 342,785 RPS for 24 CPUs. We have 40 cores on the system running the ingress controller and the app, but our results only reached 38,749.87 RPS.
Hi @dakshinai
The networking stack could be a bottleneck; I suggest investigating that. You can also tweak the number of worker processes in NGINX (e.g. worker-processes: "4") to see at what point it no longer makes sense to increase the number of processes.
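A quick way to see how many worker processes are actually running after such a change (the pod name is a placeholder, and this assumes ps is available in the image):

kubectl exec -n nginx-ingress <nginx-ingress-pod> -- ps aux | grep -c "nginx: worker"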
Is there a way to check the number of CPU cores used by the ingress controller? In our case, 1 to 40?
Perhaps run top on the node during the test and see which NGINX worker processes are utilized and which ones are not?
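For example, on the node during a test run (standard Linux tools; exact flags may vary by distro):

# batch-mode top, showing full command lines so the worker processes are visible
top -b -c -n 1 | grep "nginx: worker"

# or a rolling per-process CPU view with pidstat (from the sysstat package)
pidstat -u -p $(pgrep -d, -f "nginx: worker") 1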
Is there a place where I can download a container image for the NGINX web server used in the test blogs? Our current tests work with response sizes of ~150 bytes, but we would like to test with multiple response file sizes.
The image is just NGINX configured with a ConfigMap. Please see https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ , the section "Backend DaemonSet Deployment".
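A rough idea of what such a backend looks like (an illustrative sketch, not the exact manifest from the blog; the ConfigMap name and payload directory are made up):

kind: ConfigMap
apiVersion: v1
metadata:
  name: backend-nginx-conf
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 80;
        root /data;   # directory with pre-generated payload files, e.g. 1kb.bin
      }
    }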
Hi @dakshinai,
I would suggest starting with 1 CPU or worker process and testing the performance. Use the worker-processes: "1" key in the ConfigMap, and test to see if you can get roughly 36K RPS. The performance should double as you double the cores, but only if the CPUs are in the same NUMA node. You will not get a linear increase in performance if you use cores from different NUMA nodes. Additionally, there is a container networking bottleneck, so the performance may flatten out after 16 cores (in the case of our testing).
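To check which cores belong to which NUMA node before deciding how far to scale the worker processes (standard Linux tools, nothing ingress-specific):

lscpu | grep -i numa
# or, for a more detailed view
numactl --hardware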
I use htop to check the CPU utilization of the system. Also note that we used Flannel as the networking stack in Kubernetes, with the host-gw backend enabled. That increases performance, but the nodes in the cluster need to be in the same LAN. More details about this can be found in the blog.
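For reference, host-gw is typically enabled in Flannel's net-conf.json, roughly like this (the Network subnet below is the common default and may differ in your cluster):

{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "host-gw"
  }
}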
And finally you can get access to the nginx web server image from here: https://hub.docker.com/r/rawdata1234/nginx-webserver-payloads
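A quick local sanity check of that image might look like this (the exposed port and payload path are assumptions; check the Docker Hub page for the actual details):

docker run --rm -d -p 8080:80 --name payloads rawdata1234/nginx-webserver-payloads
curl -s -o /dev/null -w "%{size_download}\n" http://localhost:8080/1kb.bin
docker stop payloads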
Thanks @pleshakov and @rawdata123
So we did experiment with increasing the number of cores for the NGINX Ingress Controller linearly. While the scaling was linear, the numbers do not match up. The setup is similar to https://www.nginx.com/blog/testing-performance-nginx-ingress-controller-kubernetes/ , except that we host the ingress controller and the NGINX pod services on the same node. Also, the NGINX pod fetches files from a host volume mounted into the NGINX server container. It's a 2-NUMA-node system with 20 cores each, so the NGINX Ingress Controller can use the 20 cores on the first NUMA node and the NGINX pod services the 20 cores on the second. The numbers below were captured with 20 cores assigned to the NGINX pod services and by varying the number of cores given to the ingress controller. The only bottleneck we suspect is cross-NUMA communication between the ingress controller and the NGINX pod service. htop CPU utilization was consistent with the CPU assignment. These are numbers over Calico.
CPU cores | File Size | Duration | Concurrency | Threads | Average Thread Latency (ms) | Stdev Thread Latency (ms) | Max Thread Latency (ms) | Average Thread RPS | Requests per Second | Transfer per Second (MB) | Total #requests | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 | W16 | W17 | W18 | W19 | W20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1KB | 180s | 1000 | 44 | 104.29 | 141.34 | 5780 | 229 | 9983.92 | 12.04 | 1798097 | 50.4 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
2 | 1KB | 180s | 1000 | 44 | 54.19 | 38.97 | 1100 | 434 | 18882.99 | 22.76 | 3400819 | 50.2 | 46.5 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
4 | 1KB | 180s | 1000 | 44 | 26.52 | 34.71 | 1090 | 900 | 39162.55 | 47.21 | 7052924 | 50.5 | 47.8 | 48.4 | 51.1 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
8 | 1KB | 180s | 1000 | 44 | 66.38 | 331.14 | 3350 | 1790 | 75870.94 | 91.46 | 13664321 | 50.6 | 47.9 | 48.6 | 51.2 | 47.3 | 50.6 | 51.2 | 51.2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
16 | 1KB | 180s | 1000 | 44 | 73.52 | 390.31 | 5080 | 2080 | 88849.59 | 107.1 | 16001658 | 28.6 | 29.8 | 29.7 | 27.7 | 27.7 | 29 | 27.2 | 30.6 | 31.1 | 31 | 15.8 | 44.4 | 31.6 | 41.5 | 39.3 | 27.2 | N/A | N/A | N/A | N/A |
20 | 1KB | 180s | 1000 | 44 | 97.78 | 444.23 | 4920 | 2070 | 86301.41 | 104.03 | 15542852 | 22.3 | 24.8 | 23.7 | 23.4 | 23.4 | 24.7 | 18.9 | 17.9 | 16.3 | 25.8 | 24.8 | 22.3 | 37.2 | 43.6 | 26.7 | 50.6 | 38.6 | 60.4 | 45.9 | 25.9 |
--timeout 10s
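For the core assignments described above, one common approach (an assumption about the setup, not necessarily how these numbers were produced) is the kubelet static CPU manager policy plus a Guaranteed QoS pod spec for the Ingress Controller container, e.g.:

resources:
  requests:
    cpu: "4"        # integer CPU count => exclusive cores under the static CPU manager policy
    memory: "1Gi"
  limits:
    cpu: "4"        # requests must equal limits for Guaranteed QoS
    memory: "1Gi"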
We also switched from Calico to Flannel with host-gw. The performance remained similar for 1 core on the NGINX Ingress Controller. We would like your feedback on what else might be missing before we repeat the tests with more CPU cores.
./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1K

Running 3m test @ http://143.182.136.163:31744/1K
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    98.64ms   30.60ms   1.99s    95.42%
    Req/Sec    222.89     49.29     4.15k    95.64%
  1736476 requests in 3.00m, 2.04GB read
  Socket errors: connect 0, read 0, write 0, timeout 518
Requests/sec:   9641.72
Transfer/sec:     11.62MB
Also, if you are interested, here are the numbers from the NGINX pod that returns a 1 KB binary, as used in the reference blog.

./wrk -t 44 -c 1000 -d 180s -H "Host: host.example.com" http://$IC_IP:$IC_HTTP_PORT/1kb.bin

Running 3m test @ http://143.182.136.163:31744/1kb.bin
  44 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    105.54ms   67.20ms   1.50s    96.55%
    Req/Sec    221.86     30.25     1.82k    96.98%
  1734931 requests in 3.00m, 2.00GB read
  Socket errors: connect 0, read 0, write 0, timeout 158
Requests/sec:   9633.10
Transfer/sec:     11.39MB
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Describe the bug
We are trying to benchmark an NGINX Ingress Controller setup exposed via NodePort on a single-node (controller/worker) setup running the "complete-example".
The ab numbers for the individual coffee/tea services exposed via ClusterIP are far higher than when accessing them through the NGINX Ingress NodePort.
Does ingress controller config require any optimization to get better results?
To Reproduce

kubectl get svc --all-namespaces -o wide
NAMESPACE       NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
default         coffee-svc      ClusterIP   10.106.39.175   <none>        80/TCP                       15d   app=coffee
default         tea-svc         ClusterIP   10.105.165.64   <none>        80/TCP                       15d   app=tea
nginx-ingress   nginx-ingress   NodePort    10.99.39.131    <none>        80:30082/TCP,443:32149/TCP   8d    app=nginx-ingress
Expected behavior
Current ab results from controller
Your environment