kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

default configuration results in poor nginx performance #551

Closed egernst closed 3 years ago

egernst commented 6 years ago

Description of problem

When running an nginx server, testing with tools like ab and hey shows inconsistent and poor performance with respect to request handling rate (req/sec).

Expected result

A request is 612 bytes. I expect the TCP throughput of Kata Containers to be sufficient to handle a higher rate of requests. For example, iperf results on the same machine measured 8.82 Gbps of bandwidth for 512 B transmissions and 18.1 Gbps for 1024 B transmissions.

Actual result

There are many retries and some errors observed when running. The resulting req/sec rate is too low when running with kata-runtime. Running "vmstat 1" in the container shows that the CPU is mostly idle, and we are not memory bound.

To run:

start nginx server container:

Note: you should also constrain the number of vCPUs and the amount of memory (see the constrained variant below):

docker run --runtime=kata-runtime -itd --rm -p 8080:80 nginx
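For reference, the constrained variant used for the later measurements in this issue (8 vCPUs, 16 GB of memory) looks like this:

docker run --runtime=kata-runtime -itd --rm --cpus=8 --memory=16G -p 8080:80 nginx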

exercise nginx:

I would run either ab or hey for a period of time (e.g., 60 seconds), each time with a different level of concurrency (e.g., 100, 200, 500, 1000).

hey -c 100 -z 1m http://10.7.200.165
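The equivalent ab form used later in the thread looks like this (the concurrency value of 100 is just one of the levels listed above; the port reflects the published -p 8080:80 mapping):

ab -n 10000 -c 100 http://10.7.200.165:8080/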

analysis / observations

I started off testing with just ab, but found that hey was a bit nicer to look at, and provided a better summary of errors observed. Results seem unreliable, though, if there are too many errors (resolved after adjusting ulimit). See https://github.com/rakyll/hey

ulimit settings on host / guest:

Saw errors as a result of the default open-files limit on the host and guest: [17142301] Get http://10.7.200.165:8080/: dial tcp 10.7.200.165:8080: socket: too many open files

This would result in unreliable req/sec results. Need to update ulimit to a more sane value.

On i3.metal on AWS with xenial installed, the host originally shows:

$ ulimit -Sn
1024
$ ulimit -Hn
1048576

Updated:

$ ulimit -n 1048576
$ ulimit -Sn
1048576
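To make the higher limit persistent across sessions, a hedged sketch assuming pam_limits is in use (the file location and the wildcard scope are assumptions, not from the original report):

# /etc/security/limits.conf (illustrative)
*    soft    nofile    1048576
*    hard    nofile    1048576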

number of queues for macvtap

Today this is hardcoded to 8: https://github.com/kata-containers/runtime/blob/16600efc1da0dc893c1a12424902553cf7d1266f/virtcontainers/network.go#L97

This should be set equal to the number of vCPUs (or less) by default, and be made configurable.
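A minimal sketch of the intended change, with names that are illustrative rather than the real virtcontainers API: derive the queue count from the sandbox's vCPU count, let a configuration value override it, and fall back to the old hard-coded default otherwise.

// Illustrative sketch only; names do not match the actual virtcontainers code.
package network

// defaultQueues mirrors the value currently hard-coded in virtcontainers/network.go.
const defaultQueues uint32 = 8

// queueCount picks the number of macvtap queues for a sandbox: an explicit
// configuration value wins, otherwise the vCPU count, otherwise the old default.
func queueCount(vCPUs, configured uint32) uint32 {
	if configured != 0 {
		return configured
	}
	if vCPUs != 0 {
		return vCPUs
	}
	return defaultQueues
}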

Tuning nginx itself:

Adjustments to the number of worker_processes and worker_connections (through /etc/nginx/nginx.conf): default: 1 worker, 1024 connections; adjusted: 8 workers, 8192 connections. This is the nginx configuration used for the results reported later in this issue; the relevant directives are sketched below.
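The corresponding nginx.conf fragment would look roughly like this (only the changed directives are shown; the rest of the stock configuration is assumed unchanged):

# /etc/nginx/nginx.conf (fragment)
worker_processes 8;

events {
    worker_connections 8192;
}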

sysctl tuning within the guest:

SYN queue size: tcp_max_syn_backlog

details from experiments tbd

somaxconn

details from experiments tbd
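As a starting point, those two knobs could be raised inside the guest roughly like this (the 8192 values are illustrative placeholders, not results from the pending experiments):

sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.core.somaxconn=8192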

Sample result:

Sample result without making adjustments (req/sec is bogus due to socket / too many open files issue).

Summary:
  Total:        60.1512 secs
  Slowest:      20.0032 secs
  Fastest:      0.0002 secs
  Average:      0.3697 secs
  Requests/sec: 285504.1402

  Total data:   18292680 bytes
  Size/request: 612 bytes

Response time histogram:
  0.000 [1]     |
  2.001 [28636] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  4.001 [0]     |
  6.001 [0]     |
  8.001 [886]   |■
  10.002 [0]    |
  12.002 [331]  |
  14.002 [0]    |
  16.003 [0]    |
  18.003 [0]    |
  20.003 [36]   |

Latency distribution:
  10% in 0.0065 secs
  25% in 0.0115 secs
  50% in 0.0202 secs
  75% in 0.0369 secs
  90% in 0.0807 secs
  95% in 0.1732 secs
  99% in 10.0439 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0413 secs, 0.0002 secs, 20.0032 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:    0.0001 secs, 0.0000 secs, 0.0559 secs
  resp wait:    0.0235 secs, 0.0001 secs, 0.9614 secs
  resp read:    0.0028 secs, 0.0000 secs, 0.0391 secs

Status code distribution:
  [200] 29890 responses

Error distribution:
  [17142301]    Get http://10.7.200.165:8080/: dial tcp 10.7.200.165:8080: socket: too many open files
  [1217]        Get http://10.7.200.165:8080/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  [11]  Get http://10.7.200.165:8080/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
grahamwhaley commented 6 years ago

What is the best way you've found to monitor those packet retries/losses so far, @egernst? Maybe we can enhance/modify our existing CC ab/nginx test and port it over to Kata as a start, with an eye to adding something to the metrics CI.

egernst commented 6 years ago

cc/ @jon @amshinde

egernst commented 6 years ago

Well, it turns out tuning TCP isn't what's really going to change the performance here (though it is still a good idea). The primary cause of the poor performance is that when using overlay, index.html is stored on a 9p volume, and thus access is slow for each established session.

When testing with devicemapper instead of overlay, performance becomes very reasonable. Some data measured in an AWS i3.metal xenial machine:

concurrent requests | kata req/sec
--- | ---
100 | 20,544.10
200 | 18,862.31
500 | 18,475.07
1000 | 19,218.69

This matches what I would normally expect for nginx on Kata (and compares well against runc). These numbers were gathered using ab -n 10000 -c <concurrent value> http://<myserver>:8080, and the nginx server was started with docker run --runtime=kata-runtime -itd --rm --cpus=8 --memory=16G -p 8080:80 nginx

The memory value is fairly arbitrary since we aren't memory bound, and the CPU count was chosen to match the default number of network queues used in Kata today (again, this will be made configurable and the default will match the number of vCPUs). Nginx is configured to use 8 workers and support 8192 worker_connections.

egernst commented 6 years ago

We can close this issue once we have documentation / collateral in place describing a couple of ways to work around it (i.e., use devicemapper, or avoid 9p-based volumes by using ramfs- or block-backed volumes).
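For the devicemapper route, a hedged sketch is to switch the Docker storage driver on the host via /etc/docker/daemon.json and then restart the daemon (the file location and restart command vary by distro):

{
    "storage-driver": "devicemapper"
}

$ sudo systemctl restart docker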

grahamwhaley commented 6 years ago

One of my suspicions around the 9p hit here is that, afaik, we run 9p in 'default cache mode', which I believe is 'cache=none'. 9p and caching is a bit of an area of tradeoffs aiui. @bergwolf @gnawux @WeiZhang555 - do you guys have any input and wisdom from your previous experiences of 9p optimisations? I suspect if the index file was cached then the performance would go up with 9p as well. But, we'd want to be sure what the side effects of enabling the cache are etc. The theory should be pretty easy to test out itself though.
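For anyone wanting to poke at the cache theory by hand, a 9p share can be mounted inside a guest with an explicit cache mode roughly like this (the mount tag and target are placeholders, and this bypasses whatever options the runtime normally passes; cache=loose is just one of the available modes):

mount -t 9p -o trans=virtio,version=9p2000.L,cache=loose <mount_tag> /mnt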

WeiZhang555 commented 6 years ago

@grahamwhaley A late reply :stuck_out_tongue:

9pfs with caching enabled has some quite annoying problems synchronizing data between host and guest. Even though it gives better performance, I think it's better to keep the cache mode disabled. The syncing problem will affect some use cases (such as logs/configMaps via 9pfs), and I think the side effect is larger than the benefit. It's only my 2 cents.