rootwyrm / dns_docker

Complete DNS suite for use in Docker

dnsdist 2.5% CPU barrier #13

Closed rootwyrm closed 4 years ago

rootwyrm commented 4 years ago

Extremely perplexing: repeated performance testing on multiple platforms and architectures shows dnsdist simply blocking once it hits 2.5% CPU as reported on the dashboard. (Not 25%; 2.5%.) The net result is catastrophically bad performance on every architecture relative to hardware capability.

aarch64 (RPi4/4GB): 31-36 qps
i7-3820 @ 4.7GHz: 112-124 qps

This occurs regardless of the number of threads, tcpFastOpen, or any other options when using a pure-C++ ruleset. Preliminary investigation points to a possible problem in the Boost libraries Alpine currently provides as an APK. (Which would suck; Boost takes many hours to build, and adding it to the base image would make CI/CD miserable.)
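For context, the knobs mentioned above (threads, tcpFastOpen) are set in dnsdist's Lua configuration. A minimal sketch of this kind of setup; the addresses, ports, queue size, and thread count are illustrative assumptions, not the actual dns_docker config, and the exact tcpFastOpen option name may vary by dnsdist release:

```lua
-- Hypothetical dnsdist listener + backend sketch, not the actual
-- dns_docker configuration.
setLocal('0.0.0.0:10530', { reusePort = true, tcpFastOpenQueueSize = 64 })

-- Backend authoritative server (nsd in this setup)
newServer({ address = '127.0.0.1:10053' })

-- TCP worker threads; one of the knobs that did not move the needle here
setMaxTCPClientThreads(10)
```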

rootwyrm commented 4 years ago


i7-3820 @ 4.7GHz (VMware Workstation) with network_mode: host for all components. nsd with a 260,000-query run tested at over 70,000 qps.

[Status] Command line: dnsperf -f inet -m udp -s X.X.X.X -p 10530 -T 2 -c 100 -d query-root.txt -n 10000
[Status] Sending queries (to 10.1.1.54)
[Status] Started at: Wed Jul  1 15:59:12 2020
[Status] Stopping after 10000 runs through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         260000
  Queries completed:    260000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 260000 (100.00%)
  Average packet size:  request 36, response 506
  Run time (s):         3.465583
  Queries per second:   75023.452043

  Average Latency (s):  0.001025 (min 0.000024, max 0.020108)
  Latency StdDev (s):   0.000895
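As a sanity check on the UDP numbers above: dnsperf's reported throughput is just completed queries divided by run time, and with 100 concurrent clients (-c 100) Little's law gives a rough upper bound of clients / average latency. A quick sketch:

```python
# Sanity-check dnsperf's reported statistics for the UDP run above.
queries_completed = 260_000
run_time_s = 3.465583
avg_latency_s = 0.001025
clients = 100  # dnsperf -c 100

# Throughput as dnsperf computes it
qps = queries_completed / run_time_s
print(f"{qps:.2f} qps")  # ~75023.45, matching the report

# Little's law: sustained throughput <= concurrency / average latency
bound = clients / avg_latency_s
print(f"upper bound ~{bound:.0f} qps")  # ~97561, so 75k qps is plausible
```

The observed 75k qps sits comfortably under the ~97k qps concurrency bound, so the UDP run is latency-limited, not broken.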

[Status] Command line: dnsperf -f inet -m tcp -s X.X.X.X -p 10530 -T 2 -c 100 -d query-root.txt -n 10000
[Status] Sending queries (to 10.1.1.54)
[Status] Started at: Wed Jul  1 15:59:58 2020
[Status] Stopping after 10000 runs through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         260000
  Queries completed:    260000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 260000 (100.00%)
  Average packet size:  request 36, response 814
  Run time (s):         6.271272
  Queries per second:   41458.893826

  Average Latency (s):  0.001914 (min 0.000039, max 0.042489)
  Latency StdDev (s):   0.001443
rootwyrm commented 4 years ago

A bit of experimentation increased dnsdist's CPU consumption by raising the number of clients, but this does not really help performance; throughput is still <10% of what nsd can push (and dnsdist is not even caching).

[Status] Command line: dnsperf -f inet -m tcp -s X.X.X.X -p 10053 -T 2 -c 100 -d query-root.txt -n 10000
[Status] Sending queries (to 10.1.1.54)
[Status] Started at: Wed Jul  1 16:01:21 2020
[Status] Stopping after 10000 runs through file
[Status] Testing complete (end of file)

Statistics:

  Queries sent:         260000
  Queries completed:    260000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 260000 (100.00%)
  Average packet size:  request 36, response 814
  Run time (s):         34.585820
  Queries per second:   7517.531751

  Average Latency (s):  0.012832 (min 0.000125, max 0.407312)
  Latency StdDev (s):   0.037401

edit: at 1,000 clients the host machine runs out of CPU with >50% spent in system time, so scaling beyond this can't be validated. There is definitely something in the kernel dragging this down HARD.
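When comparing this many runs, pulling the key figures out of dnsperf's Statistics block mechanically beats eyeballing. A small helper sketch; it assumes the plain-text output format shown in this thread and nothing else:

```python
import re

def parse_dnsperf_stats(text: str) -> dict:
    """Extract key figures from a dnsperf 'Statistics:' block."""
    patterns = {
        "sent": r"Queries sent:\s+(\d+)",
        "completed": r"Queries completed:\s+(\d+)",
        "lost": r"Queries lost:\s+(\d+)",
        "qps": r"Queries per second:\s+([\d.]+)",
        "avg_latency_s": r"Average Latency \(s\):\s+([\d.]+)",
    }
    stats = {}
    for key, pat in patterns.items():
        m = re.search(pat, text)
        if m:
            stats[key] = float(m.group(1))
    return stats

# Sample taken from the first UDP run above
sample = """
  Queries sent:         260000
  Queries completed:    260000 (100.00%)
  Queries lost:         0 (0.00%)
  Queries per second:   75023.452043
  Average Latency (s):  0.001025 (min 0.000024, max 0.020108)
"""
print(parse_dnsperf_stats(sample))
```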

rootwyrm commented 4 years ago

Don't think it's going to get much better than this on RPi4.

Statistics:

  Queries sent:         2600000
  Queries completed:    2598925 (99.96%)
  Queries lost:         1075 (0.04%)

  Response codes:       NOERROR 2598925 (100.00%)
  Average packet size:  request 36, response 506
  Run time (s):         156.768265
  Queries per second:   16578.132060

  Average Latency (s):  0.003722 (min 0.000261, max 0.068626)
  Latency StdDev (s):   0.005671

All lost queries were cache hits (???), and it's still only about 25% of nsd throughput. However, CPU utilization peaked at "273%" at the host level (54% reported in the container).

DNSecure dv7a0 subjected to the same test against the RPi4's nsd backend manages its maximum tuned throughput (~40k qps), and dh3p2 managed about 20k qps, so this definitely points toward a stack limitation of some sort.

rootwyrm commented 4 years ago

Closing as complete, since there is no way to get more out of this without going down the always-dangerous tuning rabbit hole, which is just going to end in tears and issues until the cows build their own home. Without opposable thumbs.

Really, if you need more than 15k QPS of serving capacity, you can afford to spin up a 1vCPU/2GB amd64 VM somewhere in your environment. (Or just use Anycast with ECMP, because then 15k QPS per $50 ain't too bad either.)