Vegeta much slower when using DNS instead of IP in the targets

tgwizard commented 8 years ago

What version of the project are you using?

↳ $ vegeta -version
6.0.0
↳ $ brew info vegeta
vegeta: stable 6.0.0 (bottled)
HTTP load testing tool and library
https://github.com/tsenart/vegeta
/usr/local/Cellar/vegeta/6.0.0 (4 files, 8.1M) *
  Poured from bottle
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/vegeta.rb
==> Dependencies
Build: go ✔

What operating system and processor architecture are you using?

Mac OS X 10.11.1
Darwin XYZ 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64

What did you do?

A test using the hostname as the target of the requests:

↳ $ echo -e "GET http://tgwizard.github.io/thisdoesnotexist" | vegeta attack -redirects=0 -duration=5s -rate=100 | tee results.bin | vegeta report
Requests      [total, rate]            500, 100.20
Duration      [total, attack, wait]    5.02127518s, 4.989999839s, 31.275341ms
Latencies     [mean, 50, 95, 99, max]  156.20215ms, 32.108229ms, 891.014049ms, 1.081619322s, 1.137803125s
Bytes In      [total, mean]            4558000, 9116.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  0.00%
Status Codes  [code:count]             404:500
Error Set:
404 Not Found

The same requests, but sent directly to one of the host's IP addresses (from dig tgwizard.github.io), with the hostname specified in the Host header:

↳ $ echo -e "GET http://185.31.17.133/thisdoesnotexist\nHost: tgwizard.github.io" | vegeta attack -redirects=0 -duration=5s -rate=100 | tee results.bin | vegeta report
Requests      [total, rate]            500, 100.20
Duration      [total, attack, wait]    5.019171174s, 4.989999954s, 29.17122ms
Latencies     [mean, 50, 95, 99, max]  30.963641ms, 29.112936ms, 31.798525ms, 113.594044ms, 163.813946ms
Bytes In      [total, mean]            4558000, 9116.00
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  0.00%
Status Codes  [code:count]             404:500
Error Set:
404 Not Found

What did you expect to see? What did you see instead?

I expected them to show about the same latencies. The first test above has a much higher mean, 95, 99 and max latency percentiles. Why? Are DNS lookups made multiple times in the first case? Am I missing something?

I can replicate this issue for multiple hostnames, on different services, so it is not specific to Fastly.

tsenart commented 8 years ago

Hello!

Thanks for the nice bug report! It always makes me happy when people read the CONTRIBUTING.md file. :-)

Vegeta doesn't do any DNS caching, so my first thought would be that your configured DNS server isn't able to handle the load of queries very well (100 req/s). Can you try to debug the latency of DNS alone, with something like dig and then dnsperf?

tgwizard commented 8 years ago

Hi @tsenart. Thanks for the response!

I'll look into using dnsperf and get back to you here about the results.

Shouldn't vegeta or the Go HTTP client cache the DNS responses, or use the OS's DNS cache? The TTL of these DNS records is higher than the 5s duration of my tests. Perhaps I'm misunderstanding how DNS caching works in Go / Mac OS X?

tsenart commented 8 years ago

For cross compatibility across different OSes, Vegeta uses the native Go DNS resolver, which doesn't currently cache anything.

tgwizard commented 8 years ago

It definitely seems to be the DNS lookup that is the limiting factor. I'm using the Google public DNS server 8.8.8.8 and can't do more than around 20 DNS queries per second:

↳ $ echo "tgwizard.github.io A" > cmd.txt

↳ $ dnsperf -s 8.8.8.8 -Q 20 -l 5 -d cmd.txt
DNS Performance Testing Tool
Nominum Version 2.0.0.0

[Status] Command line: dnsperf -s 8.8.8.8 -Q 20 -l 5 -d cmd.txt
[Status] Sending queries (to 8.8.8.8)
[Status] Started at: Mon Dec 14 10:34:57 2015
[Status] Stopping after 5.000000 seconds
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         100
  Queries completed:    100 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 100 (100.00%)
  Average packet size:  request 36, response 87
  Run time (s):         5.002568
  Queries per second:   19.989733

  Average Latency (s):  0.003104 (min 0.002073, max 0.046026)
  Latency StdDev (s):   0.004363

↳ $ dnsperf -s 8.8.8.8 -Q 25 -l 5 -d cmd.txt
DNS Performance Testing Tool
Nominum Version 2.0.0.0

[Status] Command line: dnsperf -s 8.8.8.8 -Q 25 -l 5 -d cmd.txt
[Status] Sending queries (to 8.8.8.8)
[Status] Started at: Mon Dec 14 10:35:47 2015
[Status] Stopping after 5.000000 seconds
[Timeout] Query timed out: msg id 102
[Timeout] Query timed out: msg id 104
[Timeout] Query timed out: msg id 106
[Timeout] Query timed out: msg id 107
[Timeout] Query timed out: msg id 109
[Timeout] Query timed out: msg id 111
[Timeout] Query timed out: msg id 112
[Timeout] Query timed out: msg id 114
[Timeout] Query timed out: msg id 116
[Timeout] Query timed out: msg id 117
[Timeout] Query timed out: msg id 119
[Timeout] Query timed out: msg id 120
[Timeout] Query timed out: msg id 122
[Timeout] Query timed out: msg id 124
[Status] Testing complete (time limit)

Statistics:

  Queries sent:         125
  Queries completed:    111 (88.80%)
  Queries lost:         14 (11.20%)

  Response codes:       NOERROR 111 (100.00%)
  Average packet size:  request 36, response 87
  Run time (s):         5.002367
  Queries per second:   22.189495

  Average Latency (s):  0.003329 (min 0.002194, max 0.047742)
  Latency StdDev (s):   0.004466

If I add tgwizard.github.io to my /etc/hosts file, the latency of my vegeta test is lower. The 99/max is still not as low as going directly to the IP though:

With modified /etc/hosts:

Latencies     [mean, 50, 95, 99, max]  32.842463ms, 30.539834ms, 40.788234ms, 110.223818ms, 158.425805ms

Directly to IP:

Latencies     [mean, 50, 95, 99, max]  34.719109ms, 33.253801ms, 43.778394ms, 61.65587ms, 72.716396ms

With regular DNS lookups (no /etc/hosts entry):

Latencies     [mean, 50, 95, 99, max]  38.539058ms, 33.294409ms, 42.907494ms, 193.650392ms, 240.162245ms

As you mention @tsenart, the Go DNS resolver doesn't use any caching. I found the comment in https://golang.org/src/net/dnsclient_unix.go saying "Could have a small cache".

@tsenart would you recommend that benchmarks / tests using vegeta specify bare IPs to get a more representative result, to not reflect DNS lookups or syscalls to read /etc/hosts? Or would a local caching DNS resolver, or adding the name to /etc/hosts, be the recommended way? "Representative" is perhaps the wrong word, but I'd assume that at least some clients cache the DNS results for a short while. Perhaps that assumption is wrong.

tgwizard commented 8 years ago

Oh, to clarify: what I mean by "I can't do more than around 20 DNS queries per second" is that above that rate some queries hit the timeout (here 5s). I guess clients, including the Go client, will retry the DNS query if that happens.

tsenart commented 8 years ago

I'd recommend that if you don't want to include DNS resolution as part of your test, then you should use IPs instead. This isn't usually a problem when you're testing internal endpoints since an internal DNS server is used. It's not recommendable to load test Google's public DNS server as a side effect :-)

tgwizard commented 8 years ago

Haha, indeed. Thanks for the response @tsenart!

missedone commented 7 years ago

@tsenart is it possible to implement a DNS cache on the Vegeta side? Some of our services use virtual hosts, so it's better to use a domain name rather than an IP address in the targets. Thanks.

tsenart commented 7 years ago

@missedone: Isn't virtual hosting based on the Host header value in HTTP requests? You can still set this I believe. Otherwise you could also add these entries to your /etc/hosts file.
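As an illustration of the Host header approach, the sketch below turns a hostname-based URL into the two-line target form used earlier in this thread (request line with the IP, then a `Host:` header). `pinTarget` is a hypothetical helper, not part of Vegeta, and the IP is the one from the original report. It assumes plain HTTP; with HTTPS, pointing the URL at an IP would likely also affect certificate verification, which this sketch does not address.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// pinTarget rewrites rawURL so it points at ip and returns a target made of a
// request line followed by a Host header carrying the original host.
func pinTarget(method, rawURL, ip string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no hostname in %q", rawURL)
	}
	if net.ParseIP(ip) == nil {
		return "", fmt.Errorf("%q is not an IP address", ip)
	}

	originalHost := u.Host // hostname plus optional port, as the server expects it
	switch {
	case u.Port() != "":
		u.Host = net.JoinHostPort(ip, u.Port()) // keeps the port, brackets IPv6
	case strings.Contains(ip, ":"):
		u.Host = "[" + ip + "]" // bare IPv6 literal needs brackets in a URL
	default:
		u.Host = ip
	}
	return fmt.Sprintf("%s %s\nHost: %s", method, u.String(), originalHost), nil
}

func main() {
	target, err := pinTarget("GET", "http://tgwizard.github.io/thisdoesnotexist", "185.31.17.133")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The output can be piped into `vegeta attack` in place of the echo command.
	fmt.Println(target)
}
```

The virtual host still receives the name it expects in the Host header, while the load generator never has to resolve it.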

pmahoney-raise commented 6 years ago

@tsenart We were caught by surprise that performing an HTTP load test also stressed our DNS servers. Any chance you'd reconsider? Or perhaps require an IP address rather than a hostname?

For example, wrk doesn't do it automatically, but has an example of how to deal with a hostname resolving to multiple IPs: https://github.com/wg/wrk/blob/next/scripts/addr.lua

Thanks

tsenart / vegeta

Vegeta much slower when using DNS instead of IP in the targets #162