Closed tgwizard closed 8 years ago
Hello!
Thanks for the nice bug report! It always makes me happy when people read the CONTRIBUTING.md file. :-)
Vegeta doesn't do any DNS caching, so my first thought would be that your configured DNS server isn't able to handle the load of queries very well (100 req/s). Can you try to debug the latency of DNS alone, with something like dig and then dnsperf?
Hi @tsenart. Thanks for the response!
I'll look into using dnsperf and get back to you here about the results.
Shouldn't vegeta or the Go HTTP client cache the DNS responses, or use the OS's DNS cache? The TTL of these DNS records is higher than the 5s duration for which I run the tests. Perhaps I'm misunderstanding how DNS caching works in Go / Mac OS X?
For cross-compatibility across different OSes, Vegeta uses the native Go DNS resolver, which doesn't currently cache anything.
It definitely seems to be the DNS lookup that is the limiting factor. I'm using the Google public DNS server 8.8.8.8 and can't do more than around 20 DNS queries per second:
↳ $ echo "tgwizard.github.io A" > cmd.txt
↳ $ dnsperf -s 8.8.8.8 -Q 20 -l 5 -d cmd.txt
DNS Performance Testing Tool
Nominum Version 2.0.0.0
[Status] Command line: dnsperf -s 8.8.8.8 -Q 20 -l 5 -d cmd.txt
[Status] Sending queries (to 8.8.8.8)
[Status] Started at: Mon Dec 14 10:34:57 2015
[Status] Stopping after 5.000000 seconds
[Status] Testing complete (time limit)
Statistics:
Queries sent: 100
Queries completed: 100 (100.00%)
Queries lost: 0 (0.00%)
Response codes: NOERROR 100 (100.00%)
Average packet size: request 36, response 87
Run time (s): 5.002568
Queries per second: 19.989733
Average Latency (s): 0.003104 (min 0.002073, max 0.046026)
Latency StdDev (s): 0.004363
↳ $ dnsperf -s 8.8.8.8 -Q 25 -l 5 -d cmd.txt
DNS Performance Testing Tool
Nominum Version 2.0.0.0
[Status] Command line: dnsperf -s 8.8.8.8 -Q 25 -l 5 -d cmd.txt
[Status] Sending queries (to 8.8.8.8)
[Status] Started at: Mon Dec 14 10:35:47 2015
[Status] Stopping after 5.000000 seconds
[Timeout] Query timed out: msg id 102
[Timeout] Query timed out: msg id 104
[Timeout] Query timed out: msg id 106
[Timeout] Query timed out: msg id 107
[Timeout] Query timed out: msg id 109
[Timeout] Query timed out: msg id 111
[Timeout] Query timed out: msg id 112
[Timeout] Query timed out: msg id 114
[Timeout] Query timed out: msg id 116
[Timeout] Query timed out: msg id 117
[Timeout] Query timed out: msg id 119
[Timeout] Query timed out: msg id 120
[Timeout] Query timed out: msg id 122
[Timeout] Query timed out: msg id 124
[Status] Testing complete (time limit)
Statistics:
Queries sent: 125
Queries completed: 111 (88.80%)
Queries lost: 14 (11.20%)
Response codes: NOERROR 111 (100.00%)
Average packet size: request 36, response 87
Run time (s): 5.002367
Queries per second: 22.189495
Average Latency (s): 0.003329 (min 0.002194, max 0.047742)
Latency StdDev (s): 0.004466
If I add tgwizard.github.io to my /etc/hosts file, the latency of my vegeta test is lower. The 99/max is still not as low as going directly to the IP though:
With modified /etc/hosts:
Latencies [mean, 50, 95, 99, max] 32.842463ms, 30.539834ms, 40.788234ms, 110.223818ms, 158.425805ms
Directly to IP:
Latencies [mean, 50, 95, 99, max] 34.719109ms, 33.253801ms, 43.778394ms, 61.65587ms, 72.716396ms
With regular DNS lookups (no /etc/hosts entry):
Latencies [mean, 50, 95, 99, max] 38.539058ms, 33.294409ms, 42.907494ms, 193.650392ms, 240.162245ms
As you mention @tsenart, the Go DNS resolver doesn't use any caching. I found the comment in https://golang.org/src/net/dnsclient_unix.go saying "Could have a small cache".
@tsenart would you recommend that benchmarks / tests using vegeta specify bare IPs to get a more representative result, so that they don't reflect DNS lookups or syscalls to read /etc/hosts? Or would a local caching DNS resolver, or adding the name to /etc/hosts, be the recommended way? "Representative" is perhaps the wrong word, but I'd assume that at least some clients cache the DNS results for a short while. Perhaps that assumption is wrong.
Oh, to clarify: what I mean by "I can't do more than around 20 DNS queries per second" is that above that rate some queries hit the timeout (here 5s). I guess clients, including the Go client, will retry the DNS query if that happens.
I'd recommend that if you don't want to include DNS resolution as part of your test, then you should use IPs instead. This isn't usually a problem when you're testing internal endpoints since an internal DNS server is used. It's not advisable to load test Google's public DNS server as a side effect :-)
Haha, indeed. Thanks for the response @tsenart!
@tsenart Is it possible to implement the DNS cache on Vegeta's side? Some of our services use virtual hosts, so it's better to use a domain name rather than an IP address in the targets. Thanks
@missedone: Isn't virtual hosting based on the Host header value in HTTP requests? You can still set this, I believe. Otherwise you could also add these entries to your /etc/hosts file.
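As an illustration of the Host header approach in plain Go `net/http` (not Vegeta's own target handling), the sketch below sends a request to an IP-based URL while presenting a different Host value. `newRequestWithHost` and `hostSeenByServer` are hypothetical helper names, and a local `httptest` server stands in for the real IP so the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newRequestWithHost builds a GET request addressed to targetURL (for
// example an IP-based URL) while sending a different Host header, which is
// what name-based virtual hosting keys on.
func newRequestWithHost(targetURL, host string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, targetURL, nil)
	if err != nil {
		return nil, err
	}
	// On client requests, Request.Host overrides the Host header that
	// would otherwise be derived from the URL.
	req.Host = host
	return req, nil
}

// hostSeenByServer starts a throwaway local server that echoes the Host it
// received, then returns that value. The local server is a stand-in for the
// real IP address (an assumption for the sake of a runnable example).
func hostSeenByServer(host string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Host)
	}))
	defer srv.Close()

	req, err := newRequestWithHost(srv.URL, host)
	if err != nil {
		return "", err
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := hostSeenByServer("tgwizard.github.io")
	fmt.Println(seen, err)
}
```

Because the URL contains an IP literal, no DNS query is needed to connect, yet the server still sees the intended hostname.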
@tsenart We were caught by surprise that performing an HTTP load test also stressed our DNS servers. Any chance you'd reconsider? Or perhaps require an IP address rather than a hostname?
For example, wrk doesn't do it automatically but has an example on how to deal with a hostname resolving to multiple IPs:
https://github.com/wg/wrk/blob/next/scripts/addr.lua
Thanks
What version of the project are you using?
What operating system and processor architecture are you using?
What did you do?
A test using the hostname as the target of the requests:
Same requests but directly to an applicable IP address (dig tgwizard.github.io), specifying the hostname in the Host header.
What did you expect to see? What did you see instead?
I expected them to show about the same latencies. The first test above has a much higher mean, 95, 99 and max latency percentiles. Why? Are DNS lookups made multiple times in the first case? Am I missing something?
I can replicate this issue for multiple hostnames, on different services, so it is not specific to Fastly.