noahgibbs / rsb

Rails Simpler Bench - a simple Rails app, with a variety of requests and Ruby versions that it can be tested with

What are the issues with `ab`? #1

Closed: ioquatix closed this issue 5 years ago

ioquatix commented 5 years ago

> ApacheBench is not recommended for new uses - its reporting and accuracy are poor and there are bugs and edge-cases that make it hard to be sure you're getting what you think you are.

Citation?

noahgibbs commented 5 years ago

For poor accuracy, check its output formats - nothing below millisecond timing is reported at all. For the kind of testing RSB does, that loss of precision is completely unacceptable, and for a lot of what I do I need finer-grained timing data than a straight iterations-per-second number.
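
To make that concrete, here's roughly the kind of measurement I mean. This is just a sketch with a placeholder URL, not RSB's actual harness - the point is that keeping every latency at full precision lets you compute percentiles and variance afterward, which one millisecond-rounded mean can't give you:

```ruby
# Sketch only (not RSB's code): record each request's latency at full
# precision so percentiles and variance can be computed later, instead
# of settling for a single mean rounded to the millisecond.
require "net/http"

uri = URI("http://localhost:3000/")   # placeholder: whatever app is under test
latencies = 1000.times.map do
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  Net::HTTP.get_response(uri)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

sorted = latencies.sort
puts "median: %.6fs  99th percentile: %.6fs" %
  [sorted[sorted.size / 2], sorted[(sorted.size * 0.99).floor]]
```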

For bugs, I'll start by citing Phusion Passenger's benchmarking recommendations, which say the same thing but without specifics: https://www.phusionpassenger.com/library/config/nginx/optimization/#benchmarking-recommendations

I've also had trouble with its KeepAlive mode. Separately, Charles Nutter and Tom Enebo have reported (pers. comm.) that AB's KeepAlive normally speaks HTTP 1.0 only, which makes it underspecified and quirky (HTTP 1.0 keep-alive was never well specified), and that you can't easily switch it to the better-specified, better-supported HTTP 1.1 keep-alive behavior.
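
For comparison, HTTP 1.1 makes connections persistent by default, so keep-alive behavior is well-defined there. A quick sketch (host and port are placeholders) of what that looks like from Ruby's Net::HTTP, which reuses one connection for several requests:

```ruby
# Sketch: HTTP 1.1 connections are persistent by default, so one TCP
# connection can carry many requests without HTTP 1.0's ad-hoc
# Keep-Alive negotiation. Host and port here are placeholders.
require "net/http"

Net::HTTP.start("localhost", 3000) do |http|
  3.times do |i|
    response = http.get("/")
    puts "request #{i}: #{response.code}"
  end
end   # the single underlying connection closes here
```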

ioquatix commented 5 years ago

I personally use ab as well as wrk and have never had any general problems with either.

I've heard that Puma has bugs handling Connection: keep-alive, but I'm not sure why, since it's a fairly straightforward header to support - e.g. https://github.com/puma/puma/issues/1565

> nothing below millisecond timing is reported at all

That's actually a legitimate concern - the only one here - and I'd suggest you just keep to specifics, because I personally find ab a really great tool for testing the trade-off between persistent and non-persistent connections. Yes, it's not perfect, but what tool is?

ioquatix commented 5 years ago

Looking at the Passenger documentation:

> Enable HTTP keep-alive in both the server and in your benchmarking tool. Otherwise you will end up benchmarking how quickly the kernel can set up TCP connections, which is a non-trivial part of the request time.

Yes, that may be true, but Passenger also suffers more here because of its design: using Nginx as a proxy means establishing multiple connections, and by the design choices they made, the overhead per connection is pretty big. So logically it makes sense that they'd recommend avoiding this kind of benchmark.

noahgibbs commented 5 years ago

I can't tell you why Puma has trouble with KeepAlive, but I've been at a large table full of Puma (and JRuby and various other) developers trying to track it down, and it seems to be a nontrivial issue. While the final fix may be something easy, finding that "something easy" does not seem to be straightforward.

I'm not suggesting that Phusion is 100% unbiased (nobody is). But if I've found odd bugs with AB, and they've found odd bugs with AB, and the JRuby guys have found odd bugs with AB, maybe AB has some bugs?

I am assuming that when you say "I'd suggest you just keep to specifics" and declare the other concerns illegitimate, you mean "please remove the idea that ApacheBench is buggy from your README", presumably because, as you say, you "personally find ab a really great tool for testing the trade-off between persistent and non-persistent connections." I'm not sure how to respond to that, though leaving this bug here as documentation of somebody disagreeing with me is one possibility.

I'm also planning to add a patch for a better KeepAlive mode to wrk, in order to get a tool I find really great for testing that tradeoff. I'm not wild about how either wrk or AB does it currently, and the wrk code for that looks easy to patch. But that's not really what you're asking about here.

ioquatix commented 5 years ago

When I saw the comment that ab was buggy, with no citation, I wondered what was wrong with it. Yes, it's old. Yes, it's HTTP/1.0. I'm not saying the other concerns are illegitimate, just that I don't have enough experience with them and you don't provide enough evidence. What kind of message are you trying to send with such a statement?

I wondered if you'd only tested ab with Puma - have the same bugs occurred with some other server? If it's Puma that's buggy and not ab, isn't it a bit rough to perpetuate the same story as Phusion with little to no evidence? (They don't provide any evidence either, but thanks for the link.)

I personally use both ab and wrk in my specs, and they have caught different issues at different times, both performance and protocol regressions. So I have respect for those tools having a place :) I'll leave it up to you whether you want to do anything about it, but unless you start fielding lots of questions about why you didn't use ab (which I can hardly imagine being a problem), I'd suggest just removing that statement.

ioquatix commented 5 years ago

I was just taking a look at the output of ab since it's part of my standard test suite:

```
Server Software:
Server Hostname:        127.0.0.1
Server Port:            9294

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      8
Time taken for tests:   0.286 seconds
Complete requests:      1600
Failed requests:        0
Total transferred:      107200 bytes
HTML transferred:       0 bytes
Requests per second:    5598.50 [#/sec] (mean)
Time per request:       1.429 [ms] (mean)
Time per request:       0.179 [ms] (mean, across all concurrent requests)
Transfer rate:          366.31 [Kbytes/sec] received
```

It does seem to include sub-ms precision.

noahgibbs commented 5 years ago

It includes sub-ms precision only for the mean across all requests - that is, it gives a single sub-ms measurement for the whole run, which doesn't allow for checking anything else (percentiles, variance, etc.)

It's possible to have it output timing for more than a single aggregate number by specifying one of its two output formats (CSV via -e, gnuplot via -g). However, both of those output formats are only ms-accurate.

It would be possible to run ab once for each request, but then I'd lose the low overhead that was the reason not to just use RestClient in Ruby in the first place.
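
For reference, the pure-Ruby alternative looks something like the sketch below (the URL is a placeholder): RestClient makes per-request, sub-millisecond timing trivial, but every request pays Ruby-level client overhead that a C load generator like ab or wrk doesn't.

```ruby
# Sketch of the pure-Ruby alternative: per-request, sub-millisecond
# timings are trivial to get, but each request carries more client-side
# overhead than a C load generator such as ab or wrk.
require "rest-client"

url = "http://localhost:3000/"   # placeholder
timings = 100.times.map do
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  RestClient.get(url)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end
puts "mean: %.6fs  max: %.6fs" % [timings.sum / timings.size, timings.max]
```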