wg / wrk

Modern HTTP benchmarking tool

scheduler bugs with large POST payloads result in >200x req/s reduction #41

Open tarasglek opened 11 years ago

tarasglek commented 11 years ago

Large POST payloads highlight bugs in the ae scheduler.

Before my patch:

./wrk -d 3 http://192.168.1.149 -M POST --body  `tr '\0' '0' < /dev/zero| head -c 10000`
Running 3s test @ http://192.168.1.149
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us  100.00%
    Req/Sec     0.00      0.00     0.00    100.00%
  10 requests in 4.00s, 3.20KB read
  Socket errors: connect 0, read 0, write 0, timeout 10
  Non-2xx or 3xx responses: 10
Requests/sec:      2.50
Transfer/sec:     819.93B

After the minor change to scheduler flags:

./wrk -d 3 http://192.168.1.149 -M POST --body  `tr '\0' '0' < /dev/zero| head -c 10000`
strlen(body)=10000
req.size=10063
Running 3s test @ http://192.168.1.149
  2 threads and 10 connections
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 8688/10063
wrote 1375/10063
wrote 7240/10063
wrote 2823/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 8688/10063
wrote 1375/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 8688/10063
wrote 1375/10063
wrote 1375/10063
wrote 7240/10063
wrote 2823/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
wrote 8688/10063
wrote 1375/10063
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.30ms    4.87ms  13.23ms   27.84%
    Req/Sec   175.04    174.42   555.00     36.63%
  1101 requests in 4.00s, 352.62KB read
  Socket errors: connect 0, read 0, write 0, timeout 5
  Non-2xx or 3xx responses: 1101
Requests/sec:    275.29
Transfer/sec:     88.17KB

The following patch reveals that the bugs occur when we can't write a full buffer out to a socket. It also 'fixes' the throughput for the 10K POST payload by using AE_DONT_WAIT.

However, the bug is still there with 60K POST payloads. Note the server is just nginx with the default Debian configuration.

--- a/src/wrk.c
+++ b/src/wrk.c
@@ -131,7 +131,7 @@ int main(int argc, char **argv) {
     cfg.addr = *addr;
     req.buf  = format_request(host, port, path, headers);
     req.size = strlen(req.buf);
-
+    printf("req.size=%lu\n", req.size);
     pthread_mutex_init(&statistics.mutex, NULL);
     statistics.latency  = stats_alloc(SAMPLES);
     statistics.requests = stats_alloc(SAMPLES);
@@ -415,7 +415,9 @@ static void socket_writeable(aeEventLoop *loop, int fd, void *data, int mask) {
     }

     if (!c->written) c->start = time_us();
-
+    if (n != req.size) {
+      printf("wrote %lu/%lu\n", n, req.size);
+    }
     c->written += n;
     if (c->written == req.size) {
         c->written = 0;
@@ -541,6 +543,7 @@ static int parse_args(struct config *cfg, char **url, char **headers, int argc,
                 break;
             case 'B':
                 req.body = optarg;
+                fprintf(stderr, "strlen(body)=%lu\n", strlen(req.body));
                 break;
             case 'L':
                 cfg->latency = true;
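
For context, the pattern being exercised here is the standard non-blocking partial-write dance. A minimal sketch, assuming a non-blocking socket (write_some and its parameters are illustrative names, not wrk's actual code):

#include <errno.h>
#include <sys/socket.h>

/* Illustrative sketch, not wrk source: on a non-blocking socket, send()
 * may accept only part of the buffer (the "wrote 8688/10063" case above).
 * The caller must remember the offset and resume from it on the next
 * writable event instead of restarting the request. */
static ssize_t write_some(int fd, const char *buf, size_t len, size_t *written) {
    ssize_t n = send(fd, buf + *written, len - *written, 0);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) return 0; /* kernel buffer full: retry later */
        return -1;                                             /* hard error */
    }
    *written += (size_t)n;  /* partial write: keep the offset so the rest goes out later */
    return n;
}

The 'wrote 8688/10063' lines above are exactly this resume path firing; the question is why the scheduler then takes so long to retry the remaining bytes.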
wg commented 11 years ago

Hi @tarasglek, thank you, that's very interesting! I haven't really done any benchmarks using a POST body. I don't see any fixes in the patch, though, just debug output?

tarasglek commented 11 years ago

@wg, accidentally omitted:

diff --git a/src/ae.c b/src/ae.c
index 90be4e2..4b8e017 100644
--- a/src/ae.c
+++ b/src/ae.c
@@ -422,7 +422,7 @@ void aeMain(aeEventLoop *eventLoop) {
     while (!eventLoop->stop) {
         if (eventLoop->beforesleep != NULL)
             eventLoop->beforesleep(eventLoop);
-        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
+        aeProcessEvents(eventLoop, AE_ALL_EVENTS | AE_DONT_WAIT);
     }
 }
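
AE_DONT_WAIT makes aeProcessEvents poll without blocking, so pending partial writes get retried immediately instead of after the loop sleeps waiting for the next event or timer. A simplified illustration of the mechanism in epoll terms (assumed behavior, not the actual ae.c code):

#include <sys/epoll.h>

/* Simplified illustration, not ae.c itself: the dont_wait flag effectively
 * becomes the timeout handed to epoll_wait(). With a zero timeout the loop
 * never sleeps, so a connection whose send buffer drained between
 * iterations is serviced right away. */
static int poll_once(int epfd, struct epoll_event *events, int maxevents, int dont_wait) {
    int timeout = dont_wait ? 0 : -1;  /* 0 = return immediately, -1 = block until an event */
    return epoll_wait(epfd, events, maxevents, timeout);
}

The trade-off is that a zero timeout turns the event loop into a busy poll, which burns a core; that squares with the observation below that AE_DONT_WAIT reduces performance in most other scenarios.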
wg commented 11 years ago

Thanks! I played with this change a bit on EC2 but didn't see the same results. In most scenarios using AE_DONT_WAIT actually reduced performance. What does your test environment look like as far as hardware, OS, network, etc?

tarasglek commented 11 years ago

I was using an idle gigE connection between two Ivy Bridge machines. If I remember correctly, apachebench gets about 3600 req/s (it fills the gigE pipe) in this setup.