lpereira / lwan

Experimental, scalable, high performance HTTP server
https://lwan.ws
GNU General Public License v2.0

Slow down if the request method is PUT #302

Closed: pontscho closed this issue 3 years ago

pontscho commented 3 years ago

Hey, thanks for the excellent work on lwan. I really like this web server; I'm very satisfied with both the performance and the Lua support.

I've started building a small portal based on lwan + Lua, and I've added PUT support for receiving data, since I want to use PUT instead of POST. As a first step, I added the PUT method like this:

@@ -221,8 +221,8 @@ enum lwan_handler_flags {
 /* 1<<0 set: response has body; see has_response_body() in lwan-response.c */
 #define FOR_EACH_REQUEST_METHOD(X)                                             \
     X(GET, get, (1 << 0), STR4_INT('G', 'E', 'T', ' '))                        \
     X(POST, post, (1 << 1 | 1 << 0), STR4_INT('P', 'O', 'S', 'T'))             \
     X(HEAD, head, (1 << 1), STR4_INT('H', 'E', 'A', 'D'))                      \
     X(OPTIONS, options, (1 << 2), STR4_INT('O', 'P', 'T', 'I'))                \
-    X(DELETE, delete, (1 << 1 | 1 << 2), STR4_INT('D', 'E', 'L', 'E'))
+    X(DELETE, delete, (1 << 1 | 1 << 2), STR4_INT('D', 'E', 'L', 'E'))         \
+    X(PUT, put, (1 << 2 | 1 << 0), STR4_INT('P', 'U', 'T', ' '))
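
For context, the STR4_INT values in this macro pack the first four bytes of the request line into a single integer, so matching a method is one integer comparison instead of a string comparison. Below is a minimal sketch of the idea, assuming a little-endian layout; the macro and helper names here are illustrative, and the real definition in lwan's headers may differ:

#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for lwan's STR4_INT: packs four characters into
 * one 32-bit value, least-significant byte first (little-endian). */
#define STR4_INT_SKETCH(a, b, c, d)                                        \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) |        \
     ((uint32_t)(d) << 24))

static uint32_t first_four_bytes(const char *buf)
{
    uint32_t v;

    memcpy(&v, buf, sizeof(v)); /* memcpy avoids unaligned-access UB */
    return v;
}

/* On a little-endian machine:
 *   first_four_bytes("PUT /path") == STR4_INT_SKETCH('P', 'U', 'T', ' ')
 * so the parser can identify the method by switching on one integer. */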

I also had to add buffer.len = 0; to lwan-thread.c:process_request_coro (around line 120, right after the while (true)) to clean up the leftover data after a successful PUT.

Everything works as I expected, except for one thing: the reply time increases dramatically if the PUT request has a body. For example:

1) Request RTT with a zero-length body:

$ wrk -c 100 -d 10 -t 6 --latency --timeout 3s http://<ip>/put -s wrk.lua -d1s
Running 1s test @ http://<ip>/put
  6 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.67ms    0.85ms  11.11ms   71.16%
    Req/Sec     3.42k   154.80     3.87k    80.30%
  Latency Distribution
     50%    4.58ms
     75%    5.17ms
     90%    5.75ms
     99%    7.06ms
  22440 requests in 1.10s, 7.30MB read

2) Request RTT if the body contains just one byte:

Niobe:~ pontscho$ wrk -c 100 -d 10 -t 6 --latency --timeout 3s http://<ip>/put -s wrk.lua -d1s
Running 1s test @ http://<ip>/put
  6 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   147.79ms   95.53ms 219.89ms   69.23%
    Req/Sec   119.18     89.94   316.00     81.82%
  Latency Distribution
     50%  210.93ms
     75%  211.95ms
     90%  212.33ms
     99%  216.00ms
  624 requests in 1.10s, 208.41KB read

I added some profiling code to the thread and the request processor, but found nothing; everything finishes in at most 5 milliseconds.

I used an old Mac mini (2 CPUs) as the development server and ran lwan with 6 threads. With the keep-alive timeout set to 5s, five consecutive queries on a single path gave these TTFB times: 1: 9ms, 2: 7ms, 3: 213ms, 4: 220ms, <waiting 5 seconds>, 5: 8ms, ...

I've run out of ideas about what to check to fix this timing issue, and I would be very grateful for any advice on what the problem might be and what I could look at.

Thanks.

lpereira commented 3 years ago

You probably need to patch prepare_for_response() so the request body is read. Otherwise, the body will be treated as part of the next request, throwing off the parser depending on what was sent. This indeed needs more work to make it easier to add more verbs. Once you do this, you can remove the buffer.len = 0 line, which would otherwise break pipelining. (If things still don't work after this, then we'd have to debug further.)
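
To illustrate the failure mode described here: on a keep-alive connection, any body bytes left unread after a response are parsed as the start of the next request. A minimal, self-contained sketch of consuming those bytes (this is not lwan's actual code; drain_body() and its scratch-buffer approach are purely illustrative):

#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read and discard content_length bytes of request body so the next
 * pipelined request on this connection starts at the right offset. */
static int drain_body(int fd, size_t content_length)
{
    char scratch[4096];

    while (content_length > 0) {
        size_t chunk = content_length < sizeof(scratch)
                           ? content_length
                           : sizeof(scratch);
        ssize_t r = read(fd, scratch, chunk);

        if (r <= 0)
            return -1; /* error or peer hung up: drop the connection */

        content_length -= (size_t)r;
    }

    return 0;
}

The ~210ms plateau in the benchmark above is consistent with this: the leftover byte is misparsed as the start of an incomplete next request, so the connection stalls until more data or a timeout arrives.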

I would appreciate a patch once you're done with this, BTW!

pontscho commented 3 years ago

Thanks, I will check it. After writing the message above, I also realized from some further tests that the buffer.len = 0 line didn't actually solve the buffer cleanup.

pontscho commented 3 years ago

Yeah, I planned to send a patch once I finish this.

pontscho commented 3 years ago

$ wrk -c 100 -d 10 -t 6 --latency --timeout 3s http://<ip>/put -s wrk.lua -d1s
Running 1s test @ http://<ip>/put
  6 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.74ms    0.90ms   9.62ms   74.04%
    Req/Sec     3.36k    80.94     3.54k    69.23%
  Latency Distribution
     50%    4.58ms
     75%    5.18ms
     90%    5.95ms
     99%    7.55ms
  21779 requests in 1.09s, 3.86MB read
Requests/sec:  20021.70
Transfer/sec:      3.55MB

It's just a rough sketch, but much, much better. Thanks. I want to add handling for the "Expect: 100-continue" case to the PUT handler, and then I'll clean up the patch.
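
For reference, the "Expect: 100-continue" case works like this: a client that sends that header waits for an interim "HTTP/1.1 100 Continue" response before transmitting the PUT body. A sketch of the handshake, assuming a plain blocking socket; maybe_send_100_continue() is a hypothetical helper, not part of lwan's API:

#include <string.h>
#include <strings.h>
#include <unistd.h>

static const char continue_line[] = "HTTP/1.1 100 Continue\r\n\r\n";

/* If the client sent "Expect: 100-continue", emit the interim 100
 * response; only then will the client start sending the request body. */
static int maybe_send_100_continue(int fd, const char *expect_header)
{
    /* Per RFC 7231, "100-continue" is the only defined Expect value,
     * and it is matched case-insensitively. */
    if (expect_header && strcasecmp(expect_header, "100-continue") == 0) {
        if (write(fd, continue_line, sizeof(continue_line) - 1) < 0)
            return -1;
    }

    return 0; /* safe to start reading the body now */
}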

lpereira commented 3 years ago

Glad it was just that! I was worried it could be something more serious. :)