nicolasff / webdis

A Redis HTTP interface with JSON output
https://webd.is
BSD 2-Clause "Simplified" License

Request hangs on content >4096 bytes #197

Closed · likuilin closed this issue 2 years ago

likuilin commented 2 years ago

The server hangs when an incoming HTTP request is larger than 4096 bytes and all of those bytes have already arrived by the time of the first read syscall.

Reproduction:

$ docker run --rm -d -p 127.0.0.1:7379:7379 nicolas/webdis
$ cat 4096.txt | nc localhost 7379 -v

Observe that netcat returns a 200 OK response.

$ cat 4097.txt | nc localhost 7379 -v

Observe that the server hangs.

4096.txt contains:

PUT /set/key HTTP/1.0
Connection: close
Content-Length: 4022
Accept: */*

[4022 x characters]

4097.txt contains the same thing except with 4023 in the Content-Length and one more x.

It seems like #145 fixes this issue; it should have been merged back in 2017. If the data is only slightly larger than 4096 bytes and comes in fast enough, the read syscall fills the buffer with a bit of data still left to read, but no further EV_READ event is emitted after that, so the server hangs indefinitely.

nicolasff commented 2 years ago

Hi @likuilin,

Thanks for the detailed report and test cases. I'll go over this PR to fix the conflict and see if this addresses the issue.

I'll follow up shortly.

likuilin commented 2 years ago

I manually merged it (don't pull this commit, it doesn't attribute the original author) over at https://github.com/likuilin/webdis and for me, this fixed the issue under the same reproduction steps.

Interestingly, if I just curl large files directly at webdis, it doesn't trigger the bug. I'm running nginx in front of webdis, though, and since nginx buffers requests, that triggers the bug more often.

How much data each read call returns may be OS-dependent, so I hope you can reproduce it. My host OS is Ubuntu 20.04 on AWS EC2.

nicolasff commented 2 years ago

Interesting.

Although the issue is easily reproducible with Docker, I couldn't reproduce it on macOS (Big Sur, 20C69) with the latest version of the code.

The call to read is in http_client_read, which is called by worker_can_read, itself installed with event_set(&c->ev, c->fd, EV_READ, worker_can_read, c);.

Webdis does have code to handle this case, as at the end of worker_can_read the state of the client is checked and the event handler is reinstalled to get called again when more data is available:

if(c->broken) { /* terminate client */
    http_client_free(c);
} else {
    /* start monitoring input again */
    worker_monitor_input(c);
}
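For context, worker_monitor_input is assumed to simply re-arm the one-shot read event, roughly like the sketch below (field names such as c->w->base are assumptions, not the exact webdis source):

    /* Sketch: re-register the non-persistent EV_READ event so that
     * worker_can_read fires again when the next chunk of data arrives. */
    static void worker_monitor_input(struct http_client *c) {
        event_set(&c->ev, c->fd, EV_READ, worker_can_read, c);
        event_base_set(c->w->base, &c->ev); /* attach to the worker's event base (assumed field) */
        event_add(&c->ev, NULL);
    }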

I added some debug logs and I see two successive calls to read() returning 4,096 bytes and then 1 byte with the 4097.txt test file:

in worker_can_read, before http_client_read
in http_client_can_read, read 4096 bytes
in worker_can_read, http_client_read returned 4096
in worker_can_read, c->broken=0, calling worker_monitor_input
in worker_can_read, before http_client_read
in http_client_can_read, read 1 byte
in worker_can_read, http_client_read returned 1
/set/key
in worker_can_read, c->broken=1, calling http_client_free

This is the expected behavior, and seen here working as intended: worker_can_read is triggered by the arrival of data, we read 4,096 bytes, call worker_monitor_input to re-register the event, get called again for the 1 extra byte, run the command, and close the socket.

Side note: the EV_READ event is not registered with EV_PERSIST since that makes it more difficult to have fine-grained control over the arrival of data.
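For comparison, a persistent registration would look like this (illustrative only, not the webdis approach); with EV_PERSIST the event stays armed after every callback instead of being re-added by worker_monitor_input:

    /* Illustration only: a persistent read event that keeps firing
     * without per-chunk re-registration. */
    event_set(&c->ev, c->fd, EV_READ | EV_PERSIST, worker_can_read, c);
    event_add(&c->ev, NULL);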

Clearly the version running in Docker is not behaving the same way. I tried building the latest version of the code in Docker using the same image as what's published, and it doesn't reproduce the issue either.

I tracked it down to the change in 6556039e05ed64f1331571306af17a49fb1b2328, which fixed #194. It made it so that webdis only considers the request fully read once the HTTP parser has called http_client_on_message_complete. This commit also added a test (tests/curl-tests.sh) that uploads a 10,000-byte value to a key and then reads it back. This is similar to the one you provided, and produces this debug output:

in worker_can_read, before http_client_read
in http_client_can_read, read 201 bytes
in worker_can_read, http_client_read returned 201
in worker_can_read, c->broken=0, calling worker_monitor_input
in worker_can_read, before http_client_read
in http_client_can_read, read 4096 bytes
in worker_can_read, http_client_read returned 4096
in worker_can_read, c->broken=0, calling worker_monitor_input
in worker_can_read, before http_client_read
in http_client_can_read, read 4096 bytes
in worker_can_read, http_client_read returned 4096
in worker_can_read, c->broken=0, calling worker_monitor_input
in worker_can_read, before http_client_read
in http_client_can_read, read 1808 bytes
in worker_can_read, http_client_read returned 1808
/SET/toobig
in worker_can_read, c->broken=1, calling http_client_free

It reads 201 bytes (likely the request method, resource, and headers), then 4,096 + 4,096 + 1,808 = 10,000 bytes for the body, then processes the command and closes the connection.
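For reference, the mechanism behind that fix boils down to a message-complete callback on the embedded HTTP parser; this is only a simplified sketch with assumed names and include path, not the exact webdis wiring:

    #include "http_parser.h"   /* assumed path to the embedded http-parser */

    struct http_client_sketch { int request_complete; };   /* illustration only */

    /* The request is considered fully read only once the parser invokes this
     * callback, no matter how many read() calls delivered the body. */
    static int on_message_complete_sketch(http_parser *p) {
        struct http_client_sketch *c = p->data;
        c->request_complete = 1;
        return 0;
    }

    /* After every read(), the new bytes are fed to the parser with
     * http_parser_execute(); the Redis command only runs once
     * request_complete has been set. */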

So I think this is fixed in master, but there's no release yet with this commit.

That said, the commit in #145 still seems valuable, as it reads all the data that's available for the request without requiring multiple callbacks to re-register the event, read again, etc. I built and tested it on Linux (in Docker) and it seemed to work fine; I can see the multiple reads happening one after the other followed by a -1 and errno set to EWOULDBLOCK. Unfortunately it's pretty broken on macOS. The simplest repro is just to run curl with a PING command:

$ curl -v -m 2 http://127.0.0.1:7379/PING
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 7379 (#0)
> GET /PING HTTP/1.1
> Host: 127.0.0.1:7379
> User-Agent: curl/7.64.1
> Accept: */*
>
* Operation timed out after 2003 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 2003 milliseconds with 0 bytes received

I vaguely remember now trying this patch years ago on macOS and seeing that all requests just hung, and then not investigating much further.

I have a feeling this is due to differences in what's returned in the "would block" case; both platforms define EWOULDBLOCK as EAGAIN, but I suspect that this might not be what's returned on macOS.

I'll continue digging to see if I can find a way to fix this patch so that it works on both platforms.
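For reference, the read-loop approach from #145 amounts to draining the non-blocking socket until read reports that it would block; this is a simplified sketch with a hypothetical drain_socket helper, not the actual PR code:

    #include <errno.h>
    #include <unistd.h>

    /* Sketch: read everything currently available on a non-blocking socket
     * instead of handling one chunk per EV_READ callback.
     * Returns 0 once drained, -1 on EOF or error. */
    static int drain_socket(int fd) {
        char buffer[4096];
        for(;;) {
            ssize_t got = read(fd, buffer, sizeof(buffer));
            if(got > 0) {
                /* feed `got` bytes to the HTTP parser, then keep reading */
                continue;
            }
            if(got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return 0;  /* no more data for now: re-arm EV_READ and wait */
            }
            return -1;     /* EOF (0) or a real error: treat the client as broken */
        }
    }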

For now, would you mind trying the latest master without the #145 patch? I'm hoping that the commit I linked to above already fixes it, even if it takes a few more callbacks. Your 4,097-byte file is a good way to test it, but there's also the curl-tests.sh file under tests/ that should do something similar. This would be super helpful.

And regardless of where I end up with #145, it's clear this fix needs to be released, so I'll get on it very soon.

nicolasff commented 2 years ago

Aha! I think I've got it.

When I run this simple curl command with #145 on macOS, I see a first read of 4,096 bytes, then a second of 1 byte, then read returns -1 but errno is not set to EWOULDBLOCK; instead, it's EINPROGRESS. This EINPROGRESS is actually present in errno even before the call to read, so this doesn't seem to be set as a result of the read potentially blocking. This deserves its own investigation.

We get EWOULDBLOCK on read if the socket is marked as non-blocking, which is done with ioctl(fd, FIONBIO, &one);. There is actually code to do this as soon as we've established a client connection with accept, but it doesn't look quite right (from server.c on lines 131-132):

    /* make non-blocking */
    ioctl(client_fd, (int)FIONBIO, (char *)&on);

The signature for ioctl is:

     int
     ioctl(int fildes, unsigned long request, ...);

There's no error-checking for this call, but adding a log line shows the issue:

-       ioctl(client_fd, (int)FIONBIO, (char *)&on);
+       int status = ioctl(client_fd, (int)FIONBIO, (char*)&on);
+       char log_msg[100];
+       sprintf(log_msg, "ioctl status: %d", status);
+       slog(s, WEBDIS_INFO, log_msg, 0);
+       if (status == -1) {
+               slog(s, WEBDIS_ERROR, strerror(errno), 0);
+       }

Which produces:

I ioctl status: -1
E Device not configured

The second parameter is supposed to be an unsigned long, not an int. For some reason (differences in calling convention?) this succeeds on Linux:

I ioctl status: 0

The last argument was also incorrect, since the code passes a pointer to a char where FIONBIO expects a pointer to an int:

FIONBIO Sets or clears nonblocking input/output for a socket. The arg parameter is a pointer to an integer. If the integer is 0, nonblocking input/output on the socket is cleared. Otherwise, the socket is set for nonblocking input/output.

This is also indicated by the definition of FIONBIO, which encodes the size of the argument:

#define FIONBIO         _IOW('f', 126, int)     /* set/clear non-blocking i/o */

This seems to be the right fix for the call to ioctl:

--- src/server.c
+++ src/server.c
@@ -119,7 +119,7 @@ server_can_accept(int fd, short event, void *ptr) {
        int client_fd;
        struct sockaddr_in addr;
        socklen_t addr_sz = sizeof(addr);
-       char on = 1;
+       int on = 1;
        (void)event;

        /* select worker to send the client to */
@@ -129,7 +129,13 @@ server_can_accept(int fd, short event, void *ptr) {
        client_fd = accept(fd, (struct sockaddr*)&addr, &addr_sz);

        /* make non-blocking */
-       ioctl(client_fd, (int)FIONBIO, (char *)&on);
+       int status = ioctl(client_fd, FIONBIO, &on);

With this change, having the #145 patch applied no longer leads to connections hanging.
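Side note: an equivalent and arguably more portable way to mark the accepted socket as non-blocking would be fcntl with O_NONBLOCK; this is just a sketch of the alternative, not what the patch above does:

    #include <fcntl.h>

    /* Alternative sketch: set O_NONBLOCK via fcntl instead of ioctl(FIONBIO). */
    int flags = fcntl(client_fd, F_GETFL, 0);
    if(flags < 0 || fcntl(client_fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        /* log the error and close client_fd */
    }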

I'll continue testing this for a little bit and will cut the 0.1.16 release with all 3 changes:

  1. the fix for #194
  2. the read loop patch from #145
  3. the ioctl fix

Thanks again for nudging me on this issue! This ioctl line was written on June 12, 2011(!). I was using Linux as my main desktop OS at the time, and the call clearly worked in this form there. Interestingly, even the recent addition of CodeQL to this repo did not flag this issue.

likuilin commented 2 years ago

> For now, would you mind trying the latest master without the #145 patch? I'm hoping that the commit I linked to above already fixes it, even if it takes a few more callbacks. Your 4,097-byte file is a good way to test it, but there's also the curl-tests.sh file under tests/ that should do something similar. This would be super helpful.

Interesting. I can confirm that currently, on master, without #145, this issue is fixed for me. Both 4097.txt and curl-tests work properly.

Additionally, just to ensure the ioctl fix patch doesn't affect anything on Linux, I tested 0.1.15 with only the ioctl fix patch applied: as expected, everything else still works properly, but 4097.txt and curl-tests hang. I also tested master with the ioctl fix patch, and everything works properly. Your reasoning is sound, so this looks good to me.

Thank you for the fast responses and work on this!

nicolasff commented 2 years ago

Awesome, thanks for the tests! Where I ended up yesterday was that the combination of all 3 changes does seem to work, but I wanted to make sure that these commits weren't having an impact on performance.

I configured webdis with 10 threads and started with the tests in ./bench.sh but got relatively inconsistent results across multiple runs (which is an interesting issue in its own right). So I decided to simplify things and use ab instead, with the following command:

$ ab -n 1000000 -c 100 -k http://127.0.0.1:7379/ping

With webdis running on Mac without Docker:

Latest master:

Requests per second:    77656.90 [#/sec] (mean)
Time per request:       1.288 [ms] (mean)
Time per request:       0.013 [ms] (mean, across all concurrent requests)
Transfer rate:          27680.44 [Kbytes/sec] received
[...]
Percentage of the requests served within a certain time (ms)
  95%      2
  98%      2
  99%      2
 100%     32 (longest request)

With ioctl fix alone:

Requests per second:    85819.33 [#/sec] (mean)
Time per request:       1.165 [ms] (mean)
Time per request:       0.012 [ms] (mean, across all concurrent requests)
Transfer rate:          30589.90 [Kbytes/sec] received
[...]
  95%      2
  98%      2
  99%      2
 100%     25 (longest request)

With PR #145 + ioctl fix:

Requests per second:    81559.42 [#/sec] (mean)
Time per request:       1.226 [ms] (mean)
Time per request:       0.012 [ms] (mean, across all concurrent requests)
Transfer rate:          29071.47 [Kbytes/sec] received
[...]
  95%      2
  98%      2
  99%      2
 100%     26 (longest request)

[Requests/second comparison chart]

In Docker on Alpine Linux, with ab run from the Mac host:

It's much slower in Docker, but it's the comparison that matters. Due to the slowness, I changed -n 1000000 to -n 100000 (1M to 100k requests).

Latest master:

Requests per second:    1407.77 [#/sec] (mean)
Time per request:       71.034 [ms] (mean)
Time per request:       0.710 [ms] (mean, across all concurrent requests)
Transfer rate:          501.79 [Kbytes/sec] received
[...]
  95%     88
  98%     97
  99%    105
 100%    252 (longest request)

With ioctl fix alone:

Requests per second:    1415.12 [#/sec] (mean)
Time per request:       70.665 [ms] (mean)
Time per request:       0.707 [ms] (mean, across all concurrent requests)
Transfer rate:          504.41 [Kbytes/sec] received
[...]
  95%     86
  98%     94
  99%     98
 100%    204 (longest request)

With PR #145 + ioctl fix:

Requests per second:    1438.01 [#/sec] (mean)
Time per request:       69.541 [ms] (mean)
Time per request:       0.695 [ms] (mean, across all concurrent requests)
Transfer rate:          512.57 [Kbytes/sec] received
[...]
  95%     80
  98%     92
  99%     97
 100%    199 (longest request)

[Requests/second comparison chart]

Pretty much no difference at all.


I also tried comparing the commits with requests that send more than 4 KB of data, since this is the scenario that's most likely to show a difference.

$ printf 'A%.0s' $(seq 1 10000) > 10k.txt
$ wc -c 10k.txt
   10000 10k.txt
$ ab -c 100 -k -n 10000 -u 10k.txt -T text/plain http://127.0.0.1:7379/SET/10k

Here again, I sent 10x more requests when running on the same host since Docker was so much slower.

Branch    Requests/sec (macOS)    Requests/sec (Docker)
master    70,121                  1,217
ioctl     69,529                  1,147
both      69,812                  1,191

It's hard to tell whether it really makes a difference. At this point, since there's no clear benefit to the read loop and the ioctl fix is needed regardless, I'm going to stick with just the latter.

The issue is confirmed fixed, and while the read loop might have some benefits in some cases, I would expect the vast majority of requests to be under 4 KB. For those, http_client_read would call read twice when there's no need to, and without a significant performance improvement for larger requests I'm not convinced it's worth including.

Thanks again for all the help, this is much appreciated. I'll release 0.1.16 today.

nicolasff commented 2 years ago

@likuilin I've released webdis 0.1.16 with the ioctl patch.

$ docker run --rm -d -p 127.0.0.1:7379:7379 nicolas/webdis:0.1.16
[...]
$ cat 4097.txt | nc 127.0.0.1 7379 -v
Connection to 127.0.0.1 port 7379 [tcp/*] succeeded!
HTTP/1.0 200 OK
Server: Webdis
Allow: GET,POST,PUT,OPTIONS
Access-Control-Allow-Methods: GET,POST,PUT,OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: X-Requested-With, Content-Type, Authorization
Content-Type: application/json
ETag: "eb8b656178818cbe141ae8208412982d"
Connection: Close
Content-Length: 19

{"set":[true,"OK"]}

Please close this issue if you can verify that it addresses the problem you reported.

likuilin commented 2 years ago

Completely fixed on nicolas/webdis:0.1.16. Thanks for the help!