joyent / libuv

Go to
https://github.com/libuv/libuv
3.27k stars 654 forks source link

unix: don't uv__write() when stream handle is made pending with uv__io_feed() #746

Closed bnoordhuis closed 9 years ago

bnoordhuis commented 11 years ago

It's a rather subtle issue but I think this is what's happening:

  1. epoll_wait() wakes up with the fd marked ready for writing (EPOLLOUT).
  2. uv__write() writes one or more pending reqs until write(fd, buf, len) returns a value < len (meaning the send buffer is full and the next write will fail with EAGAIN)
  3. It marks the handle pending with uv__io_feed().
  4. The event loop cycles and calls uv__stream_io() with events=UV__POLLOUT and the handle from the previous step.
  5. uv__stream_io() calls uv__write() which then immediately hits EAGAIN when it attempts to write.

tl;dr uv__write() gets called when it shouldn't.

bnoordhuis commented 11 years ago

Turns out the extraneous calls to uv__write() are pretty simple to eliminate but cause regressions on some benchmarks.

The patch:

diff --git a/src/unix/core.c b/src/unix/core.c
index 9b471ec..232d3bc 100644
--- a/src/unix/core.c
+++ b/src/unix/core.c
@@ -575,7 +575,7 @@ static void uv__run_pending(uv_loop_t* loop) {
     ngx_queue_init(q);

     w = ngx_queue_data(q, uv__io_t, pending_queue);
-    w->cb(loop, w, UV__POLLOUT);
+    w->cb(loop, w, 0);
   }
 }

diff --git a/src/unix/stream.c b/src/unix/stream.c
index 8137d24..603ff0c 100644
--- a/src/unix/stream.c
+++ b/src/unix/stream.c
@@ -1119,6 +1119,11 @@ static void uv__stream_io(uv_loop_t* loop, uv__io_t* w, unsigned int events) {
     return;
   }

+  if (events == 0) {
+    uv__write_callbacks(stream);
+    return;
+  }
+
   if (events & (UV__POLLIN | UV__POLLERR | UV__POLLHUP)) {
     assert(uv__stream_fd(stream) >= 0);

diff --git a/src/unix/udp.c b/src/unix/udp.c
index e1a12f2..13525e8 100644
--- a/src/unix/udp.c
+++ b/src/unix/udp.c
@@ -176,6 +176,15 @@ static void uv__udp_run_completed(uv_udp_t* handle) {

 static void uv__udp_io(uv_loop_t* loop, uv__io_t* w, unsigned int revents) {
+  uv_udp_t* handle;
+
+  if (revents == 0) {
+    handle = container_of(w, uv_udp_t, io_watcher);
+    assert(handle->type == UV_UDP);
+    uv__udp_run_completed(handle);
+    return;
+  }
+
   if (revents & UV__POLLIN)
     uv__udp_recvmsg(loop, w, revents);

It's a net win for TCP sockets (I'm seeing a consistent 5% throughput increase on localhost and LAN traffic) but it regresses node's net/tcp-raw-pipe benchmark something awful:

# before
$ out/Release/node benchmark/net/tcp-raw-pipe.js len=16777216 type=buf dur=5
net/tcp-raw-pipe.js len=16777216 type=buf dur=5: 18.557

# after
$ out/Release/node benchmark/net/tcp-raw-pipe.js len=16777216 type=buf dur=5
net/tcp-raw-pipe.js len=16777216 type=buf dur=5: 0.79960

In that benchmark, all those extra calls to uv__write() actually improve throughput because the reader drains the pipe almost immediately after the writer filled it. To wit:

# before
$ strace -ce epoll_wait,write out/Release/node benchmark/net/tcp-raw-pipe.js len=16777216 type=buf dur=5
net/tcp-raw-pipe.js len=16777216 type=buf dur=5: 4.7622
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.077333           3     25484       104 write
  0.00    0.000000           0         6         4 stat
  0.00    0.000000           0       777           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.077333                 26267       108 total

# after
$ strace -ce epoll_wait,write out/Release/node benchmark/net/tcp-raw-pipe.js len=16777216 type=buf dur=5
net/tcp-raw-pipe.js len=16777216 type=buf dur=5: 0.50555
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.98    0.092349          22      4210           write
  0.02    0.000020           0      2583           epoll_wait
  0.00    0.000000           0         6         4 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.092369                  6799         4 total

(No idea why stat shows up in the summary. Probably a bug in strace.)