nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node
34.44k stars 7.3k forks source link

net: `read ECONNRESET` when doing a big write #5542

Closed isaacs closed 11 years ago

isaacs commented 11 years ago

While investigating #5504, I wrote this test script: https://gist.github.com/isaacs/5640690

It's very strange to me that if you write something big, it seems to trigger an ECONNRESET on the read side.

I don't see why this should be. It's as if the client is rejecting and closing the socket, but I'm not doing that, so I think this is probably just a bug.

cc: @bnoordhuis

bnoordhuis commented 11 years ago

It is what it is:

accept4(10, 0, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 11                               
getpeername(11, {sa_family=AF_INET, sin_port=htons(60067), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 
write(11, "################################"..., 1048576) = 656640                  
epoll_ctl(5, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLOUT, {u32=11, u64=11}}) = 0           
epoll_wait(5, {{EPOLLOUT, {u32=11, u64=11}}}, 1024, 95) = 1                         
write(11, "################################"..., 391936) = 391936                   
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLIN, {u32=11, u64=11}}) = 0                    
epoll_wait(5, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=11, u64=11}}}, 1024, 95) = 1        
read(11, 0x7fd809632010, 65536)         = -1 ECONNRESET (Connection reset by peer)

And that's because the other end goes away without closing the connection:

socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 10            
connect(10, {sa_family=AF_INET, sin_port=htons(1234), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(5, EPOLL_CTL_ADD, 10, {EPOLLOUT, {u32=10, u64=10}}) = 0                   
epoll_wait(5, {{EPOLLOUT, {u32=10, u64=10}}}, 1024, -1) = 1                         
getsockopt(10, SOL_SOCKET, SO_ERROR, [0], [4]) = 0                                  
epoll_ctl(5, EPOLL_CTL_MOD, 10, {EPOLLIN|EPOLLOUT, {u32=10, u64=10}}) = 0           
epoll_wait(5, {{EPOLLIN|EPOLLOUT, {u32=10, u64=10}}}, 1024, -1) = 1                 
read(10, "################################"..., 65536) = 43776                      
epoll_wait(5, {{EPOLLOUT, {u32=10, u64=10}}}, 1024, 0) = 1                          
epoll_ctl(5, EPOLL_CTL_DEL, 10, {EPOLLOUT, {u32=10, u64=10}}) = 0                   
exit_group(0)                           = ?
bnoordhuis commented 11 years ago

That said, I did spot a buglet in libuv:

diff --git a/deps/uv/src/unix/stream.c b/deps/uv/src/unix/stream.c
index fda2f02..1848a09 100644
--- a/deps/uv/src/unix/stream.c
+++ b/deps/uv/src/unix/stream.c
@@ -1283,9 +1283,10 @@ int uv_read2_start(uv_stream_t* stream, uv_alloc_cb alloc_cb,

 int uv_read_stop(uv_stream_t* stream) {
-  uv__io_stop(stream->loop, &stream->io_watcher, UV__POLLIN);
-  uv__handle_stop(stream);
   stream->flags &= ~UV_STREAM_READING;
+  uv__io_stop(stream->loop, &stream->io_watcher, UV__POLLIN);
+  if (!uv__io_active(&stream->io_watcher, UV__POLLOUT))
+    uv__handle_stop(stream);

 #if defined(__APPLE__)
   /* Notify select() thread about state change */

But that doesn't fix the test, it only stops the event loop from exiting prematurely when we're no longer reading but still want to write. I don't think node.js ever does that and it only matters when it's the last handle.

isaacs commented 11 years ago

Ah, the problem is that the client socket isn't keeping the event loop open, so that's why the connection is dropping off without being shut down.

Here's a simpler test: https://gist.github.com/isaacs/5644453

isaacs commented 11 years ago

Output:

NET: 9305 listen2 0.0.0.0 12346 4 false
NET: 9305 _listen2: create a handle
NET: 9305 bind to 0.0.0.0
SERVER 9305: listening
starting client
CLIENT 9306: client
NET: 9306 connect: missing host
NET: 9305 onconnection
NET: 9306 afterConnect
NET: 9305 _read
NET: 9305 Socket._read readStart
NET: 9305 SERVER _emitCloseIfDrained
NET: 9305 SERVER handle? false   connections? 1
SERVER 9305: got connection
CLIENT 9306: got connection
CLIENT 9306: client has ref
NET: 9306 _read
NET: 9306 Socket._read readStart
NET: 9306 onread undefined 0 65536 65536
NET: 9306 got data
NET: 9306 readStop
CLIENT 9306: started read? false
CLIENT 9306: finished read? false

assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: false == true
    at process.<anonymous> (/Users/isaacs/dev/js/node-v0.10/test/simple/test-net-client-ref-bug.js:57:5)
    at process.EventEmitter.emit (events.js:117:20)
NET: 9305 onread ECONNRESET undefined undefined NaN
NET: 9305 error ECONNRESET
NET: 9305 destroy
NET: 9305 close
NET: 9305 close handle
NET: 9305 has server
NET: 9305 SERVER _emitCloseIfDrained

events.js:72
        throw er; // Unhandled 'error' event
              ^

Error: read ECONNRESET
    at errnoException (net.js:884:11)
    at TCP.onread (net.js:539:19)
assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: false == true
    at ChildProcess.<anonymous> (/Users/isaacs/dev/js/node-v0.10/test/simple/test-net-client-ref-bug.js:116:7)
    at ChildProcess.EventEmitter.emit (events.js:98:17)
    at Process.ChildProcess._handle.onexit (child_process.js:789:12)

Buglet fix has no effect, as predicted.

isaacs commented 11 years ago

Same behavior as far back as 0.10.0, so I'm not sure when it changed.

isaacs commented 11 years ago

So, the issue here is that the client isn't in the middle of a read or a write or a connect, so there's no req's keeping the socket open. This has nothing to do with the original bug. False alarm.

jdalton commented 11 years ago

I've starting to seeing an assert similar to this now with my build script node build modern, or anything really, now causes

Assertion failed: (!uvio_active(&stream->io_watcher, UVPOLLOUT) || !ngx_queue_empty(&stream->write_completed_queue) || !ngx_queue_empty(&stream->write_queue) || stream->shutdown_req != NULL || stream->connect_req != NULL), function uv_read_stop, file ../deps/uv/src/unix/stream.c, line 1319. Abort trap: 6

This doesn't happen in v0.10.6 but does in v0.10.8 :/

tjfontaine commented 11 years ago

@jdalton that's fixed in joyent/libuv@b38c9c1 and will be in the next 0.10 release

jdalton commented 11 years ago

Thanks @tjfontaine, is there an issue that commit corresponds to?

tjfontaine commented 11 years ago

@jdalton not exactly, but there is #5562

19h commented 11 years ago

Our servers logged thousands of crashes of the instances with jdaltons assertion-error. This quite crapped our production.. thank god it's fixed.

springmeyer commented 11 years ago

I'm still seeing a simliar assert as mentioned above - commented at the ticket @tjfontaine referenced: https://github.com/joyent/node/issues/5562#issuecomment-18716889

somnathwy commented 10 years ago

Receiving same issue net: read ECONNRESET while posting big data over https node version 10.22