nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node

connect over UDS hangs indefinitely #4785

Closed davepacheco closed 11 years ago

davepacheco commented 11 years ago

In production, I found a Node v0.8.14 program hung trying to connect over a Unix domain socket. According to the kernel state, the socket was connected, but the Node program was not polling on it.

I was able to reproduce this with the following HTTP server:

var mod_http = require('http');

mod_http.createServer(function (request, response) {
        response.writeHead(200);
        response.end();
}).listen('/var/run/.bug.sock', function () {
        console.log('server listening');
});

and the following HTTP client:

var mod_http = require('http');

var last_checked = undefined;
var last = undefined;
var aborted = false;
var timer;

function go()
{
        if (aborted)
                return;

        var req = mod_http.get({
            'path': '/my/path',
            'socketPath': '/var/run/.bug.sock'
        });

        req.on('response', function (response) {
                last = Date.now();
                response.on('end', go);
        });
}

function tick()
{
        if (last_checked !== undefined && last_checked == last) {
                console.error('last_checked: ' + last_checked);
                aborted = true;
                process.abort();
        }

        last_checked = last;
}

timer = setInterval(tick, 5000);
go();

The client makes requests to the server over a Unix Domain Socket in a continuous loop. There's an interval timer to detect the hang by noticing when the program has failed to complete any requests within 5 seconds. Since the HTTP server is local and services requests immediately, we should never see a request take anywhere near 5 seconds to complete, and the client should run indefinitely.

On a SmartOS server, I ran the HTTP server and one copy of the HTTP client, and the programs ran for tens of minutes without crashing. Then I started 5 more clients, and within a minute all six of them had dumped core.

I used MDB to examine the core file to see if I could find the pending request object. This core file has 643 request objects, but only two of them either have no "response" object or have not yet emitted "end" on it:

[root@0fbd8642-c2e5-4f3a-bba2-fd31cbe2e243 ~]# mdb core
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::load v8
V8 version: 3.11.10.22
Autoconfigured V8 support from target
C++ symbol demangling enabled

> ::findjsobjects -p _header | ::findjsobjects ! wc -l
643
> ::findjsobjects -p _header | ::findjsobjects | ::jsprint res._endEmitted ! wc -l
mdb: 'res._endEmitted' not found in b5f87bd1
mdb: 'res._endEmitted' not found in b0f368e1
641

The second of these looks garbage-collected, but the first one, b5f87bd1, appears to be our hung request:

> b5f87bd1::jsprint -a                
b5f87bd1: {
    parser: b0f35341: {
        maxHeaderPairs: fa0: 2000,
        socket: b5f87dd5: [...],
        _headers: b5f883b9: [...],
        incoming: 9aa08081: null,
        _url: 8a8082e5: "",
        onIncoming: b0f306d1: function parserOnIncomingClient,
    },
    outputEncodings: b5f87c31: [
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
    ],
    _trailer: 8a8082e5: "",
    path: 8a82ecd1: "/my/path",
    finished: 9aa080a1: true,
    socket: b5f87dd5: {
        destroyed: 9aa080b1: false,
        parser: b0f35341: [...],
        errorEmitted: 9aa080b1: false,
        allowHalfOpen: 9aa08091: undefined,
        _httpMessage: b5f87bd1: [...],
        bytesRead: 0: 0,
        _pendingWriteReqs: 0: 0,
        _connecting: 9aa080a1: true,
        writable: 9aa080a1: true,
        _bytesDispatched: 0: 0,
        _connectQueue: b5f889bd: [...],
        _handle: b5f87e6d: [...],
        _flags: 0: 0,
        _connectQueueSize: 7a: 61,
        _events: b5f886a5: [...],
    },
    method: 8a83078d: "GET",
    _headers: b5f87cd1: {
        host: 8a830761: "localhost",
    },
    shouldKeepAlive: 9aa080b1: false,
    _last: 9aa080a1: true,
    writable: 9aa080a1: true,
    connection: b5f87dd5: {
        destroyed: 9aa080b1: false,
        parser: b0f35341: [...],
        errorEmitted: 9aa080b1: false,
        allowHalfOpen: 9aa08091: undefined,
        _httpMessage: b5f87bd1: [...],
        bytesRead: 0: 0,
        _pendingWriteReqs: 0: 0,
        _connecting: 9aa080a1: true,
        writable: 9aa080a1: true,
        _bytesDispatched: 0: 0,
        _connectQueue: b5f889bd: [...],
        _handle: b5f87e6d: [...],
        _flags: 0: 0,
        _connectQueueSize: 7a: 61,
        _events: b5f886a5: [...],
    },
    _headerNames: b5f87cdd: {
        host: 8a83079d: "Host",
    },
    _headerSent: 9aa080a1: true,
    sendDate: 9aa080b1: false,
    socketPath: 8a82ecfd: "/var/run/.bug.sock",
    _header: b5f88189: "GET /my/path HTTP/1.1
Host: localhost
Connection: close                     

",
    _hasBody: 9aa080a1: true,
    chunkedEncoding: 9aa080b1: false,
    agent: b0f307b5: {
        sockets: b0f34865: [...],
        requests: b0f34859: [...],
        options: b0f3484d: [...],
        maxSockets: a: 5,
        _events: b0f34871: [...],
    },
    useChunkedEncodingByDefault: 9aa080b1: false,
    output: b5f87c21: [
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
        9aa080c1: hole,
    ],
    _events: b5f87fd9: {
        response: b5f8826d: function <anonymous> (as <anon>),
    },
}

Notice that the "_connecting" field on the socket is "true". Using the "_handle" pointer in the socket, with a little work, we can get to the underlying PipeWrap object. Node stores this pointer in the first internal field of the handle object, which sits just after the "elements" field of the JSObject:

> b5f87e6d::v8print
b5f87e6d JSObject {
    b5f87e6d JSReceiver {
        b5f87e6d HeapObject < Object  {
            b5f87e6c map = a15152a1 (Map)
        }
    }
    b5f87e70 properties = b5f87e7d (FixedArray)
    b5f87e74 elements = 8a808081 (FixedArray)
}
> b5f87e78/p
0xb5f87e78:     0x88f7d30       
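
In other words, on this 32-bit core the PipeWrap pointer sits one word past the "elements" slot. As a purely illustrative sketch (not mdb or Node code), the address arithmetic in C would be:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative only: given the tagged V8 heap pointer for the JS handle
 * object (as printed by ::jsprint above), compute the address of internal
 * field 0, which is where the node::PipeWrap pointer lives in this
 * 32-bit core.
 */
static uintptr_t
internal_field0_addr(uintptr_t tagged_jsobject)
{
        uintptr_t base = tagged_jsobject - 1;     /* strip the heap-object tag */

        return (base + 3 * sizeof (uint32_t));    /* skip map, properties, elements */
}

int
main(void)
{
        uintptr_t handle = 0xb5f87e6d;            /* handle address from ::v8print above */

        /* prints "internal field 0 is at 0xb5f87e78", matching the read above */
        printf("internal field 0 is at 0x%x\n",
            (unsigned int)internal_field0_addr(handle));
        return (0);
}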

(Obviously, we should add an MDB dcmd to make it easier to go directly from the JS handle object to the underlying C++ PipeWrap object.) I don't have CTF (debug information) for Node, so we can't print the PipeWrap object easily, but it's not hard to find the libuv handle inside it by dumping the memory:

> 0x88f7d30,0t16/nap
0x88f7d30:      
0x88f7d30:      vtable for node::PipeWrap+8
0x88f7d34:      0x88350f8       
0x88f7d38:      0x8813c58       
0x88f7d3c:      0x887ce50       
0x88f7d40:      0x88f7d50       
0x88f7d44:      0               
0x88f7d48:      0               
0x88f7d4c:      0x88f7d50       
0x88f7d50:      node`default_loop_struct
...

Thanks to @rmustacc's help, I was able to load CTF for libuv, allowing us to see the uv_pipe_t:

> 0x88f7d50::print -t uv_pipe_t
uv_pipe_t {
    uv_loop_t *loop = node`default_loop_struct
    uv_close_cb close_cb = 0
    void *data = 0x88f7d30
    uv_handle_type type = 6 (UV_NAMED_PIPE)
    ngx_queue_t handle_queue = {
        ngx_queue_t *prev = 0x8813c78
        ngx_queue_t *next = 0x887ce78
    }
    int flags = 0x2000
    uv_handle_t *next_closing = 0
    size_t write_queue_size = 0
    uv_alloc_cb alloc_cb = 0
    uv_read_cb read_cb = 0
    uv_read2_cb read2_cb = 0
    uv_connect_t *connect_req = 0x8810f38
    uv_shutdown_t *shutdown_req = 0
    uv__io_t read_watcher = {
        ev_io io_watcher = {
            int active = 0
            int pending = 0
            int priority = 0
            void *data = uv__stream_io
            void (*)() cb = uv__io_rw
            struct ev_watcher_list *next = 0
            int fd = 0xffffffff
            int events = 0x80
        }
    }
    uv__io_t write_watcher = {
        ev_io io_watcher = {
            int active = 0
            int pending = 0
            int priority = 0
            void *data = uv__stream_io
            void (*)() cb = uv__io_rw
            struct ev_watcher_list *next = 0x88f7d88
            int fd = 0xffffffff
            int events = 0x80
        }
    }
    ngx_queue_t write_queue = {
        ngx_queue_t *prev = 0x88f7dc8
        ngx_queue_t *next = 0x88f7dc8
    }
    ngx_queue_t write_completed_queue = {
        ngx_queue_t *prev = 0x88f7dd0
        ngx_queue_t *next = 0x88f7dd0
    }
    uv_connection_cb connection_cb = 0
    int delayed_error = 0
    int accepted_fd = 0xffffffff
    int fd = 0x9
    const char *pipe_fname = 0
    int ipc = 0
}

The important pieces are:

- "connect_req" is non-NULL, so a connect request is outstanding and its callback has never run;
- "fd" is 9, so the socket itself exists (and, per the kernel state, it is connected);
- both "read_watcher" and "write_watcher" have "fd = 0xffffffff" and "active = 0", meaning they were never initialized, so the event loop is not polling this socket at all.

With the watchers uninitialized, it looks like either the program never called uv__stream_open(), which calls uv__io_set() to initialize the watchers, or something uninitialized them afterwards. Not finding any code path that would uninitialize them, I looked for a case where uv__stream_open() is not called. I guessed that this might happen if the program saw EINPROGRESS from connect(2), and I ran this DTrace script while reproducing the problem to test that hypothesis:

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::connect:entry
/pid == $1/
{
        self->fd = arg0;
        self->c = arg1;
}

syscall::connect:return
/self->c/
{
        self->s = (struct sockaddr_un *)copyin(
            self->c, sizeof (struct sockaddr_un));
        printf("%2d  connect \"%s\" => %d (%d)\n", self->fd,
            self->s->sun_path, arg1, errno);
        self->s = 0;
        self->c = 0;
        self->fd = 0;
}

Sure enough, the output looks like this:

[root@headnode (bh1-kvm11:0) ~]# /var/tmp/connect.d 53325
...
 8  connect "/var/run/.bug.sock" => 0 (0)
 9  connect "/var/run/.bug.sock" => 0 (0)
 8  connect "/var/run/.bug.sock" => 0 (0)
 9  connect "/var/run/.bug.sock" => 0 (0)
 8  connect "/var/run/.bug.sock" => -1 (150)

and on this system errno 150 is:

[root@headnode (bh1-kvm11:0) /var/tmp]# grep -w 150 /usr/include/sys/errno.h 
#define EINPROGRESS     150     /* operation now in progress */

So we saw thousands of successful connects while the program ran, followed by one that returned -1 with EINPROGRESS, after which the client hung and was eventually aborted by its watchdog timer.

Looking at the code, if the connect() call in uv_pipe_connect() returns -1 with EINPROGRESS, the uv__stream_open() call is skipped, and control returns to the caller in pipe_wrap.c. As far as I can tell, that caller never adds the new socket to the poll set, so the program never finds out that the UDS has connected. That explains what I saw initially in production: the Node program wasn't polling on the socket, and it hung indefinitely.
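
For context, here is a minimal, self-contained sketch of the conventional way to finish a non-blocking connect(2) that returned EINPROGRESS: put the descriptor in the poll set, wait for it to become writable, then read the final status with getsockopt(SO_ERROR). This is not libuv's code (the actual fix is in the patch below); the helper name connect_uds and the blocking poll(2) call are just for this sketch, and the socket path is the one from the reproducer above.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Sketch only: connect a non-blocking AF_UNIX socket, and if connect(2)
 * returns EINPROGRESS, wait for writability and then check SO_ERROR.
 * Returns the connected fd, or -1 on error.
 */
static int
connect_uds(const char *path)
{
        struct sockaddr_un saddr;
        struct pollfd pfd;
        int fd, r, err;
        socklen_t len = sizeof (err);

        if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
                return (-1);
        (void) fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        memset(&saddr, 0, sizeof (saddr));
        saddr.sun_family = AF_UNIX;
        (void) strncpy(saddr.sun_path, path, sizeof (saddr.sun_path) - 1);

        do {
                r = connect(fd, (struct sockaddr *)&saddr, sizeof (saddr));
        } while (r == -1 && errno == EINTR);

        if (r == -1 && errno != EINPROGRESS) {
                (void) close(fd);
                return (-1);
        }

        if (r == -1) {
                /* Connect is pending: the fd must go into the poll set. */
                pfd.fd = fd;
                pfd.events = POLLOUT;
                if (poll(&pfd, 1, -1) != 1 ||
                    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1 ||
                    err != 0) {
                        (void) close(fd);
                        return (-1);
                }
        }

        return (fd);
}

int
main(void)
{
        int fd = connect_uds("/var/run/.bug.sock");

        if (fd == -1) {
                perror("connect_uds");
                return (1);
        }
        (void) close(fd);
        return (0);
}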

Finally, there's a comment in the code addressing this situation:

189   /* We don't check for EINPROGRESS. Think about it: the socket
190    * is either there or not.
191    */

If I understand correctly, the comment is justifying not checking for EINPROGRESS by suggesting that connect() on a UDS cannot return that error code. But I cannot find that claim documented anywhere in POSIX or man pages, and obviously illumos-based systems do return EINPROGRESS on UDS sockets in some situations. (Even if that claim were true, an assertion would seem more appropriate here so that the failure mode could be crisp if a system were ever malfunctioning. But as far as I can tell, the kernel is well within its rights to return EINPROGRESS here.)
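
For anyone who wants to observe this outside of Node, here is a hedged, self-contained sketch that provokes a UDS connect which cannot complete immediately: bind a listener with a tiny backlog, never accept, and issue non-blocking connects until one fails. Whether the failing connect reports EINPROGRESS (as on the illumos system above) or some other errno is kernel-specific; the program just prints whatever it sees. The path /tmp/einprogress.sock is made up for this example.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int
main(void)
{
        const char *path = "/tmp/einprogress.sock";   /* hypothetical path */
        struct sockaddr_un saddr;
        int lfd, cfd, i, r;

        memset(&saddr, 0, sizeof (saddr));
        saddr.sun_family = AF_UNIX;
        (void) strncpy(saddr.sun_path, path, sizeof (saddr.sun_path) - 1);
        (void) unlink(path);

        /* Listener with a minimal backlog that never calls accept(2). */
        lfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (lfd == -1 ||
            bind(lfd, (struct sockaddr *)&saddr, sizeof (saddr)) == -1 ||
            listen(lfd, 1) == -1) {
                perror("listen setup");
                return (1);
        }

        /*
         * Hammer it with non-blocking connects until one cannot complete.
         * The client fds are deliberately left open to keep the backlog full.
         */
        for (i = 0; i < 128; i++) {
                cfd = socket(AF_UNIX, SOCK_STREAM, 0);
                (void) fcntl(cfd, F_SETFL, fcntl(cfd, F_GETFL, 0) | O_NONBLOCK);
                r = connect(cfd, (struct sockaddr *)&saddr, sizeof (saddr));
                if (r == -1) {
                        printf("connect #%d: errno %d (%s)\n", i, errno,
                            strerror(errno));
                        break;
                }
        }

        return (0);
}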

The fix is that uv_pipe_connect() needs to handle EINPROGRESS from connect(2) appropriately.

c4milo commented 11 years ago

you might as well provide a patch :dart: hehe

bnoordhuis commented 11 years ago

Dave, can you try this patch?

diff --git a/src/unix/pipe.c b/src/unix/pipe.c
index b28c8ef..1185b91 100644
--- a/src/unix/pipe.c
+++ b/src/unix/pipe.c
@@ -183,9 +183,6 @@ void uv_pipe_connect(uv_connect_t* req,
   uv_strlcpy(saddr.sun_path, name, sizeof(saddr.sun_path));
   saddr.sun_family = AF_UNIX;

-  /* We don't check for EINPROGRESS. Think about it: the socket
-   * is either there or not.
-   */
   do {
     r = connect(uv__stream_fd(handle),
                 (struct sockaddr*)&saddr, sizeof saddr);
@@ -193,7 +190,8 @@ void uv_pipe_connect(uv_connect_t* req,
   while (r == -1 && errno == EINTR);

   if (r == -1)
-    goto out;
+    if (errno != EINPROGRESS)
+      goto out;

   if (new_sock)
     if (uv__stream_open((uv_stream_t*)handle,
@@ -213,8 +211,9 @@ out:
   req->cb = cb;
   ngx_queue_init(&req->queue);

-  /* Run callback on next tick. */
-  uv__io_feed(handle->loop, &handle->io_watcher);
+  /* Force callback to run on next tick in case of error. */
+  if (err != 0)
+    uv__io_feed(handle->loop, &handle->io_watcher);

   /* Mimic the Windows pipe implementation, always
    * return 0 and let the callback handle errors.
davepacheco commented 11 years ago

Perfect: I applied the patch to node master, and I can't reproduce the problem. Thanks!

bnoordhuis commented 11 years ago

Thanks for testing, Dave. Landed in v0.8 and master.