uNetworking / uWebSockets.js

μWebSockets for Node.js back-ends :metal:
Apache License 2.0

Graceful shutdown + worker threads crashes node process #324

Closed: SkeLLLa closed this issue 4 years ago

SkeLLLa commented 4 years ago

Hi, thanks for the awesome lib.

I've tried playing around with the worker threads example combined with graceful shutdown, and found out that it crashes the process on exit.

Here's the code, mostly taken from the examples dir:

const uWS = require('uWebSockets.js');
const { Worker, isMainThread, threadId, parentPort } = require('worker_threads');
const os = require('os');

const port = 9001;

if (isMainThread) {
    const workers = os.cpus().map(() => {
        return new Worker(__filename);
    });
    const cleanup = function cleanup() {
        workers.forEach((worker) => {
            worker.postMessage('cleanup');
        });
    };
    process.on('SIGTERM', cleanup);
    process.on('SIGINT', cleanup);
} else {
    let listenSocket;

    const app = uWS.App({}).listen(port, (token) => {
        listenSocket = token;
        if (token) {
            console.info(`${threadId} listening on ${port}`);
        } else {
            console.error(`${threadId} failed listening on ${port}`);
        }
    });

    parentPort.on('message', (value) => {
        if (value === 'cleanup') {
            shutdown();
            process.exit(0);
        }
    });

    function shutdown() {
        if (listenSocket) {
            uWS.us_listen_socket_close(listenSocket);
            listenSocket = null;
        }
    }
}

It launches without any errors and all threads are listening. But when you send SIGINT, for example, node crashes with the following core dump:

uv loop at [0x7f40b93f7ad8] has open handles:
[0x7f40940b9b10] prepare (active)
        Close callback: (nil) 
        Data: 0x7f40940c1ad0 
        (First field): 0x7f40940de240 
[0x7f40940b9b90] check (active)
        Close callback: (nil) 
        Data: 0x7f40940c1ad0 
        (First field): 0x7f40940de240 
[0x7f40940de300] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f40940de240 
        (First field): 0x7f40940cce40 
[0x7f40940f14e0] async (active)
        Close callback: 0x1 
        Data: 0x7f40940f1420 
uv loop at [0x7f40b93f7ad8] has 4 open handles in total
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
uv loop at [0x7f408bbfdad8] has open handles:
[0x7f407c0bf030] prepare (active)
        Close callback: (nil) 
        Data: 0x7f407c0c1b20 
        (First field): 0x7f407c0ba740 
[0x7f407c0bf0b0] check (active)
        Close callback: (nil) 
        Data: 0x7f407c0c1b20 
        (First field): 0x7f407c0ba740 
[0x7f407c0ba800] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f407c0ba740 
uv loop at [0x7f408b3fbad8] has open handles:
[0x7f40740bb850] prepare (active)
        Close callback: (nil) 
        Data: 0x7f40740a2950 
        (First field): 0x7f40740c6eb0 
[0x7f40740cb470] check (active)
        Close callback: (nil) 
        Data: 0x7f40740a2950 
        (First field): 0x7f40740c6eb0 
[0x7f40740c6f70] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f40740c6eb0 
[0x7f407c0c9ff0] async (active)
[0x7f40740d2c60] async (active)
        Close callback: 0x7f407c09b601 
        Close callback: 0x7f407409b701 
        Data: 0x7f407c0c9f30 
        Data: 0x7f40740d2ba0 
uv loop at [0x7f408bbfdad8] has 4 open handles in total
uv loop at [0x7f408b3fbad8] has 4 open handles in total
uv loop at [0x7f40b9bf9ad8] has open handles:
[0x7f409c0b9a90] prepare (active)
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
        Close callback: (nil) 
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
        Data: 0x7f409c0c1ff0 
        (First field): 0x7f409c0c5fe0 
[0x7f409c0c0500] check (active)
        Close callback: (nil) 
        Data: 0x7f409c0c1ff0 
        (First field): 0x7f409c0c5fe0 
[0x7f409c0c60a0] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f409c0c5fe0 
[0x7f409c0c9fc0] async (active)
        Close callback: 0x7f409c09b501 
        Data: 0x7f409c0c9f00 
uv loop at [0x7f40b9bf9ad8] has 4 open handles in total
uv loop at [0x7f408bffead8] has open handles:
[0x7f40840ba1f0] prepare (active)
        Close callback: (nil) 
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
        Data: 0x7f40840c1d00 
        (First field): 0x7f40840cc3e0 
[0x7f40840c0430] check (active)
        Close callback: (nil) 
        Data: 0x7f40840c1d00 
        (First field): 0x7f40840cc3e0 
[0x7f40840cc4a0] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f40840cc3e0 
[0x7f40840ce520] async (active)
        Close callback: 0x7f408409b701 
        Data: 0x7f40840ce460 
uv loop at [0x7f408bffead8] has 4 open handles in total
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
uv loop at [0x7f40b97f8ad8] has open handles:
[0x7f40a00c0c70] prepare (active)
        Close callback: (nil) 
        Data: 0x7f40a00c1920 
        (First field): 0x7f40a00bf880 
[0x7f40a00c0cf0] check (active)
        Close callback: (nil) 
        Data: 0x7f40a00c1920 
        (First field): 0x7f40a00bf880 
[0x7f40a00bf940] timer (active)
        Close callback: 0x800000000000000 
        Data: 0x7f40a00bf880 
[0x7f40a00ce710] async (active)
        Close callback: 0x7f40a009b401 
        Data: 0x7f40a00ce650 
uv loop at [0x7f40b97f8ad8] has 4 open handles in total
uv loop at [0x7f40b8bf5ad8] has open handles:
[0x7f408c0b9f50] prepare (active)
node[78601]: ../src/debug_utils.cc:322:void node::CheckedUvLoopClose(uv_loop_t*): Assertion `0 && "uv_loop_close() while having open handles"' failed.
uv loop at [0x7f408affaad8] has open handles:
[0x7f40780d6b70] prepare (active)
        Close callback: (nil) 
 1: 0x558615d2b370 node::Abort() [node]
uv loop at [0x7f40b8ff6ad8] has open handles:
[0x7f40980c4b20] prepare (active)
 1: 0x558615d2b370 node::Abort() [node]
uv loop at [0x7f40b87f4ad8] has open handles:
[0x7f40900b9c30] prepare (active)
 1: 0x558615d2b370 node::Abort() [node]
uv loop at [0x7f40b9ffaad8] has open handles:
[0x7f40a80bfe30] prepare (active)
 1: 0x558615d2b370 node::Abort() [node]
uv loop at [0x7f408b7fcad8] has open handles:
[0x7f40800d67c0] prepare (active)
 1: 0x558615d2b370 node::Abort() [node]
        Close callback: (nil) 
 1: 0x558615d2b370 node::Abort() [node]
        Data: 0x7f408c0c3ba0 
 2: 0x558615d2b3f3  [node]
        Close callback: (nil) 
 2: 0x558615d2b3f3  [node]
        Close callback: (nil) 
 3: 0x558615da74bf  [node]
 4: 0x558615caf908 node::worker::Worker::Run() [node]
 2: 0x558615d2b3f3  [node]
        Close callback: (nil) 
 2: 0x558615d2b3f3  [node]
        Data: 0x7f40780d4460 
 2: 0x558615d2b3f3  [node]
 3: 0x558615da74bf  [node]
 4: 0x558615caf908 node::worker::Worker::Run() [node]
 5: 0x558615cb0125  [node]
 6: 0x7f40c29b9fa7  [/lib64/libpthread.so.0]
 7: 0x7f40c3701dbf clone [/lib64/libc.so.6]
        Data: 0x7f4090077860 
        Close callback: (nil)

PS: this also reproduces even if you don't listen to anything in the workers and just require uws; it causes a crash when process.exit is called.

const uWS = require('uWebSockets.js'); // comment this and there will be no crash
const { Worker, isMainThread, threadId, parentPort } = require('worker_threads');
const os = require('os');

if (isMainThread) {
    const workers = os.cpus().map(() => {
        return new Worker(__filename);
    });
    const cleanup = function cleanup() {
        workers.forEach((worker) => {
            worker.postMessage('cleanup');
        });
    };
    process.on('SIGTERM', cleanup);
    process.on('SIGINT', cleanup);
    setTimeout(() => {
        process.exit(0);
    }, 5000);
} else {
    parentPort.on('message', (value) => {
        if (value === 'cleanup') {
            process.exit(0);
        }
    });
}

Is this expected or am I missing something?

ghost commented 4 years ago

I can trigger the same issue. I will look into this, but WorkerThreads are definitely recommended over Cluster. With WorkerThreads there's a clear idea of what Node.js does, while with Cluster I have no idea what Node.js does and there could be bigger issues.

ghost commented 4 years ago

I get all kinds of similar issues just by starting a basic worker threads example and letting it close down (no server started or anything). This should be fixed and then pushed harder as a main feature.

lostrepo commented 4 years ago

@hst-m In your example you should free the uWS instance inside shutdown before killing the worker (Cluster is a bunch of processes afaik).

In SkeLLLa's example it can help too, but as soon as any worker gets a request we can't free uWS in the worker that handled it for some time.

After uWS.us_listen_socket_close(socket) the thread still handles requests (imo it shouldn't, but I'm too dumb to dig into the C++). Here's a sample log from a dumb test script while reloading localhost:3000 (the same happens under wrk load for all threads), in the format thread ID: log msg:

703999:  /
703999:  /favicon.ico
703999:  socket close
703999:  /
703999:  /favicon.ico
...
703999:  /
703999:  /favicon.ico

Playing around with the timeouts between uWS.us_listen_socket_close(socket), uWS.free() and thread termination (uWebSockets.js 18.13.0), I was able to get:

  1. corrupted double-linked list
  2. uv loop at [0x...] has open handles:...
  3. Segmentation fault
  4. Bus error

In some cases it worked fine when the delay between the last handled request and uWS.free() was long enough (I guess some timeout in native land closes the sockets).

Dumb test script: navigate to or bombard localhost:3000, or just wait ~21 seconds while it shuts down. Math.random() > 1 ? 'cleanup' : 'apptest' switches between fast auto shutdown (works) and the thread closing its socket on a request to localhost:3000 with a timeout shutdown (works under the condition described in the previous paragraph); random was used to mix both types of threads while looking for a workaround.

Hope it helps.

hst-m commented 4 years ago

@lostrepo I don't know what uWS.free does, but it looks like it gets called automatically on process exit by uWS.js, so you don't need to add it for Cluster:

https://github.com/uNetworking/uWebSockets.js/blob/master/src/uws.js#L28
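
The registration behind that link is presumably something along these lines (a rough sketch, not a verbatim quote of uws.js):

// Rough sketch (assumed, not verbatim): uws.js loads the native addon and
// frees it when the process exits.
process.on('exit', uWS.free);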

hst-m commented 4 years ago

Looks like this issue is caused by worker threads not calling uWS.free(), because the process.on('exit') event only gets called for processes? Adding a uWS.free() in the worker before process.exit(0) fixes the problem, but @lostrepo is saying there is a timing issue on when to call it.
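
Something like this, as a minimal sketch based on the repro at the top of this issue (listenSocket, parentPort and uWS as defined there):

parentPort.on('message', (value) => {
    if (value === 'cleanup') {
        // Stop accepting new connections on this thread.
        if (listenSocket) {
            uWS.us_listen_socket_close(listenSocket);
            listenSocket = null;
        }
        // Free the native uWS instance for this worker thread before exiting,
        // since the automatic cleanup on process exit doesn't cover workers
        // (per the discussion above).
        uWS.free();
        process.exit(0);
    }
});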

lostrepo commented 4 years ago

@hst-m In your case timing doesn't matter since you create a uWS instance for each 'forked' process (Cluster stuff).

In the case of worker_threads I don't understand what the flow is. It happens if we receive connections while we close the listen sockets (they still get and handle new connections after the socket is closed). If we call uWS.free() some timeout after the last connection to the listen socket, it sometimes works and sometimes doesn't.
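
To make the timing issue concrete, here is a sketch of the close-then-free sequence described above; the delay is arbitrary and, as noted, it only works sometimes:

parentPort.on('message', (value) => {
    if (value === 'cleanup') {
        // Close the listen socket; per the log above, the thread may still
        // receive and handle new connections for a while after this call.
        if (listenSocket) {
            uWS.us_listen_socket_close(listenSocket);
            listenSocket = null;
        }
        // Arbitrary grace period before freeing the native instance: too short
        // and we get segfaults / "open handles" aborts, long enough after the
        // last handled request and it shuts down cleanly.
        setTimeout(() => {
            uWS.free();
            process.exit(0);
        }, 5000);
    }
});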

ghost commented 4 years ago

In order to support Worker threads, addons need to clean up any resources they may have allocated when such a thread exits. This can be achieved through the usage of the AddEnvironmentCleanupHook() function.

It looks like the add-on needs this one instead of using process.on('exit').

ghost commented 4 years ago

I have tested the above, and if you remove uWS.free and process.on('exit') and instead call that stuff inside AddEnvironmentCleanupHook, it shuts down properly.

So I will make this change in the next release and hopefully it will work better for everyone.

ghost commented 4 years ago

Can everyone who had this issue try the latest binaries:

npm install uNetworking/uWebSockets.js#binaries