socketio / socket.io-cluster-adapter

The official Socket.IO cluster adapter, which allows broadcasting events between several Socket.IO servers.
https://socket.io
MIT License

ERR_IPC_CHANNEL_CLOSED and EPIPE errors on exit event #5

Closed by RolandoAndrade 1 year ago

RolandoAndrade commented 1 year ago

Hey! Nice work! I found an issue when running a process in cluster mode with this library.

Description

When I run a process in cluster mode using the `@socket.io/pm2` library and then try to delete the processes with `pm2 delete all`, I get the following errors:

PM2             | Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
PM2             |     at ChildProcess.target.send (internal/child_process.js:680:16)
PM2             |     at Worker.send (internal/cluster/worker.js:45:28)
PM2             |     at EventEmitter.<anonymous> (/usr/lib/node_modules/@socket.io/pm2/node_modules/@socket.io/cluster-adapter/dist/index.js:392:32)
PM2             |     at EventEmitter.emit (events.js:314:20)
PM2             |     at EventEmitter.emit (domain.js:483:12)
PM2             |     at ChildProcess.<anonymous> (internal/cluster/master.js:191:13)
PM2             |     at Object.onceWrapper (events.js:421:26)
PM2             |     at ChildProcess.emit (events.js:314:20)
PM2             |     at ChildProcess.EventEmitter.emit (domain.js:506:15)
PM2             |     at Process.ChildProcess._handle.onexit (internal/child_process.js:276:12) {
PM2             |   code: 'ERR_IPC_CHANNEL_CLOSED'
PM2             | }
PM2             | Error: write EPIPE
PM2             |     at ChildProcess.target._send (internal/child_process.js:807:20)
PM2             |     at ChildProcess.target.send (internal/child_process.js:678:19)
PM2             |     at Worker.send (internal/cluster/worker.js:45:28)
PM2             |     at EventEmitter.<anonymous> (/usr/lib/node_modules/@socket.io/pm2/node_modules/@socket.io/cluster-adapter/dist/index.js:392:32)
PM2             |     at EventEmitter.emit (events.js:314:20)
PM2             |     at EventEmitter.emit (domain.js:483:12)
PM2             |     at ChildProcess.<anonymous> (internal/cluster/master.js:191:13)
PM2             |     at Object.onceWrapper (events.js:421:26)
PM2             |     at ChildProcess.emit (events.js:314:20)
PM2             |     at ChildProcess.EventEmitter.emit (domain.js:506:15) {
PM2             |   errno: 'EPIPE',
PM2             |   code: 'EPIPE',
PM2             |   syscall: 'write'
PM2             | }

It seems that, on the exit event, the primary process tries to send the WORKER_EXIT message to workers that are already disconnected.

https://github.com/socketio/socket.io-cluster-adapter/blob/43f9ee8d23d2a4bc72ce4399c0b2b8445360f8cc/lib/index.ts#L579-L590

If pm2 is set up to resurrect processes on failure, these errors cause the killed/deleted/stopped processes to be resurrected unexpectedly.

Based on this thread, I think it is related to a synchronization problem that occurs when multiple processes are closed at once: I was able to delete the processes one by one without any problem.

I was able to mitigate the error by adding a callback that handles the exception:

    cluster.on("exit", (worker) => {
        // notify all active workers
        for (const workerId in cluster.workers) {
            if (hasOwnProperty.call(cluster.workers, workerId)) {
                cluster.workers[workerId].send({
                    source: MESSAGE_SOURCE,
                    type: EventType.WORKER_EXIT,
                    data: worker.id,
                }, (err) => {
                    if (err) {
                        if (err.code === 'ERR_IPC_CHANNEL_CLOSED' || err.code === 'EPIPE') {
                            // the worker disconnected before the message could be delivered
                            console.warn('There was a synchronization problem: tried to send a message to a disconnected worker');
                            console.log(err);
                        } else {
                            throw err;
                        }
                    }
                });
            }
        }
    });
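
The callback-based mitigation above can also be combined with a connectivity check. The following is a minimal sketch, not the adapter's actual code: `safeSend` and `notifyWorkers` are hypothetical helper names, and the only Node.js APIs assumed are `worker.isConnected()` and the `worker.send(message, callback)` form, whose callback receives the error when the message cannot be sent.

```javascript
// Hypothetical helpers for illustration; they are not part of @socket.io/cluster-adapter.
const IGNORABLE_CODES = new Set(["ERR_IPC_CHANNEL_CLOSED", "EPIPE"]);

// Sends `message` to one worker; returns false when the send is skipped.
function safeSend(worker, message, onUnexpectedError) {
  // isConnected() is false once the worker's IPC channel has been closed
  if (!worker || !worker.isConnected()) {
    return false;
  }
  // With a callback, send failures are reported here instead of being
  // emitted as an 'error' event on the child process
  worker.send(message, (err) => {
    if (err && !IGNORABLE_CODES.has(err.code)) {
      onUnexpectedError(err);
    }
  });
  return true;
}

// Notifies every worker in `workers` (for example cluster.workers) and
// returns how many sends were attempted.
function notifyWorkers(workers, message, onUnexpectedError = console.error) {
  let attempted = 0;
  for (const worker of Object.values(workers)) {
    if (safeSend(worker, message, onUnexpectedError)) {
      attempted++;
    }
  }
  return attempted;
}
```

In the primary process this could be wired as `cluster.on("exit", (worker) => notifyWorkers(cluster.workers, { type: "worker-exit", data: worker.id }))`, where the message shape is only a placeholder. The connectivity check alone does not remove the race, since a worker can disconnect between the check and the send, which is why the callback still filters the two error codes.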

Steps to reproduce

  1. Create a sample process (test-process.js).

        process.on('SIGINT', () => {
            console.log(`Received SIGINT.  Shutting down.`);
            process.exit(0);
        });

        let i = 0;

        async function run() {
            while (true) {
                await new Promise(resolve => setTimeout(resolve, 5000));
                console.log(`Number ${i++}.`);
            }
        }

        run().catch(error => {
            console.error('Error!', error);
            process.exit();
        });

  2. Delete `pm2` and install `@socket.io/pm2`.

  3. Run the process in cluster mode.

        pm2 start test-process.js -i 3

  4. Try to delete/stop/kill the processes.

        pm2 delete all

     After doing this, you will see the exceptions.

Env details

darrachequesne commented 1 year ago

This should be fixed by https://github.com/socketio/socket.io-cluster-adapter/commit/be0a0e3217bd7100d569e5624194612bcc8b96ff, included in version 0.2.1.

Thanks for the detailed report!