slackapi / node-slack-sdk

Slack Developer Kit for Node.js
https://slack.dev/node-slack-sdk
MIT License
3.26k stars 656 forks source link

socket-mode 1.x: explicit server disconnect during 'connecting' state intermittently not handled #1787

Open filmaj opened 2 months ago

filmaj commented 2 months ago
          @filmaj I seem to be hitting the following now on 1.3.5:
[INFO]  socket-mode:SocketModeClient:0 Going to establish a new connection to Slack ...
/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76
      throw new Error('Unhandled event \'' + event + '\' in state \'' + this.currentState + '\'.');
            ^

Error: Unhandled event 'server explicit disconnect' in state 'connecting'.
    at StateMachine.handleUnhandledEvent (/home/patrick/slack/node_modules/finity/lib/core/StateMachine.js:76:13)
    at /home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:79:33
    at TaskScheduler.execute (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:29:7)
    at TaskScheduler.enqueue (/home/patrick/slack/node_modules/finity/lib/core/TaskScheduler.js:19:12)
    at HierarchicalStateMachine.handle (/home/patrick/slack/node_modules/finity/lib/core/HierarchicalStateMachine.js:72:24)
    at SocketModeClient.onWebSocketMessage (/home/patrick/slack/node_modules/@slack/socket-mode/dist/SocketModeClient.js:608:31)
    at WebSocket.onMessage (/home/patrick/slack/node_modules/ws/lib/event-target.js:132:16)
    at WebSocket.emit (node:events:518:28)
    at Receiver.receiverOnMessage (/home/patrick/slack/node_modules/ws/lib/websocket.js:1068:20)
    at Receiver.emit (node:events:518:28)

With debug logging it shows:

[DEBUG]  socket-mode:SocketModeClient:0 Received a message on the WebSocket: {"type":"disconnect","reason":"too_many_websockets","debug_info":{"host":"applink-2"}}
[DEBUG]  socket-mode:SocketModeClient:0 Received "disconnect" (reason: too_many_websockets) message - will attempt reconnect

Originally posted by @MadrMan in https://github.com/slackapi/node-slack-sdk/issues/1654#issuecomment-2094100237


Any PR to address this should target the socket-mode-1.x branch as this would be a 'maintenance' fix for the now-outdated 1.x line of socket-mode.

filmaj commented 1 month ago

This is a tricky issue to handle, because the 1.x line of socket-mode's architecture is such that there is a single state machine to handle transitions, but potentially, during a reconnect, two WebSocket connections are briefly active/overlapping as a connection is re-established. The underlying problem is, during this reconnect when we have two WS connections open, each WebSocket connection pumps events into the single state machine, causing unexpected state transitions.

Consider the case where, at more or less the same time, one WS connection sends a "disconnect" event to the state machine and another sends a "open" or "connected" event. The single state machine will get confused.

This issue is resolved in socket-mode 2.x, as the architecture has completely changed. Socket-mode 2.0 is available in bolt 4.0.0 rc1, and has been used successfully by some customers who have adopted it early.