Open mishase opened 1 year ago
Also affects #1889 (Aborted blocking commands leave dangling sockets)
Great job debugging this, I would never have found it! Even though there have been no releases since mid-2019, can you please open an issue for yallist? If yallist is not maintained anymore, we should probably switch away from it (maybe js-sdsl is a good option?).
Maybe js-sdsl is a good option. I've never used any libraries for LinkedList
@mishase I think that for v5 we'll move to a "custom-made" linked list.. will keep you posted :)
Looks like this issue has been fixed in https://github.com/isaacs/yallist/commit/d240e59f13571b0af1b757711aecadc11d5e759b. So a custom implementation may no longer be necessary.
I'm seeing this bug cause the client to get into a state where it can never reconnect. I was investigating this bug: https://github.com/vercel/next.js/pull/68221
... reproducer is exactly as described in that bug but with the client in cache-handler.js set up as:
client = createClient({
  url: process.env.REDIS_URL ?? "redis://localhost:6379",
  disableOfflineQueue: true,
});
// Redis won't work without error handling.
client.on("error", (e) => {
  console.error("REDIS ERROR", e);
});
turning off redis at runtime now results in this error:
node-1 | REDIS ERROR SocketClosedUnexpectedlyError: Socket closed unexpectedly
node-1 | at Socket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:194:118)
node-1 | at Object.onceWrapper (node:events:635:26)
node-1 | at Socket.emit (node:events:520:28)
node-1 | at TCP.<anonymous> (node:net:337:12)
node-1 | TypeError: Cannot read properties of undefined (reading 'reject')
node-1 | at RedisCommandsQueue._RedisCommandsQueue_flushQueue (/home/node/app/node_modules/@redis/client/dist/lib/client/commands-queue.js:176:22)
node-1 | at RedisCommandsQueue.flushAll (/home/node/app/node_modules/@redis/client/dist/lib/client/commands-queue.js:171:77)
node-1 | at RedisSocket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/index.js:417:67)
node-1 | at RedisSocket.emit (node:events:520:28)
node-1 | at RedisSocket._RedisSocket_onSocketError (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:218:10)
node-1 | at Socket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:194:107)
node-1 | at Object.onceWrapper (node:events:635:26)
node-1 | at Socket.emit (node:events:520:28)
node-1 | at TCP.<anonymous> (node:net:337:12)
... and after re-enabling redis it doesn't connect again.
As well as the yallist bug, what's going on here is that the error handling in #onSocketError
is fragile, just as in the next.js bug:
#onSocketError(err: Error): void {
  const wasReady = this.#isReady;
  this.#isReady = false;
  this.emit('error', err);
  if (!wasReady || !this.#isOpen || typeof this.#shouldReconnect(0, err) !== 'number') return;
  this.emit('reconnecting');
  this.#connect().catch(() => {
    // the error was already emitted, silently ignore it
  });
}
If an exception is thrown while handling this.emit('error', err), then #shouldReconnect is never called and #isOpen is never set to false. Because #shouldReconnect is not called, the reconnection to Redis never happens.
I think at the very least it would help to reorder this a bit:
#onSocketError(err: Error): void {
  const wasReady = this.#isReady;
  this.#isReady = false;
  const noReconnect = !wasReady || !this.#isOpen || typeof this.#shouldReconnect(0, err) !== 'number';
  this.emit('error', err);
  if (noReconnect) return;
  this.emit('reconnecting');
  this.#connect().catch(() => {
    // the error was already emitted, silently ignore it
  });
}
... that way the reconnection attempt still runs, and the exceptions from inside the error handler propagate (if that's what was intended)
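To make the ordering problem concrete, here is a minimal sketch rather than the node-redis source. `FakeSocket`, `onSocketErrorOriginal`, `onSocketErrorGuarded` and `simulate` are hypothetical names, and the reconnect is reduced to a counter. The only behaviour assumed from Node is that `emit` runs listeners synchronously and lets their exceptions propagate to the caller. One caveat worth stating: reordering alone still skips every statement after the emit when a listener throws, so the guarded variant below uses `try`/`finally`, which is a different technique from the reordering suggested above.

```typescript
// Hypothetical stand-in for RedisSocket; models only the control flow discussed above.
type ErrorListener = (err: Error) => void;

class FakeSocket {
  reconnectAttempts = 0;
  private isReady = true;
  private readonly listeners: ErrorListener[] = [];

  onError(listener: ErrorListener): void {
    this.listeners.push(listener);
  }

  // Like EventEmitter#emit: listeners run synchronously and a throw propagates.
  private emitError(err: Error): void {
    for (const listener of this.listeners) listener(err);
  }

  // Original order: emit first, decide about reconnecting afterwards.
  onSocketErrorOriginal(err: Error): void {
    const wasReady = this.isReady;
    this.isReady = false;
    this.emitError(err); // a throwing listener skips everything below
    if (!wasReady) return;
    this.reconnectAttempts++;
  }

  // Guarded variant: the reconnect step runs in `finally`, so a throwing
  // listener cannot skip it, and the exception still reaches the caller.
  onSocketErrorGuarded(err: Error): void {
    const wasReady = this.isReady;
    this.isReady = false;
    try {
      this.emitError(err);
    } finally {
      if (wasReady) this.reconnectAttempts++;
    }
  }
}

// Runs one socket error through a socket whose only listener throws.
function simulate(guarded: boolean): { attempts: number; threw: boolean } {
  const socket = new FakeSocket();
  socket.onError(() => {
    throw new TypeError("listener failed");
  });
  let threw = false;
  try {
    const err = new Error("Socket closed unexpectedly");
    if (guarded) socket.onSocketErrorGuarded(err);
    else socket.onSocketErrorOriginal(err);
  } catch (_err) {
    threw = true;
  }
  return { attempts: socket.reconnectAttempts, threw };
}
```

In this model `simulate(false)` never reaches the reconnect step, while `simulate(true)` does, and in both cases the listener's exception still surfaces to the caller.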
Description
Bug
I'm getting an error (line numbers may be off because I modified the package while investigating the source of the error).
The error happens in the flushQueue function because queue.length equals -1. It goes below zero due to a bug in the yallist package, whose development seems to have stopped in mid-2019, so an issue submitted there is unlikely to ever be resolved.
Detailed description of the yallist bug
Let's look at the shift() function.
It removes an item from the list and returns it.
Now let's examine the removeNode(node) function.
As you can see, there is a node.list !== this check at the beginning of the function and a node.list = null assignment at the end, which prevents it from being called twice on the same node. However, the shift() function mentioned earlier does not have the list = null line. So we can call shift and then removeNode, which corrupts the node.list.length value. In our case, we call shift() while there is one item in the list, and the subsequent removeNode call makes list.length equal -1.
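The sequence described above can be reproduced with a small stand-in list. This is a sketch under stated assumptions, not the yallist source: `ListNode`, `MiniList`, the `detachOnShift` flag and `lengthAfterShiftThenRemove` are hypothetical names, and `removeNode` here returns `false` when the ownership check fails, which is a simplification chosen for the sketch.

```typescript
// Hypothetical miniature of a yallist-style doubly linked list.
class ListNode<T> {
  value: T;
  prev: ListNode<T> | null = null;
  next: ListNode<T> | null = null;
  list: MiniList<T> | null = null;

  constructor(value: T) {
    this.value = value;
  }
}

class MiniList<T> {
  head: ListNode<T> | null = null;
  tail: ListNode<T> | null = null;
  length = 0;
  // Toggles the fix discussed above: clearing node.list inside shift().
  private readonly detachOnShift: boolean;

  constructor(detachOnShift: boolean) {
    this.detachOnShift = detachOnShift;
  }

  pushNode(node: ListNode<T>): void {
    node.list = this;
    node.prev = this.tail;
    node.next = null;
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.length++;
  }

  shift(): T | undefined {
    const node = this.head;
    if (!node) return undefined;
    this.head = node.next;
    if (this.head) this.head.prev = null;
    else this.tail = null;
    this.length--;
    if (this.detachOnShift) node.list = null; // the line the buggy variant lacks
    return node.value;
  }

  // Returns false when the node no longer belongs to this list (sketch-only simplification).
  removeNode(node: ListNode<T>): boolean {
    if (node.list !== this) return false;
    const prev = node.prev;
    const next = node.next;
    if (next) next.prev = prev;
    if (prev) prev.next = next;
    if (node === this.head) this.head = next;
    if (node === this.tail) this.tail = prev;
    this.length--; // decrements even if shift() already removed the node
    node.prev = null;
    node.next = null;
    node.list = null;
    return true;
  }
}

// One item in the list, then shift() followed by removeNode() on the same node.
function lengthAfterShiftThenRemove(detachOnShift: boolean): number {
  const list = new MiniList<string>(detachOnShift);
  const node = new ListNode("GET key");
  list.pushNode(node);
  list.shift();
  list.removeNode(node);
  return list.length;
}
```

With `detachOnShift` set to `false` the counter ends at -1, matching the report. With it set to `true`, the ownership check in `removeNode` rejects the stale node and the counter stays at 0.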
Code to reproduce this bug
What happens when the AbortSignal passed in the command options is triggered
The flushQueue function mentioned earlier calls queue.shift, which removes the node from the waitingToBeSent list.
The addCommand function contains a call to this.#waitingToBeSent.removeNode, which is the removeNode function described earlier. Calling it on the node that was already shifted causes waitingToBeSent.length to become -1.
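The flush-then-abort sequence can be modelled without the real client. Everything in this sketch is hypothetical: `FakeCommandsQueue`, the `stillQueued` flag, the `guardAbort` switch and `flushThenAbort` are illustration-only names, the queue's `length` is a separately tracked counter standing in for the linked list's, and the flush loop is assumed to keep shifting while `length` is truthy, which is consistent with the "reading 'reject'" stack trace earlier in this thread but is not copied from the library.

```typescript
// Hypothetical model of the flush-then-abort sequence; not the node-redis source.
interface QueuedCommand {
  name: string;
  stillQueued: boolean; // hypothetical flag, cleared once the command leaves the queue
  reject: (err: Error) => void;
}

class FakeCommandsQueue {
  // Counter tracked separately from storage, as a linked list's `length` is.
  length = 0;
  rejected: string[] = [];
  private readonly waitingToBeSent: QueuedCommand[] = [];
  private readonly guardAbort: boolean;

  constructor(guardAbort: boolean) {
    this.guardAbort = guardAbort;
  }

  // Returns the callback that an AbortSignal "abort" listener would run.
  addCommand(name: string): () => void {
    const command: QueuedCommand = {
      name,
      stillQueued: true,
      reject: () => {
        this.rejected.push(name);
      },
    };
    this.waitingToBeSent.push(command);
    this.length++;
    return () => {
      // Guarded variant: do nothing if the command already left the queue.
      if (this.guardAbort && !command.stillQueued) return;
      const index = this.waitingToBeSent.indexOf(command);
      if (index !== -1) this.waitingToBeSent.splice(index, 1);
      command.stillQueued = false;
      this.length--; // unguarded, this mirrors removeNode() on an already shifted node
    };
  }

  // Assumed shape of the flush loop: keep shifting while `length` is truthy.
  flushQueue(err: Error): void {
    while (this.length) {
      const command = this.waitingToBeSent.shift()!; // undefined once the counter is out of sync
      command.stillQueued = false; // throws TypeError when `command` is undefined
      command.reject(err);
      this.length--;
    }
  }
}

// Flush first, fire the abort callback afterwards, then flush again.
function flushThenAbort(guardAbort: boolean): { length: number; secondFlushThrew: boolean } {
  const queue = new FakeCommandsQueue(guardAbort);
  const abort = queue.addCommand("BLPOP key 0");
  queue.flushQueue(new Error("Socket closed unexpectedly"));
  abort();
  const length = queue.length;
  let secondFlushThrew = false;
  try {
    queue.flushQueue(new Error("Socket closed unexpectedly"));
  } catch (err) {
    secondFlushThrew = err instanceof TypeError;
  }
  return { length, secondFlushThrew };
}
```

In the unguarded run the counter drops to -1 and the next flush fails with a TypeError because there is nothing left to shift. In the guarded run the abort callback is a no-op, so the counter stays at 0 and the next flush does nothing.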
Possible fixes
Check if the node exists before calling the removeNode function in the Redis library.
Fork yallist and add node.list = null in all necessary functions, or implement different presence checks in the removeNode function.
Node.js Version
v18.14.0
Redis Server Version
all versions affected
Node Redis Version
4.6.6
Platform
all platforms affected
Logs
No response