redis / node-redis

Redis Node.js client
https://redis.js.org/
MIT License
16.86k stars 1.87k forks source link

@redis/client Isolated blocking command with AbortSignal bug (yallist) #2498

Open mishase opened 1 year ago

mishase commented 1 year ago

Description

Bug

I'm getting an error (line numbers may be affected because of me modifying the package while trying to investigate the source of an error)

TypeError: Cannot read properties of undefined (reading 'reject')
    at Function._RedisCommandsQueue_flushQueue (/Users/mishase/Projects/wp/node_modules/@redis/client/dist/lib/client/commands-queue.js:204:11)
    at RedisCommandsQueue.flushAll (/Users/mishase/Projects/wp/node_modules/@redis/client/dist/lib/client/commands-queue.js:195:93)
    at RedisClient.disconnect (/Users/mishase/Projects/wp/node_modules/@redis/client/dist/lib/client/index.js:336:63)
    at <anonymous> (/Users/mishase/Projects/wp/packages/redis/dist/lib/events/EventsClient.js:86:24)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at EventsClient.blockingReadMany (/Users/mishase/Projects/wp/packages/redis/dist/lib/events/EventsClient.js:55:22)
    at Object.handler (/Users/mishase/Projects/wp/apps/screen-api/src/routes/screenshot/status.ts:23:15)

Error happens in flushQueue function

// RedisCommandsQueue.#flushQueue function
while (queue.length) {
    queue.shift()!.reject(err); // This line
}

Because of queue.length equals -1. It goes bellow zero due to bug in yallist package, developers of which seems to be dead in mid-2019, so submitting issue there will never be resolved

Detailed description of yallist bug

Let's look at the shift() function

Yallist.prototype.shift = function () {
    if (!this.head) {
        return undefined;
    }

    var res = this.head.value;
    this.head = this.head.next;
    if (this.head) {
        this.head.prev = null;
    } else {
        this.tail = null;
    }
    this.length--;
    return res;
};

It removes item from the list and returns Node of it

Now let's examine the removeNode(node) function

Yallist.prototype.removeNode = function (node) {
    if (node.list !== this) {
        throw new Error("removing node which does not belong to this list");
    }

    var next = node.next;
    var prev = node.prev;

    if (next) {
        next.prev = prev;
    }

    if (prev) {
        prev.next = next;
    }

    if (node === this.head) {
        this.head = next;
    }
    if (node === this.tail) {
        this.tail = prev;
    }

    node.list.length--;
    node.next = null;
    node.prev = null;
    node.list = null;

    return next;
};

As you can see, there is a node.list !== this check in the beginning of the function and node.list = null in the end of this function which prevents it from being called twice, but mentioned earlier function shift() does not have list = null line. So we can call shift and then removeNode which will break node.list.length value. In our case, we're calling shift() while there is one item and then removeNode call results list.length to be -1

Code to reproduce this bug

import LinkedList from "yallist";

const ll = new LinkedList();

ll.push({ name: "a" });

const node = ll.head;

ll.shift();
console.log(ll.length); // 0

ll.removeNode(node);
console.log(ll.length); // -1

What happens at the event of AbortSignal triggered in redis command when passing it to the command options

In mentioned earlier flushQueue function it calls queue.shift function which removes node from the waitingToBeSent list

In addCommand function there is a call this.#waitingToBeSent.removeNode to removeNode function which I described earlier. This causes waitingToBeSent.length to be -1

return new Promise((resolve, reject) => {
    const node = new LinkedList.Node<CommandWaitingToBeSent>({
        args,
        chainId: options?.chainId,
        returnBuffers: options?.returnBuffers,
        resolve,
        reject,
    });

    if (options?.signal) {
        const listener = () => {
            // Remove already removed from list node and cause length decrement
            this.#waitingToBeSent.removeNode(node);
            node.value.reject(new AbortError());
        };
        node.value.abort = {
            signal: options.signal,
            listener,
        };
        // AbortSignal type is incorrent
        (options.signal as any).addEventListener("abort", listener, {
            once: true,
        });
    }

    if (options?.asap) {
        this.#waitingToBeSent.unshiftNode(node);
    } else {
        this.#waitingToBeSent.pushNode(node);
    }
});

Possible fixes

Check if node exists before calling removeNode function in Redis library

Fork yallist and add node.list = null in all necessary functions or implement different presence checks in the removeNode function

Node.js Version

v18.14.0

Redis Server Version

all versions affected

Node Redis Version

4.6.6

Platform

all platforms affected

Logs

No response

mishase commented 1 year ago

Also affects #1889 (Aborted blocking commands leave dangling sockets)

leibale commented 1 year ago

Great job debugging this, I would have never found it..! Even though there is no releases since mid-2019, can you please open an issue for yallist? If yallist is not maintained anymore we probably should switch away from it.. (maybe js-sdsl is a good option?)

mishase commented 1 year ago

Maybe js-sdsl is a good option. I've never used any libraries for LinkedList

leibale commented 1 year ago

@mishase I think that for v5 we'll move to a "custom-made" linked list.. will keep you posted :)

mishase commented 4 months ago

Looks like this issue has been fixed in https://github.com/isaacs/yallist/commit/d240e59f13571b0af1b757711aecadc11d5e759b. So custom implementation may no longer be necessary

bewinsnw commented 1 month ago

I'm seeing this bug cause the client to get into a state where it can never reconnect. I was investigating this bug: https://github.com/vercel/next.js/pull/68221

... reproducer is exactly as described in that bug but with the client in cache-handler.js set up as:

      client = createClient({
        url: process.env.REDIS_URL ?? "redis://localhost:6379",
        disableOfflineQueue: true,
      });

      // Redis won't work without error handling.
      client.on("error", (e) => {
        console.error("REDIS ERROR", e);
      });

turning off redis at runtime now results in this error:

node-1         | REDIS ERROR SocketClosedUnexpectedlyError: Socket closed unexpectedly
node-1         |     at Socket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:194:118)
node-1         |     at Object.onceWrapper (node:events:635:26)
node-1         |     at Socket.emit (node:events:520:28)
node-1         |     at TCP.<anonymous> (node:net:337:12)
node-1         | TypeError: Cannot read properties of undefined (reading 'reject')
node-1         |     at RedisCommandsQueue._RedisCommandsQueue_flushQueue (/home/node/app/node_modules/@redis/client/dist/lib/client/commands-queue.js:176:22)
node-1         |     at RedisCommandsQueue.flushAll (/home/node/app/node_modules/@redis/client/dist/lib/client/commands-queue.js:171:77)
node-1         |     at RedisSocket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/index.js:417:67)
node-1         |     at RedisSocket.emit (node:events:520:28)
node-1         |     at RedisSocket._RedisSocket_onSocketError (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:218:10)
node-1         |     at Socket.<anonymous> (/home/node/app/node_modules/@redis/client/dist/lib/client/socket.js:194:107)
node-1         |     at Object.onceWrapper (node:events:635:26)
node-1         |     at Socket.emit (node:events:520:28)
node-1         |     at TCP.<anonymous> (node:net:337:12)

... and after re-enabling redis it doesn't connect again.

As well as the yallist bug, what's going on here is that the error handling in #onSocketError is fragile, just as in the next.js bug:

    #onSocketError(err: Error): void {
        const wasReady = this.#isReady;
        this.#isReady = false;
        this.emit('error', err);

        if (!wasReady || !this.#isOpen || typeof this.#shouldReconnect(0, err) !== 'number') return;

        this.emit('reconnecting');
        this.#connect().catch(() => {
            // the error was already emitted, silently ignore it
        });
    }

if an exception gets thrown while handling this.emit('error', err), then #shouldNotReconnect is not called, and #isOpen is never set to false. Because #shouldReconnect is not called the reconnection to redis never happens.

I think at the very least it would help to reorder this a bit:

    #onSocketError(err: Error): void {
        const wasReady = this.#isReady;
        this.#isReady = false;
        const noReconnect = !wasReady || !this.#isOpen || typeof this.#shouldReconnect(0, err) !== 'number';

        this.emit('error', err);

        if (noReconnect) return;

        this.emit('reconnecting');
        this.#connect().catch(() => {
            // the error was already emitted, silently ignore it
        });
    }

... that way the reconnection attempt still runs, and the exceptions from inside the error handler propagate (if that's what was intended)