redis / ioredis

🚀 A robust, performance-focused, and full-featured Redis client for Node.js.
MIT License
14.44k stars 1.2k forks source link

Blocking not working properly in case of reconnections #1285

Open manast opened 3 years ago

manast commented 3 years ago

There is this older issue https://github.com/luin/ioredis/issues/610 also created by me. But since this is rather important I would like to create a new one with a very specific code to reproduce the issue easily so that it can be resolved once and for all :).

This is the code:


async function main() {
  const client = new Redis();

  client.on('error', err => console.log('Redis error', err));
  client.on('reconnecting', msg => console.log('Redis reconnecting...', msg));
  client.on('close', () => console.log('Redis closed...'));
  client.on('connect', () => console.log('Redis connected...'));

  while (true) {
    try {
      console.log('going to block');
      const value = await client.brpoplpush('a', 'b', 4);
      console.log('unblocked', value);
    } catch (err) {
      console.error('ERROR', err);
    }
  }
}

main();

How to reproduce

Just run the code above with a local redis server. The while loop will run forever outputting the following:

going to block
Redis connected...
unblocked null
going to block
unblocked null
going to block
unblocked null
going to block

It just blocks for up to 4 seconds, then unblocks, and so on. Now, while this program is running just stop the redis server, wait a couple of seconds and start it again:

Output would look like this:

Redis closed...
Redis reconnecting... 50
Redis connected...
Redis closed...
Redis reconnecting... 100
Redis error Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}
...
...
Redis closed...
Redis reconnecting... 700
Redis connected...

And thats all. The while loop will not continue running as if the call to client.brpoplpush has hanged forever.

Expected results

I expect that as soon as the the client disconnects the call to client.brpoplpush rejects the promise with a connection error. Client code should be able to handle calling again to this blocking command.

I am a bit surprised no one else has reported this issue, I wonder if there is some wrong expectation from my side or if I am using the library incorrectly, if so please let me know since this issue is quite severe for users of Bull/BullMQ libraries.

luin commented 3 years ago

Hey @manast, thanks for raising this.

Actually, the code works as expected on my side:

Redis closed...
Redis reconnecting... 100
Redis error Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}
...
...
Redis closed...
Redis reconnecting... 2000
Redis connected...
unblocked null
going to block
unblocked null
...

The expected result is when the server is down, ioredis should try to reconnect (according to retryStrategy), and once connected again, all pending commands should be resent. If the server is down forever (or long enough, according to autoResendUnfulfilledCommands), the brpoplpush should be rejected (according to maxRetriesPerRequest) as you mentioned.

Did I miss anything?

manast commented 3 years ago

ok, this is interesting. After reading your response I tried with a redis server running natively in my mac instead of using a docker container as before, and now I get the same result as you did, how is it possible?

luin commented 3 years ago

Could you provide some instructions for a reproducible setup? Ex the cli command to run the docker container.

manast commented 3 years ago

1) docker run --name redis -p 6379:6379 -d redis 2) run test snippet above 3) docker stop redis 4) wait a couple of seconds 5) docker start redis

luin commented 3 years ago

Hmm...that's strange. I followed the steps but still got the same result as I posted above.

Did you notice the error when the server was down more than ~8s:

CleanShot 2021-02-06 at 13 19 28@2x
manast commented 3 years ago

No I did not get the MaxRetriesPerRequestError either. I tried with redis:latest and ioredis 4.22.0 still it is reproducible every time in my setup, intel MacOs Bigsur 11.1

marcbachmann commented 3 years ago

I also experienced the same issue with bullmq causing memory leaks and high cpu load. And the while loop was the reason in some cases. There might really a bug in ioredis in the reconnecting state, but I haven't had the issue anymore since I checked the states manually (i also migrated to redis streams in the meanwhile):

function waitForReconnect (client) {
  let reconnectListener
  const promise = new Promise((resolve) => {
    readyListener = function () {
      client.removeListener('ready', reconnectListener)
      client.removeListener('end', reconnectListener)
      resolve()
    }
    client.once('ready', reconnectListener)
    client.once('end', reconnectListener)
  })

  promise.resolve = reconnectListener
  return promise
}

And adopt it in the loop:

  // call reconnectListener.resolve() during server/queue shutdown
  let reconnectListener = undefined

  while (true) {
    if (client.status === 'end') return

    try {
      console.log('going to block');
      const value = await client.brpoplpush('a', 'b', 4);
      console.log('unblocked', value);
    } catch (err) {
      // if redis got closed (by a stopping server, we can just return as the queue should be shut down)
      if (client.status === 'end') return
      console.error('ERROR', err);

      switch (client.status) {
        case 'wait':
        case 'connecting':
        case 'reconnecting':
          reconnectListener = waitForReconnect(client)
          await readyListener
          readyListener = undefined
      }
    }
  }

ah, and with that code, I'd configure a low maxRetry config as you don't need to rely on ioredis to handle the errors.

luin commented 3 years ago

Anyone that can reproduce this issue can you enable the debug log (DEBUG=ioredis* node yourapp) and post the logs here? Also, we just updated a new version (v4.24.2) with a connection-related fix, I don't think they're related though but can you try with the latest version to make sure?

royalpinto commented 3 years ago

@luin like mentioned here by @manast, it's happening for me as well on Redis instance inside Docker on Mac.

ioredis@4.27.8
Redis@3.2.8 running on docker@3.3.1 on Mac.

We are facing an issue of jobs getting stuck on a few productions instances too (non docker), I am guessing it could be because of the same but I am not sure.

Any thoughts ?

royalpinto commented 3 years ago

@luin attaching logs with DEBUG=ioredis* , it's simply stuck without any logs. ioredis.log

JJCella commented 2 years ago

I manage to reproduce this issue : it occurs when retryStrategy is returning values less than 50 (exp(0), exp(1), exp(2) in the code snippet below, BullMQ default)

Cannot reproduce with ioredis default retryStrategy Math.min(times * 50, 2000); because returning always numbers >= 50 https://github.com/luin/ioredis/blob/0db2d4f5f27d7106832c934a798e616836d1d0a6/lib/redis/event_handler.ts#L181

ioredis v5.2.3

node -v
v16.16.0

docker run -p 6379:6379 redis

redis-cli
127.0.0.1:6379> INFO
# Server
redis_version:7.0.5
import Redis from 'ioredis';

async function main() {

    // https://github.com/taskforcesh/bullmq/blob/0fb2964151166f2aece0270c54c8cb4f4e2eb898/src/classes/redis-connection.ts#L62
    const retryStrategy = (times: number) => Math.min(Math.exp(times), 20000);
    const client = new Redis({ retryStrategy });

    client.on('error', err => console.log('Redis error', err));
    client.on('reconnecting', msg => console.log('Redis reconnecting...', msg));
    client.on('close', () => console.log('Redis closed...'));
    client.on('connect', () => console.log('Redis connected...'));

    while (true) {
        try {
            console.log('brpoplpush');
            await client.brpoplpush('a', 'b', 4);
            console.log('brpoplpush end');
        } catch (err) {
            console.error('error', err);
        }
    }
}

main().catch();

Output :

brpoplpush
Redis connected...
Redis closed...
Redis reconnecting... 2.718281828459045
Redis connected...
Redis error Error: read ECONNRESET
    at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
  errno: -104,
  code: 'ECONNRESET',
  syscall: 'read'
}
Redis closed...
Redis reconnecting... 7.38905609893065
Redis error Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}
Redis closed...
Redis reconnecting... 20.085536923187668
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 54.598150033144236
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 148.4131591025766
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 403.4287934927351
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 1096.6331584284585
Redis connected...