manast opened this issue 3 years ago
Hey @manast, thanks for raising this.
Actually, the code works as expected on my side:
Redis closed...
Redis reconnecting... 100
Redis error Error: connect ECONNREFUSED 127.0.0.1:6379
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16) {
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED',
syscall: 'connect',
address: '127.0.0.1',
port: 6379
}
...
...
Redis closed...
Redis reconnecting... 2000
Redis connected...
unblocked null
going to block
unblocked null
...
The expected result is: when the server is down, ioredis should try to reconnect (according to retryStrategy), and once connected again, all pending commands should be resent. If the server is down forever (or long enough, according to autoResendUnfulfilledCommands), the brpoplpush should be rejected (according to maxRetriesPerRequest), as you mentioned.
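For reference, a minimal sketch of how those options are typically set when creating the client; the values shown here are just the documented defaults, not anything specific to this issue:
import Redis from 'ioredis';

const client = new Redis({
  // delay in ms before the next reconnect attempt; return null to stop reconnecting
  retryStrategy: (times) => Math.min(times * 50, 2000),
  // resend commands that were pending when the connection dropped (default: true)
  autoResendUnfulfilledCommands: true,
  // reject a pending command once its connection has been retried this many times (default: 20)
  maxRetriesPerRequest: 20,
});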
Did I miss anything?
Ok, this is interesting. After reading your response I tried with a Redis server running natively on my Mac instead of using a Docker container as before, and now I get the same result as you did. How is that possible?
Could you provide some instructions for a reproducible setup? E.g. the CLI command to run the Docker container.
1) docker run --name redis -p 6379:6379 -d redis
2) run test snippet above
3) docker stop redis
4) wait a couple of seconds
5) docker start redis
Hmm...that's strange. I followed the steps but still got the same result as I posted above.
Did you notice the error when the server was down for more than ~8s?
No, I did not get the MaxRetriesPerRequestError either. I tried with redis:latest and ioredis 4.22.0, and it is still reproducible every time in my setup: Intel macOS Big Sur 11.1.
I also experienced the same issue with BullMQ, causing memory leaks and high CPU load, and the while loop was the reason in some cases. There might really be a bug in ioredis in the reconnecting state, but I haven't had the issue anymore since I started checking the states manually (I also migrated to Redis streams in the meantime):
function waitForReconnect (client) {
  let reconnectListener
  const promise = new Promise((resolve) => {
    reconnectListener = function () {
      // clean up whichever listener did not fire, then unblock the caller
      client.removeListener('ready', reconnectListener)
      client.removeListener('end', reconnectListener)
      resolve()
    }
    client.once('ready', reconnectListener)
    client.once('end', reconnectListener)
  })
  // expose the listener so the caller can resolve the promise manually (e.g. on shutdown)
  promise.resolve = reconnectListener
  return promise
}
And adopt it in the loop:
// call reconnectListener.resolve() during server/queue shutdown
let reconnectListener = undefined
while (true) {
  if (client.status === 'end') return
  try {
    console.log('going to block');
    const value = await client.brpoplpush('a', 'b', 4);
    console.log('unblocked', value);
  } catch (err) {
    // if redis got closed (e.g. by a stopping server), we can just return, as the queue should be shut down
    if (client.status === 'end') return
    console.error('ERROR', err);
    switch (client.status) {
      case 'wait':
      case 'connecting':
      case 'reconnecting':
        // wait until the client is ready (or ended) before issuing the next blocking call
        reconnectListener = waitForReconnect(client)
        await reconnectListener
        reconnectListener = undefined
    }
  }
}
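For completeness, a hypothetical shutdown hook for the sketch above; the SIGTERM handler and the disconnect() call are my assumptions, not part of the original snippet:
// hypothetical shutdown hook: unblock a waiting loop and close the connection
process.on('SIGTERM', () => {
  if (reconnectListener) reconnectListener.resolve()
  client.disconnect()
})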
Ah, and with that code I'd configure a low maxRetry config, as you don't need to rely on ioredis to handle the errors.
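A minimal sketch of such a configuration; the concrete values are illustrative assumptions, not recommendations from this thread:
const client = new Redis({
  // fail pending commands quickly instead of letting ioredis retry them for a long time
  maxRetriesPerRequest: 1,
  retryStrategy: (times) => Math.min(times * 200, 2000),
})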
Anyone who can reproduce this issue: can you enable the debug log (DEBUG=ioredis* node yourapp) and post the logs here? Also, we just released a new version (v4.24.2) with a connection-related fix. I don't think they're related, but can you try with the latest version to make sure?
@luin as mentioned here by @manast, it's happening for me as well with a Redis instance inside Docker on Mac.
ioredis@4.27.8
Redis@3.2.8 running on docker@3.3.1 on Mac.
We are facing an issue of jobs getting stuck on a few production instances too (non-Docker); I am guessing it could be due to the same issue, but I am not sure.
Any thoughts ?
@luin attaching logs with DEBUG=ioredis*; it's simply stuck without any logs.
ioredis.log
I managed to reproduce this issue: it occurs when retryStrategy returns values less than 50 (exp(0), exp(1), exp(2) in the code snippet below, the BullMQ default).
I cannot reproduce it with the ioredis default retryStrategy, Math.min(times * 50, 2000), because it always returns numbers >= 50:
https://github.com/luin/ioredis/blob/0db2d4f5f27d7106832c934a798e616836d1d0a6/lib/redis/event_handler.ts#L181
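If the sub-50 ms delay is indeed the trigger, a possible workaround sketch (my assumption, not a confirmed fix) is to clamp the BullMQ-style exponential delay to at least 50 ms:
// same exponential backoff as above, but never shorter than 50 ms
const retryStrategy = (times: number) => Math.max(Math.min(Math.exp(times), 20000), 50);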
ioredis v5.2.3
node -v
v16.16.0
docker run -p 6379:6379 redis
redis-cli
127.0.0.1:6379> INFO
# Server
redis_version:7.0.5
import Redis from 'ioredis';
async function main() {
// https://github.com/taskforcesh/bullmq/blob/0fb2964151166f2aece0270c54c8cb4f4e2eb898/src/classes/redis-connection.ts#L62
const retryStrategy = (times: number) => Math.min(Math.exp(times), 20000);
const client = new Redis({ retryStrategy });
client.on('error', err => console.log('Redis error', err));
client.on('reconnecting', msg => console.log('Redis reconnecting...', msg));
client.on('close', () => console.log('Redis closed...'));
client.on('connect', () => console.log('Redis connected...'));
while (true) {
try {
console.log('brpoplpush');
await client.brpoplpush('a', 'b', 4);
console.log('brpoplpush end');
} catch (err) {
console.error('error', err);
}
}
}
main().catch(console.error);
Output :
brpoplpush
Redis connected...
Redis closed...
Redis reconnecting... 2.718281828459045
Redis connected...
Redis error Error: read ECONNRESET
at TCP.onStreamRead (node:internal/stream_base_commons:217:20) {
errno: -104,
code: 'ECONNRESET',
syscall: 'read'
}
Redis closed...
Redis reconnecting... 7.38905609893065
Redis error Error: connect ECONNREFUSED 127.0.0.1:6379
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
errno: -111,
code: 'ECONNREFUSED',
syscall: 'connect',
address: '127.0.0.1',
port: 6379
}
Redis closed...
Redis reconnecting... 20.085536923187668
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 54.598150033144236
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 148.4131591025766
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 403.4287934927351
Redis error Error: connect ECONNREFUSED
Redis closed...
Redis reconnecting... 1096.6331584284585
Redis connected...
There is this older issue https://github.com/luin/ioredis/issues/610, also created by me. But since this is rather important, I would like to create a new one with very specific code to reproduce the issue easily, so that it can be resolved once and for all :).
This is the code:
How to reproduce
Just run the code above with a local Redis server. The while loop will run forever, outputting the following:
It just blocks for up to 4 seconds, then unblocks, and so on. Now, while this program is running, just stop the Redis server, wait a couple of seconds, and start it again:
Output would look like this:
And that's all. The while loop will not continue running, as if the call to client.brpoplpush had hung forever.
Expected results
I expect that as soon as the client disconnects, the call to client.brpoplpush rejects the promise with a connection error. Client code should then be able to call this blocking command again.
I am a bit surprised no one else has reported this issue; I wonder if there is some wrong expectation on my side or if I am using the library incorrectly. If so, please let me know, since this issue is quite severe for users of the Bull/BullMQ libraries.
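In other words, the pattern the issue expects to work is roughly the following sketch; the logging around the rejection is my assumption about how a caller would react:
while (true) {
  try {
    // should reject with a connection error when the server goes away,
    // instead of leaving the promise pending forever
    const value = await client.brpoplpush('a', 'b', 4);
    console.log('unblocked', value);
  } catch (err) {
    // the caller handles the error and simply issues the blocking call again
    console.error('blocking call failed, retrying', err);
  }
}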