redis / ioredis

🚀 A robust, performance-focused, and full-featured Redis client for Node.js.
MIT License
14.07k stars 1.19k forks source link

Repeatable job is not picked up from the queue if Redis is reconnected #1804

Closed david-sykora closed 10 months ago

david-sykora commented 10 months ago

In a situation where the Redis connection is reconnected (eg. Redis server failover in HA environment, TCP proxy client connection timeout etc.), the worker does not pick up the repeatable job from the queue. This only happens if the TCP Redis connection closes before the repeatable job's retry period.

Here is the POC (it is necessary that the Redis reconnection occurs after 6 seconds, for example it is possible to hide Redis behind HAProxy and set it to client timeout 6s to achieve this behavior ):

import { Queue, Worker } from "bullmq"
import { Redis } from "ioredis"

const bindRedisEvents = (RedisConnection: Redis) => {
    for (const event of ['connect', 'ready', 'error', 'close', 'reconnecting', 'end']) {
        RedisConnection.on(event, (...args) => {
            if (args[0]) {
                console.log(`[${new Date().toISOString()}] Redis ${event} event`, args)
            } else {
                console.log(`[${new Date().toISOString()}] Redis ${event} event`)
            }
        })
    }
}

(async () => {
    const workerConnection = new Redis({ keepAlive: 2000, maxRetriesPerRequest: null })
    bindRedisEvents(workerConnection)
    const W = new Worker('repeatable-job-test', async (job) => {
        console.log(`[${new Date().toISOString()}] Worker received job: ${job.id}`)
    }, {
        connection: workerConnection
    })
    await W.waitUntilReady()

    const queueConnection = new Redis({ keepAlive: 2000, maxRetriesPerRequest: null })
    bindRedisEvents(queueConnection)
    const Q = new Queue('repeatable-job-test', {
        connection: queueConnection
    })
    await Q.waitUntilReady()
    await Q.add('repeatable-job-test', { test: 'test' }, {
        repeat: {
            every: 7000,
        }
    })
})()

Stdout POC:

[2023-08-23T08:52:21.856Z] Redis connect event
[2023-08-23T08:52:21.911Z] Redis ready event
[2023-08-23T08:52:21.947Z] Redis connect event
[2023-08-23T08:52:22.019Z] Redis ready event
[2023-08-23T08:52:27.228Z] Worker received job: repeat:12b152eee56a73113e04ca79d2ff6559:1692780747000
[2023-08-23T08:52:29.122Z] Redis close event
[2023-08-23T08:52:29.122Z] Redis reconnecting event [ 50 ]
[2023-08-23T08:52:29.174Z] Redis connect event
[2023-08-23T08:52:29.227Z] Redis ready event
[2023-08-23T08:52:34.289Z] Redis close event
[2023-08-23T08:52:34.289Z] Redis reconnecting event [ 50 ]
[2023-08-23T08:52:34.340Z] Redis connect event
[2023-08-23T08:52:34.393Z] Redis ready event
[2023-08-23T08:52:36.229Z] Redis close event
[2023-08-23T08:52:36.229Z] Redis reconnecting event [ 50 ]
[2023-08-23T08:52:36.281Z] Redis connect event
[2023-08-23T08:52:36.335Z] Redis ready event
[2023-08-23T08:52:41.395Z] Redis close event
[2023-08-23T08:52:41.395Z] Redis reconnecting event [ 50 ]
[2023-08-23T08:52:41.447Z] Redis connect event
[2023-08-23T08:52:41.502Z] Redis ready event
[2023-08-23T08:52:43.337Z] Redis close event
[2023-08-23T08:52:43.337Z] Redis reconnecting event [ 50 ]
[2023-08-23T08:52:43.388Z] Redis connect event
[2023-08-23T08:52:43.442Z] Redis ready event
...

First Job from Redis:

127.0.0.1:6379> hgetall bull:repeatable-job-test:repeat:12b152eee56a73113e04ca79d2ff6559
:1692780747000
 1) "data"
    "{\"test\":\"test\"}"
 2) "finishedOn"
    "1692780747228"
 3) "opts"
    "{\"prevMillis\":1692780747000,\"timestamp\":1692780742062,\"jobId\":\"repeat:12b152eee56a73113e04ca79d2ff6559:1692780747000\",\"repeat\":{\"every\":7000,\"count\":1},\"delay\":4938,\"attempts\":0}"
 4) "processedOn"
    "1692780747118"
 5) "rjk"
    "repeatable-job-test::::7000"
 6) "name"
    "repeatable-job-test"
 7) "timestamp"
    "1692780742062"
 8) "delay"
    "0"
 9) "priority"
    "0"
10) "returnvalue"
    "null"
11) "attemptsMade"
    "1"

Second Hanging Job from Redis:

127.0.0.1:6379> hgetall bull:repeatable-job-test:repeat:12b152eee56a73113e04ca79d2ff6559
:1692780754000
1) "opts"
   "{\"repeat\":{\"every\":7000,\"count\":2},\"timestamp\":1692780747175,\"jobId\":\"repeat:12b152eee56a73113e04ca79d2ff6559:1692780754000\",\"prevMillis\":1692780754000,\"delay\":6825,\"attempts\":0}"
2) "timestamp"
   "1692780747175"
3) "rjk"
   "repeatable-job-test::::7000"
4) "priority"
   "0"
5) "delay"
   "6825"
6) "name"
   "repeatable-job-test"
7) "data"
   "{\"test\":\"test\"}"
david-sykora commented 10 months ago

I accidentally opened this issue in the wrong project, moved to bullmq