Open dzmm opened 2 months ago
I do not think it works the way you describe by design.
Most likely, since the worker has just finished processing this job and the delay is zero, it is the first one available to pick it up again. If the worker were malfunctioning, it would not pick up the same job again. There is also a chance that some other idle worker picks it up instead.
In any case it would be impossible to guarantee that the same worker that failed the job would not pick it again, so there really is not a lot we can do here.
For this to work, a worker must keep some kind of list of the jobIds of recently failed jobs, so that it ignores them and gives other workers a chance to pick them up. It is not completely trivial to implement though, and this list of jobs must be passed to the moveToActive Lua script in every call, or be stored in some specific Redis key...
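To make the idea of a per-worker list of recently failed jobs more concrete, here is a minimal in-memory sketch. It is not part of BullMQ: `RecentFailures`, `ttlMs`, and `maxSize` are hypothetical names chosen for illustration, and the sketch does not cover the hard part mentioned above, which is getting the list into the moveToActive Lua script or a Redis key.

```typescript
// Hypothetical helper, NOT a BullMQ API: remembers the IDs of jobs that
// recently failed on this worker so they can be skipped for a while.
class RecentFailures {
  // jobId -> timestamp (ms) after which the entry no longer applies
  private readonly expiry = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly maxSize: number;

  constructor(ttlMs: number, maxSize: number = 1000) {
    this.ttlMs = ttlMs;
    this.maxSize = maxSize;
  }

  // Record that `jobId` failed on this worker at time `now`.
  add(jobId: string, now: number = Date.now()): void {
    this.prune(now);
    // Delete first so a repeated failure moves to the end of the insertion order.
    this.expiry.delete(jobId);
    this.expiry.set(jobId, now + this.ttlMs);
    // Keep the list bounded: drop the oldest entry when over the cap.
    if (this.expiry.size > this.maxSize) {
      const oldest = this.expiry.keys().next().value;
      if (oldest !== undefined) {
        this.expiry.delete(oldest);
      }
    }
  }

  // True while the job is still within its "ignore" window on this worker.
  shouldSkip(jobId: string, now: number = Date.now()): boolean {
    const until = this.expiry.get(jobId);
    if (until === undefined) {
      return false;
    }
    if (until <= now) {
      this.expiry.delete(jobId);
      return false;
    }
    return true;
  }

  // Current list of IDs, e.g. what would have to be handed to the script.
  ids(now: number = Date.now()): string[] {
    this.prune(now);
    return Array.from(this.expiry.keys());
  }

  // Remove entries whose ignore window has passed.
  private prune(now: number): void {
    this.expiry.forEach((until, id) => {
      if (until <= now) {
        this.expiry.delete(id);
      }
    });
  }
}
```

The expiry and the size cap are design choices of this sketch: without them, a job that only this worker can currently reach would be ignored forever, and the list sent along with every call would grow without bound.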
Currently, when using linear backoff with a delay of 0, failed jobs are retried on the same worker. However, in some scenarios, a worker might be on a malfunctioning machine, and we need the ability to retry the job on a different worker.
Current Behavior
With linear backoff and zero delay, failed jobs are always retried on the same worker that initially failed to process them.
Desired Behavior
Even with linear backoff and zero delay, failed jobs should have the option to be retried on different workers, allowing for better fault tolerance and recovery from worker-specific issues.
Is there any way I can do this on the current version of BullMQ?
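Since the explanation given is that a zero delay lets the worker that has just finished grab the job again, one thing that may be worth trying is a small non-zero retry delay, so that idle workers get a chance at the job. The sketch below is only the delay calculation: `retryDelayMs`, `stepMs`, `minDelayMs`, and `jitterMs` are hypothetical names, and wiring it in (for example through a custom backoff strategy, if your BullMQ version supports one) is an assumption to check against the docs. It does not guarantee that a different worker picks the job up.

```typescript
// Hypothetical helper, NOT a BullMQ API: linear backoff with a minimum
// delay and random jitter, so that a retry is never immediately available.
function retryDelayMs(
  attemptsMade: number,
  stepMs: number,
  minDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number {
  // Plain linear backoff: delay grows by `stepMs` with each attempt.
  const linear = attemptsMade * stepMs;
  // Enforce a floor so a step of 0 still yields a non-zero delay.
  const base = Math.max(linear, minDelayMs);
  // Add jitter in [0, jitterMs) to spread retries out over time.
  return base + Math.floor(random() * jitterMs);
}
```

With a step of 0 this reduces to "minimum delay plus jitter", which keeps retries fast while avoiding the zero-delay case described above.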