Closed akramali86 closed 3 years ago
Yeah, I think I know why this happens. There is a loop inside BullMQ that throws an exception in this case and stops looping. We have a fix in older Bull that I can port to BullMQ, which should resolve the issue.
Thanks @manast. In the meantime, would you see any issue with us manually calling the run method in an interval to restart the worker? I know it's not very elegant, but it seems to work. Just wondering if it would cause any memory leaks, etc.
Example:
```js
const { Worker } = require('bullmq');

const worker = new Worker('worker');

setInterval(() => {
  if (!worker.running && !worker.closing) {
    console.log('Restarting worker');
    worker.run().catch(() => {
      worker.running = false;
      console.log('Could not restart worker');
    });
  }
}, 60000);
```
I do not see any issue at first glance; it should work.
We were having the same issue a few days ago (the worker just died without any notification), but after introducing the `isRunning()` check on the worker, everything seems to work fine for us when we simply restart the workers. (We do this in Kubernetes: if the worker dies, the health check fails and the worker pod is restarted.)
@sven-codeculture what do you mean by "it died without any notification"?
:tada: This issue has been resolved in version 1.40.0 :tada:
The release is available on:
Your semantic-release bot :package::rocket:
In production we're using Amazon ElastiCache with BullMQ ^1.34.2.
We're finding that in the event of a failover the following error is emitted by the workers
UNBLOCKED force unblock from blocking operation, instance state changed (master -> replica?)
and the workers stop processing jobs, though jobs can still be queued. Currently we have to redeploy our app to rectify this. Is there anything we can do to handle this error so that when Redis reconnects the workers start processing jobs again? Thanks.
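Until the fix lands, one workaround is to react to the worker's `error` event and try to resume the loop once the connection settles. This is only a sketch under assumptions: that the failover surfaces as an `error` event whose message contains `UNBLOCKED` (as in the log above), and that the `running`/`closing`/`run()` members behave as in the interval example earlier in the thread; `attachFailoverRestart` is a hypothetical helper, untested against a real ElastiCache failover:

```javascript
// Sketch: attempt to restart a stalled worker after the UNBLOCKED
// failover error, with a small delay so Redis can finish reconnecting.
function attachFailoverRestart(worker, delayMs = 5000) {
  worker.on('error', (err) => {
    if (String(err && err.message || err).includes('UNBLOCKED')) {
      setTimeout(() => {
        // Only restart if the loop actually stopped and we aren't closing.
        if (!worker.running && !worker.closing) {
          worker.run().catch(() => {
            // Leave it stopped; a later error or health check can retry.
          });
        }
      }, delayMs);
    }
  });
}
```

Combined with the interval-based restart above, this should cover both the failover error and any silent loop exits, at the cost of poking a property (`worker.running`) that isn't a documented public API.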