taskforcesh / bullmq-pro-support

Support repository for BullMQ Pro edition.

Bull error (stalled jobs) #43

Open hardcodet opened 1 year ago

hardcodet commented 1 year ago

Hi there

Version: 5.1.14

Noticed the following Bull error on our cluster that occurred during application startup.

Error: job stalled more than allowable limit
    at /app/node_modules/bullmq/dist/cjs/classes/worker.js:517:62
    at Array.forEach (<anonymous>)
    at WorkerPro.notifyFailedJobs (/app/node_modules/bullmq/dist/cjs/classes/worker.js:517:20)
    at WorkerPro.moveStalledJobsToWait (/app/node_modules/bullmq/dist/cjs/classes/worker.js:510:22)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async WorkerPro.checkConnectionError (/app/node_modules/bullmq/dist/cjs/classes/queue-base.js:118:20)
    at async WorkerPro.runStalledJobsCheck (/app/node_modules/bullmq/dist/cjs/classes/worker.js:494:17)

Looking at my logs for the last 30 days, I saw a burst of these on a single day. I'm not sure what to make of that:

[Image: log history for the last 30 days]

Any idea on how to triangulate this?

manast commented 1 year ago

These are jobs that have stalled more times than the max stalled count setting allows (1 by default, see https://api.docs.bullmq.io/interfaces/WorkerOptions.html#maxStalledCount). In your case I suspect that your server was restarted several times in a short time span, so some jobs stalled more than once. If you expect this to happen often, you can increase the max stalled count. You should also consider graceful shutdowns to minimize this problem: https://docs.bullmq.io/guide/going-to-production#gracefully-shut-down-workers
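
As a rough illustration of the two suggestions above, here is a minimal sketch. It does not import BullMQ: `Closable` is a stand-in for the part of a worker it needs (BullMQ workers expose a `close()` method that returns a promise), and `closeAll` is a hypothetical helper name. The value `3` for `maxStalledCount` is an arbitrary example, not a recommendation.

```typescript
// Stand-in for the part of a BullMQ worker used here (assumption:
// the real worker's close() resolves once it has stopped processing).
interface Closable {
  close(): Promise<void>;
}

// Options fragment to pass when constructing the worker.
// maxStalledCount is the setting discussed above; 3 is only an example.
const workerOptions = {
  maxStalledCount: 3,
};

// Hypothetical helper: close every worker in turn and report how many
// failed to close, so the caller can pick an exit code.
async function closeAll(workers: Closable[]): Promise<number> {
  let failures = 0;
  for (const worker of workers) {
    try {
      await worker.close();
    } catch (_err) {
      failures += 1;
    }
  }
  return failures;
}

// Usage sketch (assumes `worker` was created with workerOptions):
// process.on("SIGTERM", async () => {
//   const failures = await closeAll([worker]);
//   process.exit(failures === 0 ? 0 : 1);
// });
```

Closing the workers before the process exits gives active jobs a chance to finish, so fewer of them should be left behind to be picked up by the stalled-jobs check after a restart.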