timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Restarting does not resume taking jobs #268

Closed. ashconnell closed this issue 3 years ago

ashconnell commented 3 years ago

Once you call await boss.stop(), you can no longer start pg-boss up again with await boss.start().

Well, to be explicit, await boss.start() does succeed without errors, but the instance never resumes picking up jobs.

The goal here is to guarantee no lost jobs when deploying a new job server:

  1. Call stop on pg-boss
  2. pg-boss stops taking new jobs but continues to finish currently active jobs and accept new job requests (publish/schedule)
  3. When there are no more active jobs, we deploy our job server updates
  4. pg-boss starts up and begins taking jobs again
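The pause/resume behavior these steps rely on can be modeled, independently of pg-boss, as a small state machine. This is a hypothetical sketch: the PausableWorker class, the Job shape and the send/fetch/complete/isDrained methods are illustrative names, not pg-boss's API. stop() only halts fetching, send() keeps accepting work while stopped, and start() resumes on the same instance, which is the behavior the issue reports as missing after boss.stop().

```typescript
// Hypothetical job record; not pg-boss's real job object.
interface Job {
  id: string;
  name: string;
}

// Hypothetical in-memory model of a pausable worker (not pg-boss).
class PausableWorker {
  private queue: Job[] = [];
  private active: Job[] = [];
  private running = false;

  // Step 4: begin (or resume) handing out jobs from this same instance.
  start(): void {
    this.running = true;
  }

  // Steps 1-2: stop handing out new jobs; in-flight jobs are untouched.
  stop(): void {
    this.running = false;
  }

  // Publishing/scheduling is accepted whether or not the worker is running.
  send(job: Job): void {
    this.queue.push(job);
  }

  // Returns the next queued job only while running, and marks it active.
  fetch(): Job | undefined {
    if (!this.running) return undefined;
    const job = this.queue.shift();
    if (job !== undefined) this.active.push(job);
    return job;
  }

  // Marks an in-flight job as finished.
  complete(id: string): void {
    this.active = this.active.filter((j) => j.id !== id);
  }

  // Step 3: safe to deploy once stopped and nothing is still in flight.
  isDrained(): boolean {
    return !this.running && this.active.length === 0;
  }

  queuedCount(): number {
    return this.queue.length;
  }
}
```

In this model a deploy script would call stop(), poll isDrained() until it returns true, deploy, and then call start() again; queued jobs sent in the meantime are picked up once fetching resumes.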

So we have a button in our UI to stop pg-boss (it's really more of a pause), which naturally calls for a matching start button to resume it; otherwise, the only way to start it again is to reboot the server.

Another reason we're doing this is that, even with a SIGTERM/SIGINT signal, most infra (e.g. k8s) only gives you a 30-second grace period before the process is SIGKILL’d. A lot of jobs can take longer than this, and a SIGKILL will fail an otherwise perfectly fine job.

ashconnell commented 3 years ago

Closing this as it's an edge case. Not many people need to pause and then resume their pg-boss instances so I'll work around it.