Once you call await boss.stop(), you can no longer start boss up again with await boss.start().
To be explicit: await boss.start() succeeds without errors, but it never resumes picking up jobs.
The goal here is to guarantee no lost jobs when deploying a new job server:
1. Call stop on pg-boss.
2. pg-boss stops taking new jobs but continues to finish currently active jobs and accept new job requests (publish/schedule).
3. When we have no more active jobs, we deploy our job server updates.
4. pg-boss starts up and begins taking jobs again.
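For illustration, the drain part of the flow above could be sketched roughly like this. It assumes stop() behaves as described in the steps, and that the caller supplies two hypothetical callbacks that are not part of pg-boss: countActiveJobs (returns how many jobs are still running) and deploy (rolls out the new job server).

```javascript
// Rough sketch only. `countActiveJobs` and `deploy` are hypothetical
// callbacks provided by the caller; they are not pg-boss APIs.
async function drainThenDeploy(boss, { countActiveJobs, deploy, pollMs = 1000 }) {
  // Ask pg-boss to stop picking up new jobs.
  await boss.stop();

  // Wait until the jobs that were already running have finished.
  while ((await countActiveJobs()) > 0) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }

  // Nothing is active any more, so a deploy cannot kill a running job.
  await deploy();
}
```

After the deploy, either the new process calls start() on a fresh instance, or (the case this issue is about) the same instance would need start() to resume picking up jobs.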
So we have a button in our UI to stop pg-boss (it’s really more of a pause), which naturally calls for a start button to resume it. Otherwise, the only way to start it again is to reboot the server.
Another reason we're doing this is that even with a SIGTERM/SIGINT signal, most infra (e.g. k8s) only gives you a 30-second grace period before the process is SIGKILL’d. A lot of jobs take longer than this, and a SIGKILL will fail an otherwise perfectly fine job.