timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Shutting down gracefully #237

Closed gavacho closed 3 years ago

gavacho commented 3 years ago

When my app receives a signal to shut down, I want to finish processing any pg-boss jobs so that they don't expire.

I had solved this by wrapping all of my handlers in some logic so that I could monitor when all the handlers had completed during a shutdown.
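For readers who want to picture the wrapping approach described above, here is a minimal sketch. It is not the code from this app: `createInFlightTracker`, `wrap`, `drain`, and `count` are hypothetical names, and the helper does not touch pg-boss at all.

```javascript
// Hypothetical helper, not part of pg-boss: counts handlers that are running
// so a shutdown routine can wait for them to finish.
function createInFlightTracker() {
  let inFlight = 0;
  let waiters = [];

  // Resolve everyone waiting in drain() once the count reaches zero.
  function settle() {
    if (inFlight === 0) {
      waiters.forEach((resolve) => resolve());
      waiters = [];
    }
  }

  return {
    // Wrap a job handler so its start and finish are counted.
    wrap(handler) {
      return async (...args) => {
        inFlight += 1;
        try {
          return await handler(...args);
        } finally {
          // Runs whether the handler resolved or threw.
          inFlight -= 1;
          settle();
        }
      };
    },
    // Resolves once no wrapped handler is running.
    drain() {
      if (inFlight === 0) return Promise.resolve();
      return new Promise((resolve) => waiters.push(resolve));
    },
    // Number of wrapped handlers currently running.
    count() {
      return inFlight;
    },
  };
}
```

On a shutdown signal the app would call `await tracker.drain()` before exiting. A wrapper like this only sees jobs that have already been handed to a handler, so jobs that were fetched but not yet started are invisible to it.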

Today I started setting teamSize to a value greater than 1. It appears that when my process shuts down, all the jobs that were selected as part of the team get expired.

I would like to be able to tell pg-boss "hey, we're shutting down. take all the time you need to process the jobs you're working on but please don't select any new jobs. can you let me know when you're done?"

timgit commented 3 years ago

Currently, each worker will continue to wait on promises after stop(), but if the connection pool is created by pg-boss, stop() will close it and then job completion will fail. This is obviously less than ideal. :disappointed:

Here are a couple of thoughts I have, but I'd like to know what you think (and anyone else interested in chiming in):

  1. stop() should no longer close the created connection pool immediately when called. The semantics of "stop" sound like "stop fetching work for subscriptions, stop maintenance intervals, and stop cron job evaluation / job publishing". This would mean all stateless, one-off requests, such as publish(), fetch(), complete(), cancel(), and fail(), would still succeed until all workers are finished with their jobs.
  2. Workers should track more state about in-flight jobs and emit events that a monitor could respond to in order to close the connection pool. These events should be emitted externally as well, which would let us see progress if desired.
  3. A stopped event could be emitted once all workers have finished their jobs and the connection pool is closed. I think this is what you are most interested in for this issue.
  4. stop() with a { force: true } argument could bypass all of this and continue to nuke the pool scorched earth style. :fire:
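
The four points above could be sketched roughly as follows. This only illustrates the proposed semantics and is not pg-boss code: `GracefulStopper`, `canFetch`, `jobStarted`, `jobFinished`, and the `job-started` / `job-finished` event names are all hypothetical, and `closePool` stands in for whatever closes the connection pool.

```javascript
const { EventEmitter } = require("events");

// Hypothetical sketch of the proposed semantics; none of these names are pg-boss APIs.
class GracefulStopper extends EventEmitter {
  constructor(closePool) {
    super();
    this.closePool = closePool; // async function that closes the connection pool
    this.active = new Set();    // ids of in-flight jobs
    this.stopping = false;
    this.stopped = false;
  }

  // A worker checks this before fetching; false means "do not fetch new work".
  canFetch() {
    return !this.stopping;
  }

  // Point 2: track in-flight jobs and emit events a monitor can observe.
  jobStarted(id) {
    this.active.add(id);
    this.emit("job-started", id);
  }

  async jobFinished(id) {
    this.active.delete(id);
    this.emit("job-finished", id);
    if (this.stopping && this.active.size === 0) await this.finish();
  }

  // Points 1 and 4: stop fetching, but keep the pool open unless forced.
  async stop({ force = false } = {}) {
    this.stopping = true;
    if (force || this.active.size === 0) await this.finish();
  }

  // Point 3: close the pool once, then announce it.
  async finish() {
    if (this.stopped) return;
    this.stopped = true;
    await this.closePool();
    this.emit("stopped");
  }
}
```

A caller would register `stopper.once("stopped", ...)` before calling `stop()` and exit the process from that listener. With `{ force: true }` the pool closes right away, so any job still running would fail to report completion.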

gavacho commented 3 years ago

That all sounds good. FWIW, we handle our own db pool lifecycle and pass an executeSql function to pg-boss.
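As a rough illustration of that setup, here is a sketch of an adapter around an application-owned pool. `createDbAdapter` is a hypothetical name. The sketch assumes the pool has a `query(text, values)` method resolving to a result with a `rows` array (as node-postgres pools do), and that pg-boss takes the adapter through a `db` constructor option. Check both against the versions in use.

```javascript
// Hypothetical adapter: exposes an application-owned pool through the
// executeSql(text, values) function mentioned above.
// Assumption: pool.query(text, values) resolves to an object with a `rows` array.
function createDbAdapter(pool) {
  return {
    executeSql(text, values) {
      return pool.query(text, values);
    },
  };
}

// Assumed wiring, commented out so the block runs without dependencies:
// const boss = new PgBoss({ db: createDbAdapter(pool) });
// ...
// await boss.stop(); // pg-boss does not own the pool in this setup
// await pool.end();  // the application closes it once jobs have finished
```

Because the application owns the pool here, it also controls when the pool closes, so job completion can still reach the database after `stop()` as long as `pool.end()` is deferred until handlers are done.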

Regarding stop, I think disabling job publishing at that time would mean that any job handler would fail if it publishes other jobs.

A stopped event like you describe (that was aware of teams and batches) would solve our issue! 👍