timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License
1.95k stars, 153 forks

Feature request: singleton collision results in job data update #309

Closed tcoats closed 1 year ago

tcoats commented 2 years ago

We'd like to throttle our jobs to collapse multiple updates into a single change - the last one. The singleton logic currently drops new jobs. Could we introduce a boolean that changes singleton collisions to update the job's data instead? This would need to work with singletonNextSlot correctly - e.g. not update the first collision in that situation, but update the second one.
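To make the requested behaviour concrete, here is a minimal in-memory sketch of the difference between dropping and updating on a singleton collision. It is not pg-boss code: `SingletonQueue` and the `updateOnCollision` flag are hypothetical names invented for illustration, and the model ignores singletonNextSlot and job states entirely.

```javascript
// Hypothetical in-memory model of singleton collisions. This is NOT pg-boss
// code; SingletonQueue and updateOnCollision are names invented for the sketch.
class SingletonQueue {
  constructor() {
    this.pending = new Map(); // singletonKey -> { id, data }
    this.nextId = 1;
  }

  // Returns the job id when a job is created or updated, null when dropped.
  send(singletonKey, data, { updateOnCollision = false } = {}) {
    const existing = this.pending.get(singletonKey);
    if (!existing) {
      const job = { id: this.nextId++, data };
      this.pending.set(singletonKey, job);
      return job.id;
    }
    if (!updateOnCollision) return null; // current behaviour: drop the new job
    existing.data = data; // requested behaviour: the last update wins
    return existing.id;
  }
}

// Default: the second send collides and is dropped, so rev 1 stays queued.
const dropping = new SingletonQueue();
dropping.send('reindex', { rev: 1 });
const droppedId = dropping.send('reindex', { rev: 2 });

// Opt-in: the second send overwrites the queued job's data with rev 2.
const updating = new SingletonQueue();
const firstId = updating.send('reindex', { rev: 1 });
const updatedId = updating.send('reindex', { rev: 2 }, { updateOnCollision: true });
```

In the second case the job keeps its original id and only its payload changes, which is the "collapse into the last change" behaviour described above.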

This, coupled with status-based singleton logic, would require a decent change to the createJob functionality. For our use case it's well worth it and we're happy to do the work. If this is of interest, it may be good to evaluate the singleton logic in its entirety and consider other throttling and debounce scenarios. This sort of logic is something we're used to on the client side. For example, one_debounce is good for throttling search results.

I believe the logic still works well in the send method.

Thoughts?

timgit commented 2 years ago

Yes, I will provide my thoughts. ;) I'm a bit concerned about this request because it blurs the line between what a queue is and what a database is, so relying on it would tightly couple your architecture and limit your future options to tools that support it (like pg-boss). I guess that is attributable to the fact that we're building a queue in a database, which is a fair counter to that argument. I also mention this because I want to provide a "normal" queue feature matrix that makes it very easy to migrate away if and when needed.

I think we will all have to agree that, at some point, it becomes too much effort to run a very large-scale Postgres queue. In my experience, for example, I try to keep my queue tables under 5 million jobs, or I find myself having to babysit things like autovacuum. And now that I've mentioned that, keep in mind that adding more updates to the equation will result in more autovacuum activity.

Are you using a different tracking table currently to achieve this upsert use case?

tcoats commented 2 years ago

We're rearchitecting an existing solution that doesn't de-dupe, so no existing solution.

I think it's important that pg-boss continues to deliver on the vision and I wouldn't want to change that. We like pg-boss because it's based on a database and doesn't need to scale for our use cases. It's the transaction and integrity that works well.

timgit commented 2 years ago

Are you imagining a solution that is also compatible with the bulk loading issue #310? As mentioned in that issue, debouncing is probably the biggest outlier. This is because it relies on a unique constraint violation on try 1 and will then execute another round trip to the db.

If we decide to accept that caveat (excluding rewriting how debouncing works), it sounds like a candidate implementation would be something like the following.

  1. Add a config option to opt into upsert (this will replace DO NOTHING with DO UPDATE)
  2. Add a condition to the update expression from step 1 so that it applies only when state = 'created'
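
As a rough illustration of the two steps, the insert statement could be assembled along these lines. This is only a sketch under stated assumptions: the `buildInsertSql` helper, the `upsert` option name, the `job` table, its columns, and the partial unique index on `(name, singletonKey)` are all placeholders, not pg-boss's actual schema or API.

```javascript
// Sketch only. Table, column, and option names below are assumptions made for
// illustration; they are not taken from pg-boss's real schema or API.
function buildInsertSql({ upsert = false } = {}) {
  // Step 1: opting in swaps DO NOTHING for DO UPDATE.
  // Step 2: the trailing WHERE restricts the update to jobs still in 'created'.
  const conflictAction = upsert
    ? "DO UPDATE SET data = EXCLUDED.data WHERE job.state = 'created'"
    : 'DO NOTHING';

  return [
    'INSERT INTO job (name, data, singletonKey)',
    'VALUES ($1, $2, $3)',
    // DO UPDATE requires a conflict target; this assumes a partial unique
    // index on (name, singletonKey) covering jobs that are not yet completed.
    "ON CONFLICT (name, singletonKey) WHERE state < 'completed'",
    conflictAction,
    'RETURNING id'
  ].join('\n');
}

const defaultSql = buildInsertSql();
const upsertSql = buildInsertSql({ upsert: true });
```

When the `WHERE` condition on `DO UPDATE` is false (for example, the existing job is already active), Postgres leaves the row untouched and `RETURNING` yields no row, so the caller sees the same "dropped" result as with `DO NOTHING`.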