Closed stephenh closed 3 years ago
Please use a separate table to track this; keeping jobs around unnecessarily compromises worker performance.
Ah yeah, that's fair. I think pedantically, from the overall system perspective, it's just moving where the performance compromise would be, i.e. instead of contention on the jobs table itself, I'm going to move the same de-duping+N-hours-of-history logic to some other table, that exists solely to be a mini-re-implementation of graphile-worker's job-key support.
But I will grant that, for use cases that allow/facilitate it, ideally tracking idempotency is done on the existing business/domain tables directly. It just seems like that is not always the case, and I've seen other job queues support this, e.g. pg-boss iirc does, maybe as an optional feature or maybe even by default? Not sure.
But yeah, that's fine; we don't need this immediately; and if we ever do, maybe we'll try doing it as a small mini-fork b/c I don't think the performance concerns would be a problem for our volume/use cases.
Thanks!
Yeah; sorry I didn't mean it'd compromise performance for you (though I would guess that you'd probably have better performance with this split solution anyway due to your deduping table being much smaller and much less in demand than the worker table), more that I don't want to do anything in worker that encourages the jobs table to not be empty because a lot of people think they need this but it's normally better solved a different way. I also don't want to add complexity to the worker where it has to check to see whether it should keep jobs around or not, and I don't want to add a system that checks how long the completed jobs have been around for and then cleans them up. Postgres is not the ideal location for a job queue, so we have to be pretty strict around what we do/don't add to make sure we're squeezing as much performance out of it as we can - every branch matters!
You are of course welcome to fork worker to achieve your use case, but I'd personally just make a separate table and an alternative function (create_job or similar) that calls add_job under the hood but with your additional requirements - this'll reduce your maintenance burden over time.
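For illustration only, here's a rough sketch of that shape in TypeScript. Everything in it is an assumption rather than part of graphile-worker: the `DedupeStore` interface, the `InMemoryDedupeStore` class, the `createJob` wrapper and the `AddJob` callback type are hypothetical names, and the in-memory map is just a stand-in for a small Postgres table with a unique key column. It's kept synchronous so it runs on its own; a real version would presumably be async and would do the key insert and the add_job call in one transaction.

```typescript
// Hypothetical sketch: none of these names come from graphile-worker itself.
// AddJob stands in for whatever actually enqueues the job (e.g. add_job).
type AddJob = (identifier: string, payload: unknown) => void;

// Stand-in for a small dedupe table. In Postgres this might be a table with a
// unique key column, written with INSERT ... ON CONFLICT DO NOTHING.
interface DedupeStore {
  // Returns true if the key was newly recorded, false if it already existed.
  claim(key: string, now: Date): boolean;
  // Removes keys recorded before the cutoff; returns how many were removed.
  purgeOlderThan(cutoff: Date): number;
}

class InMemoryDedupeStore implements DedupeStore {
  private seen = new Map<string, Date>();

  claim(key: string, now: Date): boolean {
    if (this.seen.has(key)) return false;
    this.seen.set(key, now);
    return true;
  }

  purgeOlderThan(cutoff: Date): number {
    let removed = 0;
    for (const [key, recordedAt] of this.seen) {
      if (recordedAt.getTime() < cutoff.getTime()) {
        this.seen.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

// Wrapper around addJob: only enqueues when the dedupe key has not been seen.
// Returns true if a job was enqueued, false if it was treated as a duplicate.
// Note: in this sketch a failing addJob leaves the key claimed; with a real
// table both steps would share a transaction so they succeed or fail together.
function createJob(
  store: DedupeStore,
  addJob: AddJob,
  identifier: string,
  payload: unknown,
  dedupeKey: string,
  now: Date = new Date(),
): boolean {
  if (!store.claim(dedupeKey, now)) return false;
  addJob(identifier, payload);
  return true;
}
```

The N-hours-of-history part then becomes a periodic `purgeOlderThan(cutoff)` on the dedupe table (from a cron job or similar), so the jobs table itself can still empty out as jobs complete.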
> make a separate table and an alternative function (create_job or similar) that calls add_job under the hood but with your additional requirements
Ah yeah, that makes sense. Thanks for the suggestion!
Feature description
Add an option to keep a completed job for a certain time period, e.g. ~24 hours.
Motivating example
We'd like to use job keys to de-dup incoming webhooks from external vendors, i.e.:
Breaking changes
Should be none b/c it can be opt-in.
Supporting development
I: