Option to keep completed jobs for a while

stephenh commented 3 years ago

Feature description

Add an option to keep a completed job for a certain time period, e.g. ~24 hours.

Motivating example

We'd like to use job keys to de-dup incoming webhooks from external vendors, e.g.:

We get a webhook with message-id=1234
We run a worker job to do the work with jobkey=process-message-1234
The job completes and is deleted
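The vendor re-fires our webhook with message-id=1234
Because the original job is gone, jobkey=process-message-1234 no longer de-dups it, and the work runs a second time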

Breaking changes

Should be none, because it can be opt-in.

Supporting development

I:

[ ] am interested in building this feature myself
[ ] am interested in collaborating on building this feature
[x] am willing to help testing this feature before it's released
[x] am willing to write a test-driven test suite for this feature (before it exists)
[ ] am a Graphile sponsor ❤️
[ ] have an active support or consultancy contract with Graphile

benjie commented 3 years ago

Please use a separate table to track this; keeping jobs around unnecessarily compromises worker performance.

stephenh commented 3 years ago

Ah yeah, that's fair. Pedantically, though, from an overall-system perspective this just moves where the performance cost lands: instead of contention on the jobs table itself, I'll be moving the same de-duping plus N-hours-of-history logic to some other table that exists solely as a mini re-implementation of graphile-worker's job-key support.

But I'll grant that, where the use case allows it, idempotency is ideally tracked directly on the existing business/domain tables. That just isn't always possible, and I've seen other job queues support this; e.g. pg-boss does, IIRC, maybe as an optional feature or maybe even by default? Not sure.

But yeah, that's fine; we don't need this immediately, and if we ever do, we might try it as a small fork, because I don't think the performance concerns would be a problem at our volume or for our use cases.

Thanks!

benjie commented 3 years ago

Yeah; sorry, I didn't mean it would compromise performance for you (though I'd guess you'd get better performance from this split solution anyway, since your de-duping table would be much smaller and far less in demand than the worker's jobs table). I meant that I don't want to add anything to worker that encourages the jobs table to be non-empty: a lot of people think they need this, but it's normally better solved another way. I also don't want to add complexity to the worker where it has to check whether it should keep jobs around, and I don't want to add a system that tracks how long completed jobs have been around and then cleans them up. Postgres is not the ideal home for a job queue, so we have to be pretty strict about what we do and don't add, to make sure we squeeze as much performance out of it as we can: every branch matters!

You are of course welcome to fork worker for your use case, but personally I'd just make a separate table and an alternative function (create_job or similar) that calls add_job under the hood, layering your additional requirements on top; this will reduce your maintenance burden over time.
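A minimal sketch of that "separate table plus wrapper" idea, under stated assumptions: it is not graphile-worker code, and `DedupStore`, `createJob`, and the `AddJobFn` callback shape are hypothetical names for illustration. An in-memory `Map` stands in for the separate de-duping table, and the `addJob` callback stands in for whatever actually enqueues the job (e.g. a call to add_job). The retention window corresponds to the ~24 hours requested above.

```typescript
// Hypothetical signature for whatever enqueues the real job (e.g. a wrapper around add_job).
type AddJobFn = (identifier: string, payload: unknown, jobKey: string) => Promise<void>;

// Hypothetical in-memory stand-in for a separate de-duping table,
// e.g. one row per job key with the time it was first seen.
class DedupStore {
  private readonly seen = new Map<string, number>(); // job key -> first-seen time (ms)
  private readonly retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  // Records the key and returns true, unless it was already seen within the retention window.
  tryRecord(key: string, now: number): boolean {
    const seenAt = this.seen.get(key);
    if (seenAt !== undefined && now - seenAt < this.retentionMs) {
      return false;
    }
    this.seen.set(key, now);
    return true;
  }

  // Undo a record, e.g. when enqueueing the job failed.
  forget(key: string): void {
    this.seen.delete(key);
  }

  // Drops entries older than the retention window; a real system would run this periodically.
  prune(now: number): number {
    let removed = 0;
    this.seen.forEach((seenAt, key) => {
      if (now - seenAt >= this.retentionMs) {
        this.seen.delete(key);
        removed++;
      }
    });
    return removed;
  }

  get size(): number {
    return this.seen.size;
  }
}

// Hypothetical create_job-style wrapper: de-dups against the separate store, then enqueues.
// Resolves to true if a job was enqueued, false if the message was a duplicate.
async function createJob(
  store: DedupStore,
  addJob: AddJobFn,
  messageId: string,
  payload: unknown,
  now: number = Date.now(),
): Promise<boolean> {
  const jobKey = `process-message-${messageId}`;
  if (!store.tryRecord(jobKey, now)) {
    return false; // seen within the retention window: skip it
  }
  try {
    await addJob("process-message", payload, jobKey);
  } catch (err) {
    store.forget(jobKey); // let a retried webhook enqueue again
    throw err;
  }
  return true;
}
```

A re-fired webhook inside the window resolves to false without enqueueing anything; once the window has passed, the same message-id enqueues again. With a real Postgres table, the de-dup insert and the add_job call would belong in the same transaction, so a failed enqueue rolls back the de-dup row instead of relying on `forget`.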

stephenh commented 3 years ago

make a separate table and an alternative function (create_job or similar) that calls add_job under the hood but with your additional requirements

Ah yeah, that makes sense. Thanks for the suggestion!
