procrastinate-org / procrastinate

PostgreSQL-based Task Queue for Python
https://procrastinate.readthedocs.io/
MIT License
834 stars 52 forks

queueing_lock allows multiple jobs to run concurrently in some scenarios #1050

Closed onlyann closed 3 months ago

onlyann commented 3 months ago

Let me start by thanking the authors and contributors for this excellent library.

The description of queueing_lock is:

No two jobs with the same queueing lock can be waiting in the queue.

I noticed that when multiple jobs sharing the same queueing_lock are deferred and the worker is fast enough to pick up one (or more) of them, you can end up with multiple jobs with the same queueing_lock running concurrently.

Another way to reproduce it is by deferring a job once, and then deferring a second job while the first job is running.

This seems to be caused by the unique constraint, which only considers jobs in the todo status.

Technically, one could argue that when a job is in the doing status, it is no longer waiting in the queue.

That said, it appears that the intent of queueing_lock is to ensure that a single job runs at any time for a given queueing_lock. In that case, the unique constraint should be modified to:

```sql
CREATE UNIQUE INDEX procrastinate_jobs_queueing_lock_idx
    ON procrastinate_jobs (queueing_lock)
    WHERE status = 'todo' OR status = 'doing';
```

Maybe I am missing something though and there are valid reasons to ignore doing jobs. What do you think?

EDIT: I realise the behaviour I am after can be mostly achieved by combining lock and queueing_lock. It is still not ideal because it would allow the task to be run twice in a row.
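The race described above can be sketched with a small Python model. This is purely illustrative (not procrastinate's actual code or schema): it mimics a uniqueness check that, like the partial index, only looks at jobs in the todo status, so a job that has moved to doing no longer blocks a new defer with the same queueing_lock.

```python
# Illustrative model of the constraint, NOT procrastinate's implementation.
# The uniqueness check only applies to "todo" jobs, mirroring a partial
# unique index with WHERE status = 'todo'.

class QueueModel:
    def __init__(self):
        self.jobs = []  # list of (queueing_lock, status)

    def defer(self, queueing_lock):
        # Refuse the defer only if an identical lock is still waiting.
        if any(l == queueing_lock and s == "todo" for l, s in self.jobs):
            raise ValueError("AlreadyEnqueued")
        self.jobs.append((queueing_lock, "todo"))

    def fetch(self):
        # A worker picks up the oldest todo job, moving it to "doing".
        for i, (l, s) in enumerate(self.jobs):
            if s == "todo":
                self.jobs[i] = (l, "doing")
                return l
        return None

q = QueueModel()
q.defer("refresh")   # job A enqueued
q.fetch()            # worker starts A: status becomes "doing"
q.defer("refresh")   # succeeds! A is no longer "todo"
q.fetch()            # worker starts B
doing = [l for l, s in q.jobs if s == "doing"]
print(doing)  # ['refresh', 'refresh'] - two concurrent jobs, same lock
```

Under the proposed `WHERE status = 'todo' OR status = 'doing'` constraint, the second defer in this model would raise instead.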

ewjoachim commented 3 months ago

Yup, I think your analysis is correct: queueing_lock was originally imagined as a way to keep periodic jobs from accumulating.

I noticed that when multiple jobs sharing the same queueing_lock are deferred and the worker is fast enough to pick one (or more) jobs up, it can end up in multiple jobs with the same queueing_lock running concurrently.

I think you don't even need to be quick. If you defer job A and it starts, then defer job B, you'll see B start too. What the queueing lock really prevents is deferring a job while another job with the same lock is waiting in the queue.

Also, you're right: if you want to avoid the job running twice, you'd have to combine the queueing lock with a normal lock. I'm guessing you're saying this is unexpected; maybe something we need to improve in the docs?

It is still not ideal because it would allow the task to be run twice in a row.

I'm trying to understand your use case. Let's say you have a long periodic task launched every hour. If it runs for less than an hour, no problem. If it regularly runs for more than an hour (and you don't want two instances running at the same time), you have a more general problem: you're scheduling more work than can be done, so you'll always end up having to skip some tasks to get back on schedule. Skipping tasks is exactly what the queueing lock gives you: it's not useful to stack multiple jobs in the queue, but having one waiting means that as soon as the current one ends, you can start working again.

If we were to have a lock that kept you from deferring a job while one was already enqueued OR executing, you'd likely get a gap of up to an hour between runs of tasks that take more than an hour. So if your tasks take, say, 1h15 and are marked as "should run every hour", you'd likely end up running them every 1h45 or so (whereas we could run them every 1h15, much closer to the expected schedule).
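The timing argument above can be illustrated with a toy simulation (pure Python, not procrastinate code; the hourly tick and the 75-minute runtime are assumptions). With the current queueing lock, the task runs back to back every 75 minutes; with a hypothetical lock that also blocks on running jobs, each hourly defer attempt made mid-run is lost, and in this discrete model the cadence stretches to two hours (the "1h45" above is a rough estimate of the same effect).

```python
# Toy timing model in minutes: the scheduler tries to defer every 60 min,
# each run takes 75 min. Illustrative only, not procrastinate code.

DURATION = 75
TICKS = range(0, 481, 60)  # hourly defer attempts over 8 hours

def run_starts(strict):
    """Start times of runs under the two locking strategies.

    strict=False: current queueing lock - a defer is refused only while
    another job is already *waiting* (one may wait while another runs).
    strict=True: hypothetical lock - a defer is refused while a job is
    waiting OR running.
    """
    starts, waiting, busy_until = [], None, 0
    for t in TICKS:
        # The worker starts the waiting job as soon as it is free.
        if waiting is not None:
            start = max(waiting, busy_until)
            if start <= t:
                starts.append(start)
                busy_until = start + DURATION
                waiting = None
        running = busy_until > t
        if waiting is None and not (strict and running):
            waiting = t  # this defer attempt succeeds
    return starts

print(run_starts(strict=False))  # [0, 75, 150, 225, 300, 375, 450]
print(run_starts(strict=True))   # [0, 120, 240, 360]
```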

Again, I may be missing your use case, which could be completely different from what I have in mind.

onlyann commented 3 months ago

I now understand the intended use case of queueing_lock better, thanks.

The scenario I was considering is some sort of debouncing to avoid running a task too often when it is requested in bursts.

That said, even if queueing_lock were doing what I thought it was doing, it still wouldn't really be debouncing, since debouncing would involve delaying the task every time it is requested during the debounce window.

onlyann commented 3 months ago

Maybe all I need for this to work is to alter the producer to schedule the job in the near future. Then, any producer could delay the job further if it is already scheduled.

ewjoachim commented 3 months ago

Maybe all I need for this to work is to alter the producer to schedule the job in the near future.

Excellent idea. Or even: use a queueing lock + scheduling. If you want a 1-minute debounce, defer the task to 1 minute in the future. While this job is awaiting its schedule, it will prevent similar tasks from being deferred (you may want to double-check that, but I believe the PG constraint that keeps this true does not look at the schedule).
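That debounce idea can be sketched as a small model (illustrative Python only; `DebouncedQueue`, `defer`, and `fetch_due` are invented names, not procrastinate's API). A job deferred with a scheduled time in the near future stays in todo until a worker picks it up, so the queueing-lock check collapses any burst of defers arriving inside the window into a single run.

```python
import datetime as dt

# Toy model of "queueing lock + scheduling" as a debounce.
# Assumption: a scheduled job stays in "todo" until it is due, so the
# todo-only uniqueness check rejects duplicates during the window.

class DebouncedQueue:
    def __init__(self, debounce):
        self.debounce = debounce
        self.jobs = []  # list of (queueing_lock, status, scheduled_at)

    def defer(self, lock, now):
        if any(l == lock and s == "todo" for l, s, _ in self.jobs):
            return False  # rejected: an identical job is already waiting
        self.jobs.append((lock, "todo", now + self.debounce))
        return True

    def fetch_due(self, now):
        # A worker only picks up jobs whose scheduled time has arrived.
        for i, (l, s, at) in enumerate(self.jobs):
            if s == "todo" and at <= now:
                self.jobs[i] = (l, "doing", at)
                return l
        return None

q = DebouncedQueue(debounce=dt.timedelta(minutes=1))
t0 = dt.datetime(2024, 1, 1)
print(q.defer("sync", t0))                             # True: due at t0+1min
print(q.defer("sync", t0 + dt.timedelta(seconds=30)))  # False: burst collapsed
print(q.fetch_due(t0 + dt.timedelta(seconds=30)))      # None: not due yet
print(q.fetch_due(t0 + dt.timedelta(minutes=1)))       # 'sync': runs once
```

Note that, as discussed earlier in the thread, once the job moves to doing a new defer succeeds again, so this debounces bursts but does not prevent back-to-back runs.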

onlyann commented 3 months ago

Thanks for the suggestion @ewjoachim. Feel free to close this issue. If anything, it sounds like something that could be implemented client-side rather than in this library.

ewjoachim commented 3 months ago

Great! If you think we should update the docs to make all of this clearer, this can be the ticket to do it (would you like to?)