rails / solid_queue

Database-backed Active Job backend
MIT License
1.95k stars 130 forks source link

Rethinking concurrency_limits#duration #336

Closed hms closed 2 months ago

hms commented 2 months ago

Between my reading of the documentation and observationally, concurrency_limits#duration: functions as a soft limit. While this might be correct for some/many use-cases, it can be problematic for others. I have jobs that should never allow for more than one to ever be inflight at a time. While it's easy enough to code for this situation (a combination of locking and skip on lock), I would prefer the option of being explicit via concurrency_limits.

I propose to breaking notion of duration into two parts:

I would propose enhancing concurrency_limits with two optional parameters and behaviors:

If this makes sense and is acceptable, I'm happy to take a run at implementing.

rosa commented 2 months ago

Hmm... I think there's a mismatch of what duration means and the intention of this proposal 🤔 duration here refers to the time we keep jobs blocked before they're candidates for release, but it's not about runtime or the current job running. You could have a job that takes a second to run but blocks other jobs for 10 minutes because it stays enqueued for much longer (for example, if you have a big backlog).

The concurrency limits work as follows: when a job is enqueued, we check if it specifies concurrency controls. If it does, we try to see if we have permission to put it as "ready" (that's the semaphore check). Ready means it can be picked up by workers for execution. If we have permission, we do that, and we don't release the semaphore and try to unblock the next job until it finishes (be it successfully or unsuccessfully). Unblocking the next job doesn't mean running that job right away, but moving it from blocked to ready. Since something can happen that prevents the first job from releasing the semaphore and unblocking the next job, we have the duration as a failsafe. Jobs that have been blocked for more than duration can be released, but only one of them, following the same rules. So, duration is not really about the job that's enqueued or being run, it's about the jobs that are blocked waiting.

hms commented 2 months ago

@rosa Thank you for this explanation.

I completely misunderstood the description in the readme and the purpose of the implementation. For what it's worth, this description was extremely helpful and I think it would add real value as an enhancement to the current documentation.

rosa commented 2 months ago

That's a great point! I'm going to add this explanation to the README. Thank you! 🙏