Open myronmarston opened 11 years ago
This just made my day. I like the idea a lot. I think it generalizes the throttling in a very useful way, allowing much finer-grained control over concurrency when jobs are using external resources.
I like it as well. +1 to storing the list of jids instead of just a count. Also, you mention a resource is released "when a job completes, fails or times out". Are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?
> +1 to storing the list of jids instead of just a count
Yeah, the more I think about it, the more I like it being a set (not a list) of jids. If we used a counter and had a "counter leak", we wouldn't have the data to troubleshoot which jobs are holding the resource. A jid set gives you the details to inspect all the jobs holding that resource.
> Also, you mention a resource is released "when a job completes, fails or times out", are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?
I think the job's jid should be in the resource set only while it is in the running state.
On Aug 14, 2013, at 11:14 AM, Myron Marston wrote:
> I think the job's jid should be in the resource set only while it is in the running state.
Agreed
I'm going to create a fork of qless-core and implement this feature. Any feedback on the proposed implementation is welcome.
Introduce a new class called QlessResource using the following keys:

- `ql:r:[id]-jids`: a sorted set of job identifiers requiring the specified resource.
- `ql:r:[id]-pending`: a sorted set of job identifiers waiting for the specified resource to become available. This is used when a job releases a resource: the resource is assigned to the next waiting job, which is moved to the `-locks` set.
- `ql:r:[id]-locks`: a set of job identifiers that hold an active lock on the specified resource.
- `ql:r:[id]`: a hash identifying various properties for this resource. The only key is `max`, which indicates the maximum number of locks available.
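For concreteness, the key layout above could be generated by a small helper like this. This is an illustrative Ruby sketch only; qless-core itself is implemented in Lua, and the module and method names here are hypothetical:

```ruby
# Hypothetical helper illustrating the proposed "ql:r:[id]*" key layout.
module QlessResourceKeys
  PREFIX = 'ql:r:'.freeze

  def self.base(id)     # hash holding resource properties ({ max: <limit> })
    "#{PREFIX}#{id}"
  end

  def self.jids(id)     # sorted set of all jids requiring the resource
    "#{base(id)}-jids"
  end

  def self.pending(id)  # sorted set of jids waiting for the resource
    "#{base(id)}-pending"
  end

  def self.locks(id)    # set of jids holding an active lock
    "#{base(id)}-locks"
  end
end
```

For example, `QlessResourceKeys.locks('mysql-01')` yields `"ql:r:mysql-01-locks"`.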
Extend the `put` command so that an additional options parameter can specify an array of resource identifiers. If the resources key is present, call the `Qless.resource(id):acquire(jid)` API for each resource. The acquire API will add an entry to either the `ql:r:[id]-locks` or the `ql:r:[id]-pending` set, depending on availability. The pending set will be sorted by priority to ensure correct FIFO ordering.
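The acquire semantics described above can be sketched in memory like this. This is a hedged illustration, not the actual qless-core implementation (which would be Lua operating on the Redis sets); the class and method names are hypothetical:

```ruby
# In-memory sketch of the proposed acquire behavior: a jid either obtains a
# lock (when fewer than `max` locks are held) or is queued as pending.
class ResourceSketch
  attr_reader :locks, :pending

  def initialize(max)
    @max = max
    @locks = []    # stands in for the ql:r:[id]-locks set
    @pending = []  # stands in for the ql:r:[id]-pending sorted set (FIFO here)
  end

  # Returns true if the jid acquired a lock, false if it was queued as pending.
  def acquire(jid)
    if @locks.size < @max
      @locks << jid
      true
    else
      @pending << jid
      false
    end
  end
end
```

With `max = 1`, the first jid to call `acquire` lands in the locks set and every subsequent jid lands in pending, which matches the two outcomes the text describes.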
Extend the `pop` command so that invalidated locks release their resources, and so that scheduled or recurring jobs requiring resources acquire them before being added to the work queue; otherwise they are added to the `ql:r:[id]-pending` set.
Extend the `complete` command so that if the completing job holds active resource locks, it releases them, assigning each to the next pending job and moving that job to the `ql:q:[name]-work` set once it has successfully acquired all of its required resources.
Extend the `fail` and `retry` commands so they release active resource locks when a job transitions out of the running state, and enqueue pending jobs waiting for the released resources.
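The release path shared by complete, fail and retry could look like this in-memory sketch (again a hypothetical illustration, not the qless-core Lua; `lock!`/`wait!` exist only to set up state):

```ruby
# Sketch of releasing a resource lock when a job leaves the running state:
# the lock is removed and, if any jids are pending, the next one in FIFO
# order is promoted into the locks set.
class ResourceReleaseSketch
  attr_reader :locks, :pending

  def initialize
    @locks = []    # stands in for ql:r:[id]-locks
    @pending = []  # stands in for ql:r:[id]-pending (FIFO here)
  end

  def lock!(jid)   # setup helper: jid currently holds a lock
    @locks << jid
  end

  def wait!(jid)   # setup helper: jid is waiting on the resource
    @pending << jid
  end

  # Returns the jid that inherited the lock, or nil if none were pending.
  def release(jid)
    @locks.delete(jid)
    nxt = @pending.shift
    @locks << nxt if nxt
    nxt
  end
end
```

The same `release` step would run whether the job completed, failed, or was moved back to waiting by a retry, which is the behavior the paragraph above proposes.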
The patch itself will serve as documentation of the implementation details, but I'll be conscious of keeping things as efficient as possible.
Was there ever a PR for this? I would love to see it added to master, so let me know if there is something I can do to help.
Currently Qless supports throttling at a per-queue level. We have a need to do throttling on an arbitrary named resource (in our case, a MySQL host in our shard ring). To prevent our MySQL hosts from getting overloaded, we've set a hard limit of 30 connections for our shard-building jobs. We rescue and retry "too many connections" errors, but it would be more efficient if we could set a max concurrency per host, without having to put jobs in a per-host queue.
So...here's an idea for how we could refactor the current concurrency throttling to be more general:
- `queue.put(MyJobClass, { data: 15 }, throttlable_resources: ['foo', 'bar'])`
- `QlessJob#throttlable_resources` (the qless-core API will take care of adding the queue and klass names to this list when things request the throttlable resources).
- On `Pop()`, it will increment the counter for each throttlable resource of the popped job.
- On `Pop()`, it will also check that a potentially popped job's throttlable resources all have available capacity by looking at the counters. If any of the counters are full, it won't pop that job, moving on in the queue to the next job.
- `scard` can be used to get the count in O(1) time.

In our use case, we would use MySQL host names as our throttlable resources. This could supersede the existing per-queue throttling (a queue name would be an implicit throttled resource, so this could easily support that use case). It would also nicely support per-job-class throttling.
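The pop-time capacity check sketched above could look like this in plain Ruby. It is a stand-in for what would really happen in qless-core against Redis; `counts` and `limits` are hypothetical names for the per-resource counters and their maxima:

```ruby
# A job is poppable only if every throttlable resource it names still has
# spare capacity against its limit.
def poppable?(resources, counts, limits)
  resources.all? { |r| counts.fetch(r, 0) < limits.fetch(r) }
end

# Scan the queue in order, skipping jobs whose resources are at capacity;
# popping a job increments the counter for each of its resources.
def pop_next(queue, counts, limits)
  job = queue.find { |j| poppable?(j[:throttlable_resources], counts, limits) }
  return nil unless job
  queue.delete(job)
  job[:throttlable_resources].each { |r| counts[r] = counts.fetch(r, 0) + 1 }
  job
end
```

With a limit of 1 on a resource, the first job naming it pops normally and the second is skipped until the counter is decremented again, which is exactly the "don't pop, move on to the next job" behavior described.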
Thoughts, @dlecocq?
/cc @proby