seomoz / qless-core

Core Lua Scripts for qless
MIT License

Support throttling on arbitrary resource names #27

Open myronmarston opened 11 years ago

myronmarston commented 11 years ago

Currently Qless supports throttling at a per-queue level. We have a need to do throttling on arbitrarily named resources (in our case, a MySQL host in our shard ring). To prevent our MySQL hosts from getting overloaded, we've set a hard connection limit of 30 connections for our shard building jobs. We rescue and retry "too many connections" errors, but it would be more efficient if we could set a max concurrency per host, without having to put jobs in a per-host queue.

So...here's an idea for how we could refactor the current concurrency throttling to be more general:

In our use case, we would use MySQL host names as our throttleable resources. This could supersede the existing per-queue throttling (a queue name would simply be an implicit throttled resource, so that use case would still be supported). It would also nicely support per-job-class throttling.
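To make the generalization concrete, here is a minimal sketch in Redis Lua (the language of qless-core). The key name ql:r:[name]-running and the helper function are purely hypothetical, chosen only to show that a queue name, a MySQL host name, or a job class name could all be treated as the same kind of throttled resource:

```lua
-- Hypothetical sketch: any string (queue name, MySQL host, job class)
-- can act as a throttled resource. Key names are illustrative only.
local function resource_has_capacity(resource, max)
  -- Set of jids currently holding the resource
  local running = redis.call('scard', 'ql:r:' .. resource .. '-running')
  return running < tonumber(max)
end

-- Per-queue throttling becomes the special case where the resource
-- name is the queue name:
--   resource_has_capacity('shard-building', 5)
--   resource_has_capacity('mysql-shard-03.example.com', 30)
```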

Thoughts, @dlecocq?

/cc @proby

databus23 commented 11 years ago

This just made my day. I like the idea a lot. I think it generalizes the throttling in a very useful way, allowing much finer-grained control over concurrency when jobs use external resources.

wr0ngway commented 11 years ago

I like it as well. +1 to storing the list of jids instead of just a count.

Also, you mention a resource is released "when a job completes, fails or times out". Are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?

myronmarston commented 11 years ago

+1 to storing the list of jids instead of just a count

Yeah, the more I think about it, the more I like it being a set (not a list) of jids. If we used a counter and had a "counter leak", we wouldn't have the data to troubleshoot which jobs are holding the resource. A jid set gives you the details needed to inspect all the jobs holding that resource.
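To illustrate the troubleshooting point, a hypothetical comparison in Redis Lua (key names are made up for the example): a bare counter leaves nothing to inspect when it leaks, while a jid set can simply be listed:

```lua
-- Counter approach: if a decrement is ever missed ("counter leak"),
-- the number drifts and there is nothing left to inspect.
local held_count = redis.call('get', 'ql:r:mysql-shard-03-count')

-- Set-of-jids approach: the holders themselves are recorded, so a
-- stuck resource can be diagnosed by listing the jids and checking
-- each job's state.
local holders = redis.call('smembers', 'ql:r:mysql-shard-03-locks')
```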

Also, you mention a resource is released "when a job completes, fails or times out". Are retries considered a fail, or will the resource be released while a job is waiting for its retry period to pass?

I think the job's jid should be in the resource set only while it is in the running state.

wr0ngway commented 11 years ago

On Aug 14, 2013, at 11:14 AM, Myron Marston wrote:

I think the job's jid should be in the resource set only while it is in the running state.

Agreed

stuartcarnie commented 10 years ago

I'm going to create a fork of qless-core and implement this feature. Any feedback on the proposed implementation is welcome.

Introduce a new class called QlessResource, using the following keys (a short sketch of this key layout follows the list):

ql:r:[id]-jids A sorted set of job identifiers requiring the specified resource

ql:r:[id]-pending A sorted set of job identifiers waiting for the specified resource to become available. This is used when existing jobs release a resource, to assign it to the next job and move that job to the -locks set

ql:r:[id]-locks A set of job identifiers which have an active lock for the specified resource

ql:r:[id] A hash table identifying various properties for this resource. The only key is max, which indicates the maximum number of concurrent locks allowed
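A rough sketch of how those keys might be written and inspected from a qless-core Lua script; only the max field comes from the list above, and the concrete resource name is an assumption for the example:

```lua
-- Configure a resource; 'max' is the only property described above.
redis.call('hset', 'ql:r:mysql-shard-03', 'max', 30)

-- Inspect the proposed bookkeeping for that resource.
local max     = tonumber(redis.call('hget', 'ql:r:mysql-shard-03', 'max'))
local locks   = redis.call('smembers', 'ql:r:mysql-shard-03-locks')         -- active lock holders
local pending = redis.call('zrange', 'ql:r:mysql-shard-03-pending', 0, -1)  -- waiting jids, in priority order
```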

Extend the put command so that an additional options parameter can be included for specifying an array of resource identifiers. If the resources key is present, call the Qless.resource(id):acquire(jid) API for each resource. The acquire API will add an entry to either the ql:r:[id]-locks or the ql:r:[id]-pending set, depending on availability. The pending set is sorted by priority to ensure correct FIFO ordering.
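A sketch of what that acquire path might look like; the function shape and the use of the job's priority as the sort score are assumptions, since the comment above only specifies the keys and the lock-versus-pending behavior:

```lua
-- Hypothetical acquire: grant the jid a lock if the resource has
-- capacity, otherwise park it in the pending sorted set, using the
-- job's priority as the score so promotion order respects priority.
local function acquire(resource, jid, priority)
  local max   = tonumber(redis.call('hget', 'ql:r:' .. resource, 'max'))
  local locks = redis.call('scard', 'ql:r:' .. resource .. '-locks')
  if locks < max then
    redis.call('sadd', 'ql:r:' .. resource .. '-locks', jid)
    return true
  else
    redis.call('zadd', 'ql:r:' .. resource .. '-pending', priority, jid)
    return false
  end
end
```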

Extend the pop command so that invalidated locks release their resources, and so that scheduled or recurring jobs that require resources acquire them before being added to the work queue; otherwise they are added to the ql:r:[id]-pending set.
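Building on the hypothetical acquire above, pop (and put) would need a check across all of a job's resources before the job becomes eligible for the work queue; whether partially acquired locks are held or released while waiting is a design choice this comment leaves open:

```lua
-- Hypothetical helper: a job is only eligible for ql:q:[name]-work
-- once every resource it names has granted a lock. Resources that
-- could not be granted leave the jid in their -pending sets.
local function acquire_all(resources, jid, priority)
  local acquired_all = true
  for _, resource in ipairs(resources) do
    if not acquire(resource, jid, priority) then
      acquired_all = false
    end
  end
  return acquired_all
end
```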

Extend the complete command so that if the completing job holds active resource locks, they are released and assigned to the next pending job, which is moved to the ql:q:[name]-work set once it has successfully acquired all of its required resources.
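And a sketch of the release/promotion step; the function name is hypothetical, and ql:q:[name]-work is the existing qless work set referenced above:

```lua
-- Hypothetical release: drop the completing jid's lock, then hand the
-- freed slot to the first pending jid, if any.
local function release(resource, jid)
  redis.call('srem', 'ql:r:' .. resource .. '-locks', jid)
  local next_jid = redis.call('zrange', 'ql:r:' .. resource .. '-pending', 0, 0)[1]
  if next_jid then
    redis.call('zrem', 'ql:r:' .. resource .. '-pending', next_jid)
    redis.call('sadd', 'ql:r:' .. resource .. '-locks', next_jid)
    -- Only once next_jid holds all of its required resources would it
    -- be moved onto the queue's work set, e.g.:
    --   redis.call('zadd', 'ql:q:' .. queue .. '-work', priority, next_jid)
  end
end
```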

Extend the fail and retry commands so they release active resource locks when a job transitions out of the running state, and enqueue pending jobs waiting for the specified resources.
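The same release path sketched above would cover fail and retry as well: both transition the job out of the running state, which matches the earlier consensus in this thread that a jid should hold its resources only while the job is running.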

The patch itself will serve as documentation of the implementation details, but I'll be conscious of keeping things as efficient as possible.

wr0ngway commented 10 years ago

Was there ever a PR for this? I would love to see it added to master, so let me know if there is something I can do to help.