Open · myronmarston opened 10 years ago
The reason this doesn't exist is because of scheduled / recurring jobs.
And the inefficiency is very manageable. As part of a recent investigation I ran some benchmarks against a snapshot of one of your production redis servers, and `peek` cost about 250 microseconds of actual redis time.
Longer polling intervals also don't hurt as much as you might think. All they affect is the amount of time jobs wait in queues, but generally not throughput. In the specific cases of the ruby and python clients, if a worker successfully pops a job, it will try popping another after it completes that job. It's only when a queue has no jobs available that the sleep interval is invoked. So for busy queues, the penalty for polling is very small.
Fortunately, most pipelines are robust against additional latency, especially for the difference between say, 10 seconds or 60 seconds. Or 5 minutes, for that matter. If you have 100 worker processes and a 5-minute polling interval, on average a worker will pop every 3 seconds. If you then dump a lot of jobs in those queues, the workers will all be busy within 5 minutes and stay busy until all the jobs are done. So all that interval will have changed is that the whole giant job batch will finish a couple minutes later than it normally would have, with 1/30th the number of pop requests.
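The arithmetic behind that claim can be checked in a couple of lines (100 workers and the 5-minute interval are the figures from the paragraph above):

```ruby
# Each idle worker polls once per interval, so pops are spread across
# the fleet: on average one pop lands every interval/workers seconds.
workers  = 100
interval = 5 * 60                     # 5-minute polling interval, in seconds
average_gap = interval.fdiv(workers)  # average seconds between pops fleet-wide
```

With these numbers `average_gap` comes out to 3.0 seconds, matching the "a worker will pop every 3 seconds" figure above.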
Can we break the current worker model? My situation is: I have jobs that need to be processed, after which some information is sent to another component via HTTP POST. We have two modes:

The sync mode: a worker picks up a set of jobs from the queue. For each job, the worker sends an HTTP request, waits for the HTTP response, and then marks the job complete. After all the jobs are finished, the worker picks up another set of jobs from the queue and repeats the previous steps.

The async mode: the worker picks up a set of jobs from the queue. For each job, the worker sends an async HTTP request without waiting for the HTTP response. After sending them all, the worker picks up another set of jobs from the queue and sends those requests too. When an HTTP response comes back, the callback from the async HTTP library marks the job complete.
The sync mode fits qless. However, we have to fork a lot of workers for high-volume jobs, so the scalability is not good. I am not sure whether the async mode fits qless, since in that case the previous jobs are not yet done when the worker picks up the next set. But you can imagine that this way we would not need to fork so many workers to handle high job volumes.

My question is: can we use the async mode with qless?
(Copied from a mailing list thread):
Hi Yan, there are a couple of options here.
The first is that you can avoid using one of the built-in qless workers. Then you'd write some code that essentially calls `pop` to get some jobs, instantiates the corresponding HTTP requests, and only calls the corresponding `complete` once the HTTP request has finished.
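A minimal sketch of that first option. The `pop`/`complete` calls mirror the qless API, but `queue` and `async_post` are injected here so the sketch stays library-agnostic -- `async_post` stands in for whatever non-blocking HTTP client you use, and the `data` keys are illustrative:

```ruby
# Pop a batch, fire async requests, and only complete each job from
# its response callback.
def dispatch_batch(queue, async_post, batch_size)
  jobs = Array(queue.pop(batch_size))
  jobs.each do |job|
    async_post.call(job.data["url"], job.data["body"]) do |response|
      # Completing here -- in the callback, not after dispatch --
      # preserves pop-run-complete semantics under async I/O.
      job.complete if response == :ok
    end
  end
  jobs.size
end
```

A driving loop would just call `dispatch_batch` repeatedly, sleeping briefly whenever it returns zero.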
That said, this sounds more like a new worker model. Where the ruby client has forking and serial workers, the python client has forking, serial, and greenlet (`Fiber`s in ruby parlance) modes. The greenlet mode is ideal for exactly your use case -- where you don't want to pay the overhead of additional processes just to get your work done.
As one data point, many of our crawlers use the python qless client, often with about 8 processes and 50 greenlets per process. Depending on how lightweight your requests are, there's no reason you couldn't use hundreds or thousands of greenlets in a single process. Though I'm less familiar with the Fiber-compatible HTTP-request libraries of ruby, I imagine that similar behavior is attainable.
Ultimately, my suggestion is to make a new worker which inherits from the forking worker, but each child process uses either a thread- or fiber-pool to run each job. That way you still get the same semantics of pop-run-complete for each job, but without the overhead you were hoping to avoid. Such a worker would be welcome in the qless repo :-)
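One way to sketch such a worker's child process, assuming a qless-style queue whose `pop` returns nil when empty. `perform` stands in for the user's job body, and exiting on nil is a simplification -- a real worker would sleep and retry instead:

```ruby
# Each thread keeps the usual pop-run-complete semantics; the
# concurrency comes from running pool_size of these loops in one
# process instead of forking pool_size processes.
def run_pool(queue, pool_size)
  pool_size.times.map do
    Thread.new do
      while (job = queue.pop)  # simplification: exit on empty queue
        job.perform            # the user-defined job body
        job.complete
      end
    end
  end.each(&:join)
end
```

A fiber-based variant would have the same shape, with a fiber scheduler in place of `Thread.new`.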
Hi Dan,
Thank you for replying. Our project is using Java, so we have threads as workers; we do not fork new processes. The scalability of the first solution is very good. However, in this case a worker (thread) can pick up many, many jobs without finishing them, and the UI provided by qless would show a lot of jobs under one worker. Also, completing a job requires passing in the worker's name, and I am not sure whether that name has to correspond to the real application worker (thread, process, etc.) or not. For example, if the real application thread (worker) gets stuck, qless needs to pass the job to another thread (worker).
Based on your suggestion, we can have a thread (worker) that pulls the jobs and dispatches all the HTTP requests from them to an async HTTP library. After dispatching, it waits for the replies from the HTTP server (callbacks from the async HTTP library) and updates the qless job statuses. This is better than the sync mode, but slightly worse than the first option in scalability.
The worker is actually essentially just a unique string identifying the entity that's doing the processing. This has been left intentionally vague so as to fit into many different paradigms. For instance, the ruby bindings use a worker name that includes a hostname and PID combination, whereas the python code treats all worker processes as being the same worker. So, as long as you pass in the same string for `complete` as what you passed in on the `pop` command that yielded the job, you'll be fine.
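The hostname-plus-PID convention mentioned above can be reproduced in a couple of lines (the exact separator the ruby gem uses may differ; this shows the idea, not the gem's literal code):

```ruby
require "socket"

# A worker name is just a stable, unique string; hostname + PID is
# one convention. What matters is passing the *same* string to pop
# and to complete.
worker_name = "#{Socket.gethostname}-#{Process.pid}"
```

In Yan's Java setup, appending a thread name or ID to this string would give per-thread workers the same uniqueness property.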
Given that you are tied to using callbacks, it seems reasonable to me to have one thread `pop` and dispatch the requests, and `complete` them in the response callback. Bear in mind that all the job rules (like heartbeating) still apply, so correct configuration will likely be important.
Thanks a lot!
In our workers the general pattern is to do the following in a loop:
This is pretty inefficient and results in steady redis traffic even when there's no work to do.
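Based on the behavior described elsewhere in this thread (pop; if a job came back, run it and pop again; sleep only when the queue was empty), the loop presumably looks roughly like this -- `perform` and the injected `queue` are illustrative, not the actual worker code:

```ruby
# The polling loop: busy queues barely sleep, but an idle worker
# still issues a pop every `interval` seconds -- the steady redis
# traffic described above.
def poll_loop(queue, interval)
  loop do                # Kernel#loop stops cleanly on StopIteration
    if (job = queue.pop)
      job.perform
      job.complete
    else
      sleep interval     # only invoked when the queue was empty
    end
  end
end
```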
I'd like to propose an alternate model:
The worker would call `sleep` (with no arg) to sleep indefinitely, until its subscriber gets a `jobs_available` message, at which time it would use `Thread#run` to wake up the worker. This would result in much less redis traffic, and would allow workers to get jobs immediately when they are put on the queue rather than waiting through the 5-second (or whatever) sleep we currently use.
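A sketch of that model with two threads. The worker half below is runnable against any queue-like object; the subscriber half is shown only as a comment because it depends on a live redis connection (with redis-rb it would be the block passed to `subscribe`):

```ruby
# The worker sleeps with no argument (i.e. forever) whenever the
# queue is empty; Thread#run from the subscriber wakes it back up.
def make_worker(queue, &run_job)
  Thread.new do
    loop do
      if (job = queue.pop)
        run_job.call(job)
      else
        sleep          # indefinite; resumes when someone calls Thread#run
      end
    end
  end
end

# Wiring it to the subscription would look roughly like:
#   redis.subscribe("jobs_available") do |on|
#     on.message { |_channel, _msg| worker.run }  # wake the sleeping worker
#   end
```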
One "gotcha" with this, though: with scheduled/recurring jobs, jobs are put on the queue when a worker calls `pop` or `peek`, causing qless to make the state of that stuff consistent. So if nothing calls `pop` or `peek`, it'll never move the scheduled job to the waiting state and never notify workers. Thus, we may want to do something like use a long sleep (rather than an indefinite one), or consider having the parent process call `peek` periodically.
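The parent-process variant could be as small as a timer loop. The 60-second period is arbitrary, and the `iterations` argument exists only so the sketch can terminate; the real thing would loop forever:

```ruby
# Periodically peek every queue so qless promotes due scheduled and
# recurring jobs even while all workers are sleeping indefinitely.
def promote_scheduled(queues, period, iterations)
  iterations.times do
    queues.each(&:peek)  # side effect in qless: moves due jobs to waiting
    sleep period
  end
end

# In the parent process this would run forever in its own thread:
#   Thread.new { loop { promote_scheduled(queues, 60, 1) } }
```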