technoweenie / coffee-resque

MIT License
546 stars · 44 forks

Why not use blocking redis operations instead of polling? #16

Closed — lexer closed this issue 12 years ago

lexer commented 13 years ago

Maybe it would be better to use blpop instead of lpop with a pause?

steelThread commented 13 years ago

That would be fine if we only supported one worker instance, or perhaps multiple worker instances consuming jobs from the same queue. However, that isn't the case. In a single node process there can be worker instances working off of different queues. Blocking on one list locks out workers consuming jobs off the other lists. So in resque's case, what you bring up is going to hinder concurrency.
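To make the concurrency concern concrete, here's a minimal sketch of the polling design being described (not the actual coffee-resque source — the store, worker factory, and names are hypothetical, and a plain in-memory object stands in for redis). Each worker polls its own list with a non-blocking lpop; because no call ever blocks, workers on different queues all make progress even over one shared connection.

```javascript
// In-memory stand-in for redis, shared by all workers (like one connection).
const store = {
  lists: { high: ['job1', 'job2'], low: ['job3'] },
  // Non-blocking pop, like redis LPOP: returns null immediately when empty.
  lpop(key) {
    const list = this.lists[key] || [];
    return list.length ? list.shift() : null;
  },
};

// Hypothetical worker factory: each worker polls one queue.
function makeWorker(queue, onJob) {
  return {
    poll() {
      const job = store.lpop(queue); // never blocks the shared store
      if (job) onJob(job);
      // the real worker would reschedule itself here via setTimeout(pause)
    },
  };
}

// Two workers on different queues both make progress on every tick —
// a single blocking BLPOP over the same shared connection would instead
// park the connection on one list and starve the other.
const seen = [];
const w1 = makeWorker('high', (j) => seen.push(j));
const w2 = makeWorker('low', (j) => seen.push(j));
w1.poll();
w2.poll();
w1.poll();
```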

Make sense?

lexer commented 13 years ago

In terms of redis it won't block the list. As soon as one process pops a job, another process can pop the next one. The only limitation I faced is that each worker needs its own redis client connection.
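A quick sketch of that point (again an in-memory stand-in, not redis and not the coffee-resque API): a blocking pop parks one waiting callback, which stands in for a dedicated connection per worker, but it doesn't lock the list itself — a push wakes exactly one blocked waiter, and the next push wakes the next.

```javascript
// Hypothetical blocking-pop queue: BLPOP-like semantics in memory.
class FakeRedis {
  constructor() {
    this.items = [];
    this.waiters = [];
  }
  // Blocking pop: delivers immediately if an item exists, otherwise
  // parks the callback (i.e. ties up that "connection") until a push.
  blpop(cb) {
    if (this.items.length) cb(this.items.shift());
    else this.waiters.push(cb);
  }
  // A push hands the item to at most one blocked waiter.
  push(item) {
    if (this.waiters.length) this.waiters.shift()(item);
    else this.items.push(item);
  }
}

const queue = new FakeRedis();
const received = { a: null, b: null };

// Workers A and B each block on the same list via their own callback —
// the per-worker callback is the stand-in for a per-worker connection.
queue.blpop((job) => { received.a = job; });
queue.blpop((job) => { received.b = job; });

queue.push('job-1'); // wakes worker A only; the list is not locked
queue.push('job-2'); // wakes worker B
```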

There is a project called "pace". It implements resque workers based on EventMachine. Look at its polling implementation: https://github.com/groupme/pace/blob/master/lib/pace/worker.rb

Also, to limit concurrency, they implement a nice throttling approach.

steelThread commented 13 years ago

Cool, I'll check out pace. I think everyone agrees that polling isn't an ideal solution. Perhaps if each worker instance had its own redis connection, then a blocking pop might work. I'm not convinced that this is the way to go, though, given the current design. In fact, I don't think it will actually work without other rework.

So the current throttling approach that we have is based on the # of workers (consumers) you instantiate, not on a rate (e.g. 5 msg/sec). Each worker instance will be working on at most 1 job at any given time. This is a pretty common technique in messaging frameworks for scaling and tuning resource utilization, i.e. increasing/decreasing the # of consumers. I've used this with my coffee-resque solutions to tune node processes to ensure high CPU utilization, then replicated it across a cluster of VMs dedicated as resque workers. It was really easy for me to quickly determine the optimal # of worker instances needed to achieve my utilization goals. And we only need a single shared redis connection.
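The worker-count throttling described above can be sketched like this (hypothetical names, not the coffee-resque API): each worker holds at most one job at a time, so instantiating N workers caps in-flight jobs at N, with no rate limiter involved.

```javascript
// Hypothetical pool: concurrency is bounded by the number of workers.
function makePool(n) {
  const workers = Array.from({ length: n }, () => ({ job: null }));
  return {
    inFlight() {
      return workers.filter((w) => w.job !== null).length;
    },
    // Hand each idle worker at most one job — at most n jobs run at once.
    dispatch(queue) {
      for (const w of workers) {
        if (w.job === null && queue.length) w.job = queue.shift();
      }
    },
    // Finish all running jobs (stand-in for async job-completion callbacks).
    completeAll(done) {
      for (const w of workers) {
        if (w.job !== null) {
          done.push(w.job);
          w.job = null;
        }
      }
    },
  };
}

const queue = ['a', 'b', 'c', 'd', 'e'];
const done = [];
const peaks = [];
const pool = makePool(2); // 2 workers => at most 2 jobs in flight

while (queue.length || pool.inFlight()) {
  pool.dispatch(queue);
  peaks.push(pool.inFlight()); // never exceeds the worker count
  pool.completeAll(done);
}
```

Tuning throughput then just means changing the argument to `makePool` — the "increase/decrease the # of consumers" knob.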

I'm not hard set on the current design, but it works really well, is efficient, and doesn't require blocking IO. It does, however, have the potential to perform lpop operations even when no msgs are available. But those ops are so damn fast it doesn't concern me. I could be convinced of another approach with some good arguments for why the current design is inferior.