Provide a sidekiq-style worker as an alternative to the resque-style one

myronmarston commented 12 years ago

@jstorimer and I were talking about this on twitter. I haven't looked much at the sidekiq code yet but wanted to get a conversation rolling here.

jstorimer commented 12 years ago

After a cursory glance, this isn't a trivial drop-in solution if you want to support both styles of worker.

The primary difference between the resque worker and the sidekiq worker is processes -> threads. This requires some core changes to qless to make this possible:

Job classes need to define a #perform instead of a .perform. The latter discourages thread-safety.
Since multiple threads would otherwise share the default Client connection, all redis interactions need to be proxied through a connection pool. This way each thread gets its own connection.

The first one affects the public API, the second one doesn't have to.

That being said, the qless feature set shouldn't be affected and there's no reason why qless workers couldn't see the same performance boost that sidekiq workers offer (often 10x over resque workers).

Thoughts from the qless team? @mperham anything to add?

mperham commented 12 years ago

Threads vs Processes is a pretty fundamental architecture decision and, as you note, has a major influence on APIs. The readme explicitly states that a worker is a process. I think it would be pretty difficult to introduce threading without a major overhaul of the APIs.

myronmarston commented 12 years ago

@jstorimer -- thanks for continuing this discussion!

Job classes need to define a #perform instead of a .perform. The latter discourages thread-safety.

It's not clear to me why this change needs to be made. When you're using threads you need to avoid mutating state shared between threads (or use a synchronization mechanism). That doesn't change if the worker is calling a class method or a instance method.

@dlecocq (the creator of Qless) and I discussed the instance vs class method thing when we built the ruby client. It was an intentional decision to use a class method. My argument at the time is that when you have a simple job that has no state (beyond the passed-in-args) whatsoever, you don't really need to instantiate an instance of a class to do your work. It's an extra object allocation that presumably creates more objects that need to be garbage collected. If you've got more state you need to manage, then you obviously don't want to do so at the class level. On projects where I've used resque, many of my jobs have done something like:

class MyJob
  def self.perform(a, b)
    new(a, b).perform
  end

  def perform
    some_helper_method_that_depends_on_a
  end

private

  def some_helper_method_that_depends_on_a
    # use a
  end
end

When a job is more than a couple lines of code, this is the obvious thing to do (to me at least--but maybe I structure my code differently than other devs?).

@mperham I admit I haven't dealt with concurrency nearly as much as you have, but all the ways I can think that you could create race conditions and other problems w/ a .perform would still be problematic if the method was #perform -- e.g. if you hang some state off of a class, and change it in your job, you're going to have problems regardless of the perform method being a class or instance method.

Can you show an example of a case where using .perform rather than #perform would cause a problem?

The first one affects the public API, the second one doesn't have to.

I'm not opposed to changing the API if I become convinced we need to. It would be relatively painless in our projects that use Qless, and I'm not aware of any production usage of Qless outside of SEOmoz (although, there certainly may be some now). We could also ease the migration pain by deprecating .perform (but still allowing it to work if it is defined) in favor of #perform.

Since multiple threads would otherwise share the default Client connection, all redis interactions need to be proxied through a connection pool. This way each thread gets its own connection.

The redis gem's README states that as of redis-rb 2.2.0, the gem is threadsafe by default. Is the need for a connection pool driven by commands like multi, where you could get into trouble if one thread started a multi block and than another thread executed commands but didn't expect it to be in a multi block?

Any pointers (from either of you) on how best to implement a redis thread pool without re-inventing the wheel? innertube looks like an interesting possibility there.

Alternately, the fact that Qless does not have a global redis connection may help us here...where as Resque makes you assign a global connection to the Resque module, Qless uses a separate redis connection on each Qless::Client instance, and there aren't any global client instances. Would it make sense to use a different Qless client instance in each thread?

The readme explicitly states that a worker is a process. I think it would be pretty difficult to introduce threading without a major overhaul of the APIs.

The README states that a worker is a process because that's the way the current worker is written. In Resque (and maybe in Sidekiq?) most of the "smarts" of the system reside in the worker, and re-architecting the resque worker would probably be a major undertaking. In Qless, the "smarts" of the system reside in the lua scripts running on the redis server. The worker was one of the last things we added to the ruby gem, and I had plans from the start to also provide a sidekiq-style threaded worker. It was far easier to start w/ the resque-style worker since I was so familiar with it.

mperham commented 12 years ago

I was mostly speaking of connection handling when talking about API changes, not perform. It's easy to just "open a socket" when single threaded.

Sidekiq uses my connection_pool gem to provide a fixed set of Redis connections.

jstorimer commented 12 years ago

When you're using threads you need to avoid mutating state shared between threads (or use a synchronization mechanism). That doesn't change if the worker is calling a class method or a instance method.

True. I also prefer the style you mentioned, to the point where I'd rather write my jobs that way (in fact, at Shopify we have an abstraction layer between our jobs and the queueing library so we can write them like this).

There's no need to change the API but a .perform increases the possibility of creating shared state. This is more of a public education issue than anything, the qless job API doesn't need to change.

The redis gem's README states that as of redis-rb 2.2.0, the gem is threadsafe by default. Is the need for a connection pool driven by commands like multi

My understanding is that the pool of connections provides concurrency, rather than safety. 25 threads using the same connection will probably end up creating lock contention. But your point about each Client having its own connection makes this a non-problem. Each thread gets its own Client, done.

In Qless, the "smarts" of the system reside in the lua scripts running on the redis server.

I definitely got that feeling from reading through the source.

The worker was one of the last things we added to the ruby gem, and I had plans from the start to also provide a sidekiq-style threaded worker. It was far easier to start w/ the resque-style worker since I was so familiar with it.

Cool. I'll try to spike this out then. @mperham any attempt at a sidekiq-style worker here would obviously be inspired by what you have in sidekiq. Does this clash with the license in any way?

mperham commented 12 years ago

I can't imagine how it would conflict unless you are literally copying Sidekiq code. Ideas aren't copyrightable, code is.

dlecocq commented 12 years ago

A little late to the party, but thought I'd chime in.

FWIW, the python implementation takes the tack that a master process forks off workers, each of which run essentially runs a pop-perform loop. You can also ask that certain libraries are imported before the fork with hopes of making use of COW, but that's not always necessary. In practice, I've not encountered any RAM-related problems, but that might just be the nature of the jobs as opposed to Ruby vs. Python's memory consumption.

We also have a branch that runs workers inside green threads (a lightweight software thread) for use with libevent (for IO-bound jobs). In order to get that to work, I actually had each green thread create its own connection. The problem I had was that the workers were receiving responses from Redis meant for other workers. Despite the python Redis client not being safe for use with greenlets, the one-worker-one-client approach has been working fine.

Also, the python client uses the equivalent of the self.perform syntax -- staticmethods.

As someone who doesn't use the Ruby client very extensively (or at all), I don't really have a horse in this race. It seems to me, though, real KLT threads > serially running jobs per fork >>> one job per fork.

myronmarston commented 12 years ago

As someone who doesn't use the Ruby client very extensively (or at all), I don't really have a horse in this race. It seems to me, though, real KLT threads > serially running jobs per fork >>> one job per fork.

FWIW, I think there are benefits to both the threaded sidekiq-style worker and the process-based resque-style worker. I think it's a great feature of Qless to support both. Specific pros/cons I see:

A threaded worker will generally support higher throughput with less resource consumption (as Sidekiq has proved). However, if you're using an interpreter that has a GIL (i.e. MRI -- which we use at SEOmoz) and your jobs are primarily computation, not I/O, then the threaded worker isn't going to provide much benefit (if any) over a process-based one.
Resque's fork-per-job model has a great benefit of providing total memory isolation in every job--meaning that you don't need to worry about memory leaks since any memory leaks will be short lived (i.e. for the duration of one job).
A threaded model is more prone to race conditions if the developers working on the system don't avoid mutable shared state or use proper synchronization. It's less effort/thought to use a process-based worker (this may encourage lazy thinking, though).

So I think it'll be pretty great for Qless to support both.

seomoz / qless

Provide a sidekiq-style worker as an alternative to the resque-style one #44