mperham / girl_friday

Have a task you want to get done but don't want to do it yourself? Give it to girl_friday!
http://mperham.github.com/girl_friday
MIT License

Move conversation from Blog #3

Closed mrrooijen closed 13 years ago

mrrooijen commented 13 years ago

Hey, this is Michael (the guy who posted on your blog regarding Girl Friday and RBX/Rainbows, etc)

Thought it might be easier to keep this project-specific talk on GitHub rather than on your blog, might clutter it up and get lost over time :)

Anyway to pick up where we left off:

I tried Thin + Ruby 1.9.2 and it seems to work: it queues and processes data in the background. In development mode, while jobs are being processed and I'm spamming the refresh key (to simulate "heavy" load), I occasionally get the error saying it expected the User model to be defined in app/models/user.rb. I suspect this is because development mode doesn't keep the models in memory, so a reload triggered by my refresh conflicts with a job that is still processing. I tried to replicate this in production mode and was unable to, so I'm fairly sure it's just because development mode doesn't keep the models in memory and reloads them on each request.

When running thin --threaded under RBX it simply fails. It hangs indefinitely as far as I can see. So that's a no-go. thin --threaded under Ruby 1.9.2 does not improve performance at all (unless there is blocking I/O somewhere, of course, in which case it's a good way to speed up requests), but it actually does work and is as fast as without --threaded.

Running Thin without --threaded with RBX does not (as you already thought) enable threading. Performance actually gets a few times worse. Rainbows! hangs when you do the initial request, or it takes like 30-60 seconds if you're patient enough to wait, and it's the same with each request. I checked the mailing list and noticed your post already. Seems like it's more of an RBX or RVM issue? In any case, with RVM + RBX Hydra it seems like it doesn't work at all.

So Unicorn, Rainbows, Thin, Passenger all don't seem to work. What's left?

mrrooijen commented 13 years ago

I actually just pushed an app to Heroku, but it seems they (somehow) do not allow you to spawn multiple threads. Picky! Same goes for processes.

mperham commented 13 years ago

Thanks for following up; this is great info to have. Yes, Rails's autoloading will cause issues when rapidly reloading. Running in production mode disables autoloading so you won't see those issues.

I can't get Rainbows 3.2.0 working with rbx 1.2.4dev or ruby 1.9.2. There's a Rubygems issue. rbx 2.0.0dev crashes on me so I'm backing off my Rubinius recommendation for now.

Thin + 1.9.2 works for me also.

mrrooijen commented 13 years ago

Yup. Hopefully Rubinius will work at a later time, at least when 2.0.0 comes out. Looking forward to that. I just tried JRuby and Thin --threaded, and it seems to be able to take a good amount of requests. But it does (imo) leak a huge amount of memory: with ab -n 10000 -c 100 it goes from 150 MB to 400 MB, and keeps increasing. Though this might be because of Thin. I have no clue how to operate Java-based app servers, ha. I'll have to check that out at some point. Got any recommendations for ones that are fairly easy to work with for deploying JRuby apps?

mrrooijen commented 13 years ago

Well, even on Ruby 1.9.2 I find GF quite performant. I mean, it sure beats any non-concurrent worker like DJ, Resque, Navvy, etc. You get workers for free, and more than one, for the same amount of memory. And as you pointed out, since most servers/apps are I/O heavy rather than CPU heavy, it'll only block for an extremely short time; I don't even notice it's blocking. If CPU-intensive stuff comes into play then you could always spawn a different worker to pick it up, preferably not though. Hopefully Rubinius 2.0.0 (or any earlier dev version) will resolve this issue, or I could jump on JRuby etc.
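The point about I/O-heavy work can be illustrated with a minimal sketch. It does not use girl_friday itself: sleep stands in for a network or disk wait, and the method names (fake_io_job, run_serial, run_threaded) are hypothetical. On MRI 1.9 and later a sleeping or I/O-blocked thread releases the GIL, so the waits overlap even without true parallelism.

```ruby
require 'benchmark'

# Simulated I/O-bound job: sleep stands in for waiting on a socket or disk.
def fake_io_job(seconds)
  sleep(seconds)
  :done
end

# Run the jobs one after another; total time is roughly jobs * wait.
def run_serial(jobs, wait)
  Benchmark.realtime { jobs.times { fake_io_job(wait) } }
end

# Run each job in its own thread; the waits overlap, so total time is
# roughly one wait, even under a global interpreter lock.
def run_threaded(jobs, wait)
  Benchmark.realtime do
    threads = Array.new(jobs) { Thread.new { fake_io_job(wait) } }
    threads.each(&:join)
  end
end

puts format('serial:   %.2fs', run_serial(4, 0.2))
puts format('threaded: %.2fs', run_threaded(4, 0.2))
```

A CPU-bound job (a tight arithmetic loop, say) would not show the same speedup on MRI, which is why the GIL only matters once the work stops being I/O heavy.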

I feel that the gap between MRI and JRuby however is quite big. I've been trying to figure out how to get up and running with JRuby a few times but I barely have any Java experience, let alone Java App Server experience so I have no clue (as I think many other people) where to look for a Thin, Mongrel, Passenger, Unicorn, Rainbows equivalent in Java. Not many helpful resources (from what I can see) out there either. That's why I was hoping Rubinius would just work haha. But all in due time.

Really excited about using Girl Friday regardless, great improvement over existing solutions! Let me know if you find anything over time that'll make RBX play nice with a Ruby app server. Or, if you know any easy-to-use JRuby app servers, let me know. I'm willing to try JRuby with Rails if there's any relatively easy app server to use.

mrrooijen commented 13 years ago

Hey, did you see this? http://torquebox.org/ The name flew by a few times on Twitter and I was like: what's this about? No idea if this is new, but I haven't heard of it before. And you? The benchmarks I saw earlier looked pretty nifty and it seems to be built upon the JBoss app server.

Taken from their website:

"TorqueBox is a new kind of Ruby application platform that integrates popular technologies such as Ruby on Rails, while extending the footprint of Ruby applications to include built-in support for services such as messaging, scheduling, and daemons."

"TorqueBox provides an all-in-one environment, built upon the latest, most powerful JBoss AS Java application server. Functionality such as clustering, load-balancing and high-availability is included right out-of-the-box."

mperham commented 13 years ago

Yep, I was going to suggest, for JRuby, TorqueBox or GlassFish.

I just got Rainbows working with my test app. I was missing "gem 'rainbows'" in the Gemfile, which apparently is required. girl_friday worked just fine with rbx 1.2.4 + rainbows.

mrrooijen commented 13 years ago

Cool. Though, that means there is no support for GIL-less GC yet? Meaning it won't be better than MRI 1.9.2 in terms of concurrency/threading?

mperham commented 13 years ago

True, JRuby is currently your best choice for scaling. Rbx 2.0 (hydra) is still too unstable in my testing for real world use. 1.9.2 should be good enough unless you are CPU bound.

mrrooijen commented 13 years ago

Cool. I personally find Passenger easy to set up in most cases, however, it seems that it doesn't support threading (probably because of its nature of spawning and killing processes). I really like Girl Friday and I was wondering if there was a way to still use Girl Friday even though I use Passenger.

Is it possible to queue up jobs and persist them to Redis even though threading is disabled in Passenger? Because then this would enable us to still use Rubinius or JRuby a lot easier by just having a dedicated instance (non-Rack) that deals with hardcore job processing. This partially goes against the concept of Girl Friday by minimizing memory consumption by doing everything in a single app server instance. However, say you have the following setup:

Then the web serving and job queuing will be done by the Passenger cluster. We also have a separate (custom) Ruby file, probably just some kind of infinite loop like loop { sleep 1 } which never exits. Then we could load Girl Friday there and process all the jobs in threads by pulling them in from Redis. This would enable the use of Rubinius Hydra or JRuby regardless of what web server you're using. Of course, it means you'd have to spawn a dedicated instance, but I don't think there's anything wrong with that: on Rubinius Hydra or JRuby it could process a lot of data concurrently while consuming significantly less RAM than other workers. You'd also be able to use any app server you'd like, since Girl Friday would no longer depend on it.

Would something like this be possible with Girl Friday? Or would queuing jobs require the web server to allow threading?

mperham commented 13 years ago

Interesting use case. It's definitely not part of my vision to create a standalone girl_friday worker process. I did that already with Qanat (https://github.com/mperham/qanat). Keep in mind that you can use Threads with Passenger, but with several caveats. Passenger will only send one request at a time per process and Passenger assumes the process is quiet (and can be killed) once a response has been returned, making graceful shutdown somewhere between tough and impossible. I'm not sure if it is possible to work around this or add it.

I can see two parts:

1) Your application would declare Redis-backed queues like normal, but with :size => 0 so you have no workers. The supervisor would get the message and send it to Redis. Because there are 0 workers, it would not be processed within your application's Passenger process.

2) You could then roll your own simple Sinatra or plain Ruby process which declares the same queues but with actual workers. The problem is that the supervisor hands the work to the workers but since it never receives any work, it would not hand anything out. You'd need to modify the supervisor code to poll Redis or use Redis's pub/sub (http://redis.io/commands#pubsub) commands. I haven't used those Redis commands before so I'm not familiar with how well they'd work in reality.
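The two parts above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not girl_friday's actual supervisor code: FakeRedisList is a hypothetical in-memory stand-in for a Redis list (RPUSH to enqueue, LPOP to dequeue), and Enqueuer and PollingWorkers are hypothetical names. In a real setup the two halves would live in separate processes and talk to a real Redis server.

```ruby
# Hypothetical in-memory stand-in for a Redis list (RPUSH / LPOP).
class FakeRedisList
  def initialize
    @items = []
    @lock = Mutex.new
  end

  def rpush(item)
    @lock.synchronize { @items.push(item) }
  end

  def lpop
    @lock.synchronize { @items.shift }
  end
end

# Part 1, web side: only enqueues, never processes (the ":size => 0" idea).
class Enqueuer
  def initialize(store)
    @store = store
  end

  def push(msg)
    @store.rpush(msg)
  end
end

# Part 2, worker side: a fixed pool of threads that poll the store,
# since nothing hands them work directly.
class PollingWorkers
  def initialize(store, size, poll_interval = 0.05, &processor)
    @store = store
    @processor = processor
    @poll_interval = poll_interval
    @running = true
    @threads = Array.new(size) { Thread.new { work_loop } }
  end

  # Let the workers drain whatever is already queued, then wait for them.
  def shutdown
    @running = false
    @threads.each(&:join)
  end

  private

  def work_loop
    loop do
      msg = @store.lpop
      if msg
        @processor.call(msg)
      elsif @running
        sleep(@poll_interval) # queue empty: wait before polling again
      else
        break                 # queue empty and shutdown requested
      end
    end
  end
end

store = FakeRedisList.new
web = Enqueuer.new(store)
workers = PollingWorkers.new(store, 3) { |msg| puts "processed job #{msg[:id]}" }
5.times { |i| web.push(:id => i) }
workers.shutdown
```

Polling keeps the sketch simple; a blocking pop or pub/sub would avoid the sleep interval, at the cost of holding one Redis connection per waiting worker.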

mrrooijen commented 13 years ago

Actually you can run passenger start --min-instances=3 --max-pool-size=3 (basically a cluster of 3 instances that never get killed). But when I tried to enqueue some jobs with Passenger (in memory) it would not actually process the job. Am I missing something?

In any case, I think polling would be sufficient: since it's all in-memory it should be super fast and won't really have a performance impact compared to a standard database query. This seems like an interesting way to deal with the situation though. I'd like to enqueue stuff from my web app, to be further processed in parallel by a stand-alone Ruby file that just includes the code necessary for the jobs to be processed (or maybe just the whole config/application.rb to simplify things). With that in place, I could use RBX Hydra for GIL-less threading to process a lot of data concurrently.

I'll try to set up a test rails app with Girl Friday and Redis as store and see if it actually persists jobs. Cause with the memory store it didn't seem to process the jobs at all.

mrrooijen commented 13 years ago

Mike, you mentioned you had a MRI 1.9.2 + Rainbows set up and working properly.

I'm wondering what your configuration looks like? I checked out their docs and I've been trying various combinations but I'm not sure what a proper configuration would be. I always see my Google Chrome spinner "spinning", see: http://cl.ly/2M0i1j2K3W1m240v0K22 - on some configurations it seems to spin infinitely, and some it spins for like 5-10 seconds but then stops. However, the pages do load fast. Is this bad?

This is my config:

worker_processes 2 # cpu cores
Rainbows! do
  use :ThreadPool # your recommended strategy
  worker_connections 10 # no idea what this is
end

The server I'll be running on will have 2 cores on KVM. Using Rainbows, I'd like to spawn 3 app instances, and my GirlFriday queue sizes will be around 5 to 10. Is there any configuration you could recommend? Thanks.

mperham commented 13 years ago

I've heard that Rainbows doesn't work well in practice (I've just tested it locally with a very simple test case) and that Unicorn actually works better. I'm going to close this issue since there's no real point to a never-ending conversation - open another issue if you have specific bugs or issues that still need work.