sidekiq / sidekiq

Simple, efficient background processing for Ruby
https://sidekiq.org

Sidekiq with Mongoid/Kiqstand and Connection Pooling #1526

Closed daveharris closed 10 years ago

daveharris commented 10 years ago

Hi Mike,

We are running Sidekiq 2.17.7 in production on Ruby 1.9.3 with 25 workers doing mainly RestClient calls and processing the results. These get put into a Mongo database using Mongoid v3.1.3. It has been working great, until ...

Last year we noticed that we were getting errors which looked like DNS failures - we tracked this down to not having enough file handles. We increased the limit to 64,000 as recommended by MongoDB. This has the unfortunate result of increasing the maximum number of Mongo connections to 80% of 64,000 (with a cap of 20,000).

After a few months we hit the 20,000 connections in Mongo and started getting errors from that. This all boils down to the fact that when a Sidekiq job finishes, it doesn't clean up the connection.

As I'm sure you are aware, this is a known issue with Sidekiq and Mongoid - so Kiqstand (https://github.com/mongoid/kiqstand) was created. We installed this thinking that we were golden - however we now have a big performance issue (jobs are processed ~5x slower now) as Sidekiq presumably creates a Mongoid connection for every job. (What doesn't quite compute is why it's now slower than before, when old connections were left around and new ones were created every time.)

What we need is a connection pool (which I'm sure you know more than a little about). Mongoid/Moped is introducing a connection pool in version 4, but that requires Rails 4 which ideally needs Ruby v2. We will be moving there in the next few months when Mongoid v4 is stable, but we need something for now.

Sorry to be talking so much about Mongoid when I know you aren't the maintainer but I would really love your opinion on what we could do.

I had thought about removing Kiqstand and using manual connection pools via your connection_pool gem, but we have many different types of workers and the code changes would be far-reaching - only to be backed out when Mongoid actually gets connection pooling. Would it be easier to use something like MongoMapper, which I understand uses a driver that supports connection pooling?
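For readers following along, the idea of a manual pool can be sketched roughly as follows. This is only an illustration of the pattern: `TinyPool` is a hypothetical class built on Ruby's standard `Queue`, not the connection_pool gem, and `Object.new` stands in for a real database connection.

```ruby
require "thread" # provides Queue; only strictly needed on older Rubies such as 1.9.3

# Hypothetical fixed-size pool, for illustration only.
class TinyPool
  def initialize(size)
    @available = Queue.new
    # Build every connection up front using the block supplied by the caller.
    size.times { @available << yield }
  end

  # Check a connection out, hand it to the block, and always check it back in.
  def with
    conn = @available.pop # blocks while all connections are in use
    yield conn
  ensure
    @available << conn if conn
  end
end

created = 0
pool = TinyPool.new(2) { created += 1; Object.new } # Object.new stands in for a connection

# Ten "jobs" share the two pooled connections instead of opening ten.
workers = 10.times.map { Thread.new { pool.with { |conn| conn.object_id } } }
distinct = workers.map(&:value).uniq.size

puts "connections created: #{created}, distinct connections used: #{distinct}"
```

The point of the pattern is that the number of open connections is bounded by the pool size rather than by the number of jobs processed.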

Can you give some advice about how best to achieve connection pooling? Sorry to create an issue for a non-bug.

Thanks for all the hard work on Sidekiq - apart from the Mongo connection issue it has been fantastic for about a year. We are hoping to upgrade to Pro very soon!

Thanks, Dave

jonhyman commented 10 years ago

Get off Kiqstand and monkey patch Celluloid. Kiqstand is not good technology; it was created out of an incorrect understanding. See my post at https://groups.google.com/forum/m/#!searchin/mongoid/Kiqstand/mongoid/8rpSlgsRSSc.

I had submitted a pull request to update Mongoid's website and the Readme for Sidekiq. Not sure if it was accepted.

jonhyman commented 10 years ago

I mean a pull for Kiqstand's Readme.

mperham commented 10 years ago

@daveharris I literally know nothing about the Mongo space. I'll let @jonhyman speak for Sidekiq since he's an expert in both areas.

daveharris commented 10 years ago

Hi @jonhyman,

Yes I have read your gist - I wanted to do that but was put off by the Monkey Patching. Would you suggest that we Monkey Patch until Mongoid v4 actually supports connection pooling?

My understanding of your monkey patch is that it stores the Mongoid connection at the level of a Fiber, rather than a Thread - is that correct? I see talk of running in production with the Ruby 1.9.3 version (http://avi.io/blog/2013/01/30/problems-with-mongoid-and-sidekiq-brainstorming/#comment-1026783827), so I assume performance isn't an issue? (That is what we are trying to address!)

Also, what do you mean by "Just be sure that you write your own middleware to clear the IdentityMap if you enable it in production."? Are you saying I need a middleware for just line 22 of Kiqstand (https://github.com/mongoid/kiqstand/blob/master/lib/kiqstand/middleware.rb#L22)?

Thanks Dave

PS. Sorry to involve you in the conversation @mperham, there seems to be a lot of talk around the issue and I'm trying to figure out what the best option is.

mperham commented 10 years ago

@daveharris No problem. I hope you and @jonhyman can figure out a solution; I'm sure there are others with the same issue.

daveharris commented 10 years ago

Looks like I'm having the same problem as #912. Sidekiq caught in the cross-fire yet again :P

jonhyman commented 10 years ago

Hey @daveharris, I'm on my phone so sorry that I might be brief and not link to lines of code.

Moped and Mongoid store information about the session in Thread.current thread-local variables, which are actually local to Fibers (each Fiber has a different variable stack not visible to the others). Celluloid uses fibers around its threads, so when one terminates, the session also terminates. Sidekiq never did "not clean up its workers", at least not with Ruby 2.0. If you open up mongostat and run jobs without Kiqstand you'll see the behavior I'm talking about here. Connections will open as soon as the job starts and close as soon as the job finishes.

What the gist does is move those sessions to thread-local variables that are shared across the fibers, using new Ruby 2.0 thread features. Now connections will stay open, one connection per unit of concurrency set in Sidekiq. Look again at mongostat: as many connections will be created as your concurrency level, and they will not disconnect, nor will any more connections be opened.
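The distinction described here can be seen in plain Ruby 2.0+ without Mongoid or Celluloid. This is a small sketch, not the gist itself: the `:session` key and the string values are made up for illustration. `Thread.current[:key]` is fiber-local, while `Thread#thread_variable_set` and `Thread#thread_variable_get` store values that every fiber on the thread can see.

```ruby
# Fiber-local storage: each Fiber on the thread gets its own copy.
Thread.current[:session] = "fiber-local session"

# True thread-local storage (Ruby 2.0+): shared by all Fibers on the thread.
Thread.current.thread_variable_set(:session, "thread-local session")

# Look both values up from inside a brand-new Fiber on the same thread.
seen_in_fiber = Fiber.new do
  {
    fiber_local: Thread.current[:session],                      # nil in the new fiber
    thread_local: Thread.current.thread_variable_get(:session)  # still visible
  }
end.resume

p seen_in_fiber
```

A session stored the first way disappears whenever the fiber that created it finishes, which matches the open-and-close pattern visible in mongostat; a session stored the second way survives for the life of the thread.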

We run this in production with a decently sized load. E.g., when I checked the New Relic graphs last week we had a peak load of about 150,000 jobs per minute. That range is pretty regular for us. I don't know about 1.9.3 issues since we don't run it. You can't use the newer thread methods there, but the original author of the gist used a hash, so they don't seem to be required.

In terms of production performance, things got better after this patch, as confirmed by our DBAs at ObjectRocket. They definitely saw this as an improvement for the health of MongoDB, and we saw job throughput increase slightly.

I know that this is a monkey patch and in general I am very bearish on them but if you understand the code I think this one is reasonable. My stance is to do it until Mongoid 4. That's my current plan.

jonhyman commented 10 years ago

And yes, just write a quick middleware to clear out the identity map. It should be almost identical to Kiqstand's; just get rid of the disconnect sessions line.
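A middleware along those lines might look roughly like the sketch below. The class name `ClearMongoidIdentityMap` is hypothetical, and the code is written from the description above rather than copied from Kiqstand; it assumes Mongoid 3, where `Mongoid::IdentityMap.clear` is available, and guards both Mongoid and Sidekiq with `defined?` so the file loads even when either is absent.

```ruby
# Hypothetical middleware name; a sketch based on the description above.
class ClearMongoidIdentityMap
  # Sidekiq server middleware is called with (worker, job, queue) and a block
  # that runs the job itself.
  def call(_worker, _job, _queue)
    yield
  ensure
    # Drop documents cached during this job, but leave the sessions connected.
    ::Mongoid::IdentityMap.clear if defined?(::Mongoid::IdentityMap)
  end
end

# Register with Sidekiq only when it is loaded (for example in an initializer).
if defined?(::Sidekiq)
  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add ClearMongoidIdentityMap
    end
  end
end
```

Because the clearing happens in `ensure`, the identity map is reset even when a job raises, and the job's return value or exception passes through unchanged.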

daveharris commented 10 years ago

Hi @jonhyman,

Thank you so much for the explanation - I now understand.

I've implemented it in development and it seems to work exactly as you say - I can see mongostat connections go up and down by 25 when I start and stop Sidekiq.

Thanks again - it was super confusing with so many technologies in play and so much talk about the issue.

Dave

jonhyman commented 10 years ago

You're welcome! Best of luck in prod!

amitsaxena commented 9 years ago

The updated URL for the google group discussion is: https://groups.google.com/forum/#!topic/mongoid/8rpSlgsRSSc

And the gist can be found here: https://gist.github.com/markmeeus/6412088

And the one by @jonhyman (ruby 2.0) is here: https://gist.github.com/jonhyman/7751687