reinh / statsd

A Ruby Statsd client that isn't a direct port of the Python example code. Because Ruby isn't Python.
MIT License

UDPSockets leaked under load #34

Closed raykrueger closed 11 years ago

raykrueger commented 11 years ago

Ran into some issues today with our Sidekiq/Statsd setup. We have a nice sidekiq middleware component that wraps each job in a "time" block; it works great.

The problem is that, under load, garbage collection does not get to run and all of the UDPSockets attached to threads are left behind.
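A minimal stdlib illustration of the failure mode (not statsd-ruby's internals verbatim, but the same mechanism, since the client caches its socket in `Thread.current[:statsd_socket]`):

```ruby
require 'socket'

# A socket cached in a thread local is NOT closed when its thread exits;
# the descriptor stays open until the garbage collector finalizes it.
t = Thread.new do
  Thread.current[:statsd_socket] = UDPSocket.new
  Thread.current[:statsd_socket]
end
socket = t.value # the socket created inside the now-dead thread
socket.closed?   # => false: still holding a file descriptor
```

If threads are abandoned faster than GC runs, these open descriptors accumulate.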

To remedy this I added a tweak to the middleware to clean up the UDPSocket after the job finishes.

class StatsdMiddleware
  def call(worker, msg, queue)
    statsd = Statsd.new('localhost') # build or look up your Statsd instance however you like
    statsd.time("#{worker.class}.perform") do
      yield
    end
  ensure
    socket = Thread.current[:statsd_socket]
    socket.close if socket
    Thread.current[:statsd_socket] = nil
  end
end

This is then added into Sidekiq as a server middleware...

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add StatsdMiddleware
  end
end

This may be applicable to other threading/forking situations like unicorn and passenger as well.

I could package this up as an optional class or something, not sure if that's going too far. If nothing else, a wiki page might be good.

raggi commented 11 years ago

> The problem is that, under load, garbage collection does not get to run

The above is your bug.

> attached to threads are left behind

You have a bigger problem, because this means the threads are being left behind. The threads are more expensive than the sockets.

> This may be applicable to other threading/forking situations like unicorn and passenger as well.

It should not be: unicorn is single-threaded, and I don't know about passenger, but I'd hope it's using thread pools.

I haven't looked, but maybe this has something to do with Celluloid's use of fibers? (cc @tarcieri)

If it is Celluloid related, I would recommend that you create an actor that handles stats submissions and ship the stats through it. This way you'll only ever have a single statsd socket. You also get the advantage of never blocking on stats submissions, with the only danger being that you'll fill the actor's mailbox instead (which could look like a leak). Example follows:

class Stactor
  include Celluloid

  def initialize(statsd)
    @statsd = statsd
  end

  def method_missing(name, *args, &block)
    @statsd.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @statsd.respond_to?(name, include_private) || super
  end
end

Now you can send them all asynchronously:

Stactor.new(statsd).async.increment "ohai"

One thing to watch out for, however: if you call async and then pass a block, you're no longer in the original stack context inside the block, so you need to handle the concurrency problems yourself. Statsd itself won't protect you against that, although it should work internally. Batching should not be used across threads; typically, if you're doing batching, keep the batch object around and use it from a single thread (like, in an actor ;-) )
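The single-thread batching described above can be sketched with just the standard library (the class name, queue, and flush threshold below are illustrative, not statsd-ruby's API):

```ruby
require 'socket'

# One dedicated thread owns the UDP socket; every other thread just pushes
# stat strings onto a Queue, so only one descriptor ever exists.
class StatsBatcher
  SHUTDOWN = Object.new # sentinel to stop the worker

  def initialize(host, port, batch_size = 10)
    @queue = Queue.new
    @worker = Thread.new do
      socket = UDPSocket.new
      buffer = []
      loop do
        msg = @queue.pop
        break if msg.equal?(SHUTDOWN)
        buffer << msg
        if buffer.size >= batch_size
          socket.send(buffer.join("\n"), 0, host, port)
          buffer.clear
        end
      end
      # Flush leftovers, then release the single descriptor we own.
      socket.send(buffer.join("\n"), 0, host, port) unless buffer.empty?
      socket.close
    end
  end

  # Safe to call from any thread; Queue does the locking.
  def push(stat)
    @queue << stat
  end

  def stop
    @queue << SHUTDOWN
    @worker.join
  end
end
```

Producers never touch a socket, and the batch state lives on exactly one thread, which sidesteps the cross-thread batching problem entirely.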

tarcieri commented 11 years ago

Is it possible this is related to thread locals? All released versions of Celluloid presently do not clean thread locals when threads are returned to Celluloid's thread pool. This is fixed on Celluloid master.

raykrueger commented 11 years ago

Well, if it were not cleaning up thread locals I think that would be fine: statsd would just reuse the existing socket. I know absolutely nothing about Celluloid right now, though. Does it use a thread pool, or does it just create and destroy threads as needed?

@raggi makes the point that really, it seems like I'm leaking threads that still have UDPSocket sockets connected.

I also totally forgot to mention that this is on ruby 1.9.2p290 on ubuntu 11.10.

tarcieri commented 11 years ago

Celluloid uses a thread pool, so if it fails to clean up the state of a thread properly before recycling it, it can leak memory via idle threads in the pool.

raykrueger commented 11 years ago

Well, to be specific, I was battling leaking file descriptors today. Once sidekiq got going (with 150 concurrency) you could just watch the open file descriptors pile up until the errors started.

I'm only beginning to read about Celluloid right now, but in its self-healing ways, does it replace threads in the pool with new instances if there are exceptions?

tarcieri commented 11 years ago

Celluloid recycles the thread regardless of whether or not an exception occurred. It has a toplevel exception handler running inside each thread in the pool.

raggi commented 11 years ago

At concurrency 150, you'll end up using a lot of file descriptors. Depending on what else you're doing, I could easily see something like 750 (statsd, database, redis, https). That should still be inside the limits for a default Linux setup, but ulimit on OS X defaults to 256.

As far as thread recycling goes, even without being cleared out, this will not be a problem. You may end up with 150 udp bindings, but that's not the end of the world. Given that more recent versions of celluloid will clear out the locals, I would recommend binding to a single actor, as I discussed before, and giving a dedicated thread (if that's possible). This would also go some way toward preventing the mailbox overflow I described.

raggi commented 11 years ago

Also, on MRI, I'm not really sure you want to be running with such a high concurrency. It depends a lot on what your specific jobs are, but I'd propose knocking that down to 15-50 and observing throughput. I'd bet throughput is higher in the lower concurrency region.

raykrueger commented 11 years ago

Tony and James, thanks for taking the time to help me sort this out, and for all the work you do. Just wanted to say that first :)

Our descriptor limit is 1024, but I could watch our sidekiq process just keep gaining and gaining until it hit the "Statsd: Errno::EMFILE Too many open files - socket(2) - udp" errors. Without Statsd in place we stayed in line with where we'd normally be with connections to mongo and redis.

I really like the statsd actor idea, I'm going to take a crack at that here shortly.

I'm trying to isolate what I was seeing in production into a reproducible sample here. We'll see.

Thanks again!

raykrueger commented 11 years ago

@tarcieri You mentioned Celluloid cleaning up thread locals. Is that in 0.12.4? I ask because, if Celluloid is clearing out thread locals, then the UDPSocket that Statsd set up will not be closed when it is removed from local storage, which could cause the descriptor leaking I was seeing.

tarcieri commented 11 years ago

It's not in 0.12.4, only in master (0.13).

In the end you really need to find a way to ensure the UDP sockets are getting closed. Celluloid lets you define a finalizer which could be used to close sockets when actors are terminated, for example.

reinh commented 11 years ago

For an actor-based system like Celluloid, I think using an actor (or pool of actors) to manage Statsd, as @raggi mentioned, makes the most sense. The Statsd actor can consume stats messages off its mailbox and use batch delivery for efficiency, and none of the other actors need to care about concurrency, thread safety, or resource allocation/deallocation. More encapsulation, safer concurrency, (probably) more performance.

raykrueger commented 11 years ago

Actually, looking at Sidekiq, I see that it handles Processor failure by linking a new one in...

https://github.com/mperham/sidekiq/blob/v2.6.4/lib/sidekiq/manager.rb#L75-L87

    trap_exit :processor_died

    # ...

    def processor_died(processor, reason)
      watchdog("Manager#processor_died died") do
        @in_progress.delete(processor.object_id)
        @busy.delete(processor)

        unless stopped?
          @ready << Processor.new_link(current_actor)
          dispatch
        else
          signal(:shutdown) if @busy.empty?
        end
      end
    end

Sorry, still trying to track down why the threads and sockets aren't being reused as they should.

reinh commented 11 years ago

> Sorry, still trying to track down why the threads and sockets aren't being reused as they should.

This seems as good a place to have the conversation as any. It's not like you're hurting anyone. :smile:

raggi commented 11 years ago

Yep, I only closed it for my own tracking purposes, I'll keep an eye on the email thread :-)

raykrueger commented 11 years ago

Thanks guys :)

@tarcieri Do you think the Sidekiq code above, given Celluloid 0.12.4, could cause the Thread local UDPSocket leaking I was seeing?

Thanks again, -Ray

tarcieri commented 11 years ago

@raykrueger yeah, thanks to Fibers even if you use @raggi's suggested approach it will make a new socket for each fiber :(

raykrueger commented 11 years ago

Aah :( ok thanks! So it seems my best bet is the Sidekiq middleware I mentioned then. I did actually get a Statsd actor working, but it was causing deadlocks when Unicorn initialized. That's a tale for another day though heh.

tarcieri commented 11 years ago

If you use @raggi's approach, and call the "exclusive" class method, you can at least eliminate the Fibers from ganking your thread locals.

raykrueger commented 11 years ago

Hey guys, I was doing other work for a while but now it's time to drag this out of the dust again :)

@tarcieri Given that sidekiq handles crashed actors by starting new ones is this a bug in sidekiq really? This is the part where I drag @mperham in :)

So it turns out we've been leaking MongoDB connections for a long time as well. We're using mongo-ruby 1.7.1, which uses a connection pool. When we see workers fail, eventually we see mongo-ruby complaining that there are no available connections in the pool. Mongo-ruby also uses thread local storage to manage its connection pool.

Again, we're on Ruby 1.9.2 running sidekiq 2.6.4 and celluloid 0.12.4 on Ubuntu 11.10 in AWS.

P.S. If you're ever in Chicago I will buy you beer :)

tarcieri commented 11 years ago

@raykrueger Definitely not a Sidekiq bug. That's the intended use pattern for Celluloid.

If anything this is a Celluloid bug. I have some ideas about how to fix it but have not had time to implement them.

raykrueger commented 11 years ago

@tarcieri Tony, I was thinking about this again today. One thing that keeps coming to mind is the thread pool. If each thread were being correctly returned to the thread pool, we wouldn't end up leaking the statsd UDPSockets. When a worker crashes and a new one is started, it should, theoretically, be able to re-use any thread locals in that thread (such as the UDPSocket in question here, mongo connections, redis connections).

Is it possible that Celluloid is killing the thread along with the worker and starting a new thread for the new worker? I dug into this a while back but don't remember what I ended up finding, heh.

tarcieri commented 11 years ago

There are a number of problems with the way Celluloid works now, most notably not providing a consistent view of thread locals across "Tasks" within the same actor.

Have a fix in mind for this, but it probably won't make it into Celluloid until 0.14 (release after the upcoming one)

elmer-garduno commented 11 years ago

We are having this same problem using jruby + trinidad. In Java, even when the threads are finished the UDP connections are left open, which is causing us the same file descriptor problems described above.

raykrueger commented 11 years ago

Add the middleware you see above and you'll be fine, and you'll have timings around all your jobs :P

tarcieri commented 11 years ago

Note that Celluloid 0.14 includes a fix for thread local behavior which should correct this problem under Celluloid at least. Prereleases are up on RubyGems, full release soon!

elmer-garduno commented 11 years ago

Thanks guys, just one more clarification. Why not use a connection pool as suggested earlier to manage the UDP connections?

At least on Tomcat/Trinidad there seem to be some issues with keeping resources in a ThreadLocal.

http://tomcat.10.x6.nabble.com/Executor-thread-lifecycle-td1995175.html

Specifically:

> I'm using ThreadLocal to allocate some resources per thread...

> That is asking for a memory leak. There is a simple rule for correctly using a ThreadLocal in a web application.
>
> The ThreadLocal must be removed from the thread before the request/response in which it was created completes processing.

Any thoughts on this?

raykrueger commented 11 years ago

ThreadLocals for the UDPSocket are fine if there's a properly managed thread pool in use; the existing UDPSocket can be re-used safely. The UDPSockets aren't really leaked, the threads are, and since the sockets are attached as locals they go out with the bathwater.

I'm not clear on this, but it seems Celluloid doesn't back the actors with a thread pool. I think a new thread is started for each actor, so when a sidekiq worker (actor) dies, a new one is started and the thread is leaked (along with the locals). Tony might say all this is completely wrong, though :)

tarcieri commented 11 years ago

Celluloid does use a thread pool; however, the thread locals are cleared when threads are returned to the pool.
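Conceptually (this is not Celluloid's implementation, just the shape of the behavior), a pool worker that clears its locals looks like:

```ruby
require 'socket'

# A pool worker that closes and clears any cached socket local after each
# job, so a recycled thread never carries a stale descriptor forward.
queue = Queue.new
worker = Thread.new do
  while (job = queue.pop) != :shutdown
    begin
      job.call
    ensure
      if (sock = Thread.current[:statsd_socket])
        sock.close
        Thread.current[:statsd_socket] = nil
      end
    end
  end
end
```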

raykrueger commented 11 years ago

Ah ha! I don't know how I didn't pick up on that in all the previous conversation :)

elmer-garduno commented 11 years ago

Is there any reason for not calling UDPSocket.new each time a socket is required, instead of holding a reference to it on the thread?

raykrueger commented 11 years ago

That is the effective result of my middleware actually.
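That per-call approach can be sketched with the standard library (the helper name here is made up for illustration, not statsd-ruby API):

```ruby
require 'socket'

# Open a fresh socket per send and close it immediately: a few extra
# syscalls per stat, but nothing is ever retained on the thread.
def send_stat(message, host = 'localhost', port = 8125)
  socket = UDPSocket.new
  socket.send(message, 0, host, port)
ensure
  socket.close if socket
end
```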

elmer-garduno commented 11 years ago

OK, now it makes sense. Thanks!

reinh commented 11 years ago

@elmer-garduno Good question. I'd rather do away with thread locals (because they're so often not) if possible. @raggi, thoughts?

JustinAiken commented 10 years ago

Running into these issues now... we're using Celluloid (0.15.2) and JRuby with the latest statsd-ruby gem... so many sockets are created that it ends up hitting limits and crashing!

After some experimentation and testing, something like @raykrueger's change to use a single socket, along with an exclusive celluloid actor, seems to be working...

https://github.com/reinh/statsd/issues/42 and https://github.com/reinh/statsd/issues/31 have both been closed (with good reason)... Would a PR that allows for configurable options for each/either of those methods, with default behavior as-is, be a good thing?

JustinAiken commented 10 years ago

Also, @tarcieri, could you help shed some light on the socket creation based on thread variables for me? We're on a current Celluloid, after the https://github.com/celluloid/celluloid/commit/395a6dff992a16609baa94331b50c4848678967d commit, but whenever it gets here it creates a new socket every time.

Here's our actor we send the stats through:

  class Stactor
    include Celluloid

    exclusive

    attr_accessor :statsd

    def initialize(statsd)
      @statsd = statsd
    end

    def send_stat(meth, *args, &blk)
      statsd.send meth, *args, &blk
    end

    def method_missing(meth, *args, &blk)
      self.async.send_stat meth, *args, &blk
    end
  end

tarcieri commented 10 years ago

Ruby has wacky thread local semantics. Thread#[] (what Thread.current[...] calls) is actually fiber-local, and should probably have been exposed on Fiber instead. To get "real thread locals", @tenderlove had to add Thread.current.thread_variable_get and Thread.current.thread_variable_set, which is probably what you want here, but they're unavailable before Ruby 2.0.

Also, Ruby doesn't support destructors for thread locals, so it's easy for stuff like this to leak.
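The distinction is easy to demonstrate on Ruby 2.0+:

```ruby
# Thread#[] is actually fiber-local; thread_variable_get/set (Ruby 2.0+)
# are true thread locals shared by every fiber on the thread.
Thread.current[:socket] = :fiber_local
Thread.current.thread_variable_set(:socket, :thread_local)

values = Fiber.new do
  [Thread.current[:socket], Thread.current.thread_variable_get(:socket)]
end.resume

values # => [nil, :thread_local]
```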