reinh / statsd

A Ruby Statsd client that isn't a direct port of the Python example code. Because Ruby isn't Python.
MIT License

Add support for batched messages #19

Closed: reinh closed this issue 12 years ago

reinh commented 12 years ago

The Statsd#batch_size attribute sets the maximum number of messages to batch before writing them to the UDP socket. The default setting of 1 maintains the existing statsd behavior of sending each message immediately.

This introduces batching logic into the main control flow and may have performance and/or thread-safety implications, but it is the simplest thing that could possibly work and is hopefully a suitable proof of concept.
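The always-on approach described above can be sketched roughly as follows. This is illustrative, not the PR's actual code: the class and the in-memory "socket" stand-in are assumptions, and a real client would write the joined datagram to a UDP socket instead.

```ruby
# Illustrative sketch of always-on batching keyed off a batch_size
# attribute. Messages accumulate in an internal buffer and are flushed
# as one newline-joined datagram once the buffer reaches batch_size.
class BatchingStatsd
  attr_accessor :batch_size
  attr_reader :sent

  def initialize(batch_size: 1)
    @batch_size = batch_size
    @buffer = []
    @sent = []  # stand-in for the UDP socket, for demonstration
  end

  # Called by increment/timing/etc. with an already-formatted message.
  def send_message(message)
    @buffer << message
    flush if @buffer.size >= @batch_size
  end

  # Join buffered messages and "write" them as a single datagram.
  def flush
    return if @buffer.empty?
    @sent << @buffer.join("\n")
    @buffer.clear
  end
end

client = BatchingStatsd.new(batch_size: 3)
%w[a:1|c b:1|c c:1|c d:1|c].each { |m| client.send_message(m) }
client.sent # => ["a:1|c\nb:1|c\nc:1|c"]; "d:1|c" is still buffered
```

With batch_size left at its default of 1, every message flushes immediately, which preserves the existing behavior.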

reinh commented 12 years ago

This implements an always-on batching strategy while I think @mattetti was originally suggesting something more like:

$statsd.batch do |batch|
  do_some_stuff
  batch.increment 'did some stuff'
end

Thoughts?

reinh commented 12 years ago

The above would be pretty simple to implement as an around-method pattern:

def batch
  start_batching
  yield self
ensure
  stop_batching
end

stop_batching would properly flush, etc.
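Filling in the around method above, a minimal self-contained sketch might look like this. The start_batching/stop_batching names come from the comment; everything else (the buffer handling, the in-memory "socket") is assumed for illustration.

```ruby
# Sketch of the block-based batching API. While inside a batch block,
# messages buffer in memory; stop_batching flushes them as one datagram,
# even if the block raises.
class Statsd
  attr_reader :sent

  def initialize
    @batching = false
    @buffer = []
    @sent = []  # stand-in for the UDP socket
  end

  def batch
    start_batching
    yield self
  ensure
    stop_batching  # flushes even on error
  end

  def increment(stat)
    send_message("#{stat}:1|c")
  end

  private

  def start_batching
    @batching = true
  end

  def stop_batching
    @batching = false
    flush
  end

  def flush
    return if @buffer.empty?
    @sent << @buffer.join("\n")
    @buffer.clear
  end

  def send_message(message)
    if @batching
      @buffer << message
    else
      @sent << message  # unbatched: write immediately
    end
  end
end

statsd = Statsd.new
statsd.batch do |b|
  b.increment 'did.some.stuff'
  b.increment 'did.more.stuff'
end
statsd.sent # => ["did.some.stuff:1|c\ndid.more.stuff:1|c"]
```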

mattetti commented 12 years ago

I was suggesting the block or start/end approach so you could logically control the dispatches depending on what/how you instrument. But I quite like the idea of not having to worry about a block or a start/end of flushing and just optimizing at a higher abstraction level as you did in your spike.

(I'm also not a big fan of blocks in this context due to potential scoping issues, but that's a detail, and you addressed it just above.)

reinh commented 12 years ago

One issue with always-on batching is that if statsd is not written to often, the batch can hold onto messages indefinitely when it never grows large enough to trigger a flush. I suppose the idea is that if you need to batch messages, you're already in a high-throughput situation, so this is probably not an issue in practice.

mperham commented 12 years ago

I'm just starting to use Statsd in Sidekiq, and while I love the idea of batching, I really need it to be thread-safe. In our application, our Statsd instance is a global, since the current implementation is thread-safe.

reinh commented 12 years ago

@mperham It looks like the current implementation cannot guarantee that exactly batch_size messages are batched, since the batch is not a thread-safe data structure. The current behavior is that usually batch_size messages are sent, but sometimes more; never, of course, fewer.

To be clear, I'm not going to merge this until I'm happy with both its performance and thread-safety. It's still very much a spike.

Two improvements would be to use a thread local batch and to have send call send_to_socket directly when batch_size is 1, bypassing the batching altogether. Thoughts?
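The two improvements suggested above could be sketched like this. The class and method names are illustrative (the real gem's send path differs); a thread-safe Queue stands in for the UDP socket so the demonstration is observable.

```ruby
# Sketch of the two suggested improvements: a per-thread buffer so
# threads never share a batch, and a fast path that bypasses batching
# entirely when batch_size is 1.
class ThreadLocalBatchingStatsd
  attr_accessor :batch_size
  attr_reader :sent

  def initialize(batch_size: 1)
    @batch_size = batch_size
    @sent = Queue.new  # thread-safe stand-in for the UDP socket
  end

  def send_message(message)
    # Fast path: no batching overhead when batch_size is 1.
    return send_to_socket(message) if batch_size == 1

    buffer = (Thread.current[:statsd_buffer] ||= [])
    buffer << message
    flush if buffer.size >= batch_size
  end

  # Flushes only the calling thread's buffer, so no locking is needed.
  def flush
    buffer = Thread.current[:statsd_buffer]
    return if buffer.nil? || buffer.empty?
    send_to_socket(buffer.join("\n"))
    buffer.clear
  end

  private

  def send_to_socket(datagram)
    @sent << datagram
  end
end

client = ThreadLocalBatchingStatsd.new(batch_size: 2)
threads = 4.times.map do |i|
  Thread.new { 2.times { |j| client.send_message("t#{i}.#{j}:1|c") } }
end
threads.each(&:join)
client.sent.size # => 4 datagrams, each containing exactly 2 messages
```

Because each thread only ever touches its own buffer, exactly batch_size messages per datagram is guaranteed without a mutex; the trade-off is that an idle thread's buffer can sit unflushed.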

fcheung commented 12 years ago

The current code could result in batches getting sent twice (if a second thread calls flush! before the first thread clears the batch), and I believe concurrent modification of Array on JRuby has undefined consequences.

reinh commented 12 years ago

@fcheung Yes, very good catch! I'm loath to introduce a mutex here, as that's basically guaranteed to hurt performance. Personally, I handle thread-safety by using a thread-local statsd client (they're very lightweight objects), so I don't run into this much in practice. Thoughts?
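The thread-local-client approach mentioned above amounts to a few lines. Client here is a hypothetical stand-in for a real Statsd instance; the point is only that each thread lazily builds its own instance, so no batch is ever shared between threads.

```ruby
# Hypothetical lightweight client; stands in for Statsd.new(host, port).
class Client
  attr_reader :batch

  def initialize
    @batch = []  # never shared: each thread owns its own Client
  end
end

# Accessor that gives each thread its own client instance.
def statsd
  Thread.current[:statsd] ||= Client.new
end
```

Within one thread, repeated calls return the same instance; two different threads get two different instances, so there is no shared mutable state to guard.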

reinh commented 12 years ago

@raggi wrote some nice real world code for this. Let's move this to #22.