reinh / statsd

A Ruby Statsd client that isn't a direct port of the Python example code. Because Ruby isn't Python.
MIT License

Stats with sample rates < 1 are being sent randomly? #52

Closed: Ajedi32 closed this issue 10 years ago.

Ajedi32 commented 10 years ago

So reading over the source code, I found this line:

def send_stats(stat, delta, type, sample_rate=1)
  if sample_rate == 1 or rand < sample_rate # <-- Wait, what?
    # Replace Ruby module scoping with '.' and reserved chars (: | @) with underscores.
    stat = stat.to_s.gsub('::', '.').tr(':|@', '_')
    rate = "|@#{sample_rate}" unless sample_rate == 1
    send_to_socket "#{prefix}#{stat}#{postfix}:#{delta}|#{type}#{rate}"
  end
end

I realize the idea is to send each stat only (sample_rate × 100)% of the time, but this implementation seems really strange to me.
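To make the gating in the quoted method concrete, here is a minimal sketch that reproduces it in isolation. `SketchClient`, its keyword arguments, and the in-memory `sent` array are hypothetical stand-ins (the real gem writes packets to a UDP socket), and a seeded `Random` replaces `Kernel#rand` only so the run is repeatable.

```ruby
# Hypothetical stand-in for the gem's client: same gating and packet
# format as the quoted send_stats, but packets are collected in memory.
class SketchClient
  attr_reader :sent

  def initialize(prefix: '', postfix: '', rng: Random.new)
    @prefix = prefix
    @postfix = postfix
    @rng = rng  # injectable RNG instead of Kernel#rand (assumption)
    @sent = []  # stands in for send_to_socket
  end

  def send_stats(stat, delta, type, sample_rate = 1)
    # Always send at rate 1; otherwise send with probability sample_rate.
    return unless sample_rate == 1 || @rng.rand < sample_rate

    # Module scoping becomes '.', reserved chars (: | @) become '_'.
    stat = stat.to_s.gsub('::', '.').tr(':|@', '_')
    # The rate suffix is only appended for sampled stats.
    rate = "|@#{sample_rate}" unless sample_rate == 1
    @sent << "#{@prefix}#{stat}#{@postfix}:#{delta}|#{type}#{rate}"
  end
end

client = SketchClient.new(rng: Random.new(42))
1_000.times { client.send_stats('App::Hits', 1, 'c', 0.1) }
puts client.sent.first # App.Hits:1|c|@0.1
puts client.sent.size  # close to 100; the exact count depends on the seed
```

Every packet that does go out carries the `|@0.1` suffix, so the client both drops events and tells the server the rate at which it dropped them.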

First of all, using rand instead of some kind of cycling counter seems like a bad idea, since variance in the random number generator could produce long runs in which stats are never sent, or always sent.
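How long such runs actually get can be checked with a quick simulation. This sketch assumes each call is an independent draw, as with `rand < sample_rate` in the quoted code; `longest_drop_run`, the seed, and the trial count are arbitrary illustrative choices. For independent draws the longest streak of drops grows only logarithmically with the number of calls (at rate 0.5, on the order of log2(n), so roughly a dozen for 10,000 calls).

```ruby
# Counts how many of n independent draws are sent at the given rate and
# records the longest consecutive streak of dropped samples.
def longest_drop_run(n, sample_rate, rng)
  longest = 0
  current = 0
  sent = 0
  n.times do
    if rng.rand < sample_rate
      sent += 1
      current = 0 # a send ends the current streak of drops
    else
      current += 1
      longest = current if current > longest
    end
  end
  [longest, sent]
end

longest, sent = longest_drop_run(10_000, 0.5, Random.new(1))
puts "sent #{sent} of 10000, longest drop streak: #{longest}"
```

A counter at the same rate would never drop more than one sample in a row, which is the predictability the issue is asking for; the reply below addresses what that predictability costs.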

Secondly, I expected the sample rate argument to be simply a way of telling the statsd server what the sample rate was, not a way of telling the statsd gem how often to actually send the metric I just told it to send. That seems like very strange behavior to me.
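The two roles are coupled because of what the server does with the `|@rate` suffix. Assuming the server follows the reference Etsy statsd behavior for counters, it scales each received value by 1 / sample_rate to estimate the true total. The sketch below uses a hypothetical `parse_counter` helper (not part of the gem or the server) to show that scaling, and what would happen if a client tagged packets with a rate but sent every event anyway.

```ruby
# Hypothetical helper: parses a statsd counter packet such as
# "app.hits:1|c|@0.1" and returns [name, value scaled by 1 / rate].
def parse_counter(packet)
  name, rest = packet.split(':', 2)
  value, type, rate_field = rest.split('|')
  raise ArgumentError, "not a counter: #{packet}" unless type == 'c'

  # A missing "@rate" field means the value was not sampled.
  sample_rate = rate_field ? Float(rate_field.delete_prefix('@')) : 1.0
  [name, Float(value) / sample_rate]
end

puts parse_counter('app.hits:1|c|@0.1').inspect # ["app.hits", 10.0]
puts parse_counter('app.hits:3|c').inspect      # ["app.hits", 3.0]

# If all 10 events were sent but each was tagged @0.1, the server
# would count 100 instead of 10.
overcount = Array.new(10) { parse_counter('app.hits:1|c|@0.1')[1] }.sum
puts overcount # 100.0
```

Under that assumption, a rate that only annotated packets without the client actually dropping the rest would inflate counters by 1 / sample_rate.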

raggi commented 10 years ago

Using a counter and deterministic sampling method would not be statistically sound.
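One way to see the concern (an illustration, not necessarily the argument raggi had in mind) is aliasing: if the application records metrics in a fixed pattern, a shared counter can line up with that pattern and systematically skip some metrics. The workload below is hypothetical: every request records `cache.miss` then `cache.hit`, and one counter forwards every second call while an independent random draw is shown alongside for comparison.

```ruby
# Hypothetical workload: each "pair" records two metrics in a fixed order.
# One shared counter forwards every Nth call; a seeded RNG gates the
# same calls independently for comparison.
def simulate(pairs, sample_rate)
  every = (1 / sample_rate).round # counter gate: forward every Nth call
  calls = 0
  counter_sent = Hash.new(0)
  random_sent = Hash.new(0)
  rng = Random.new(7)

  pairs.times do
    %w[cache.miss cache.hit].each do |metric|
      calls += 1
      counter_sent[metric] += 1 if (calls % every).zero?
      random_sent[metric] += 1 if rng.rand < sample_rate
    end
  end
  [counter_sent, random_sent]
end

counter_sent, random_sent = simulate(1_000, 0.5)
puts "counter: miss=#{counter_sent['cache.miss']} hit=#{counter_sent['cache.hit']}"
# counter: miss=0 hit=1000
puts "random: miss=#{random_sent['cache.miss']} hit=#{random_sent['cache.hit']}"
# both close to 500
```

After the server scales by 1 / 0.5, the counter-gated stream would report 2000 hits and 0 misses for a true 1000 of each, while the random stream gives an estimate near 1000 for both.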