sivy / pystatsd

Python implementation of the Statsd client/server
http://pypi.python.org/pypi/pystatsd/
BSD 2-Clause "Simplified" License
358 stars 87 forks source link

Set type is implemented incorrectly (it actually behaves like a counter) #89

Closed banks closed 10 years ago

banks commented 10 years ago

I've been working at an organisation running pystatsd for a while. No one could explain to me we set() was the right way to implement absolute value counters (as opposed to rate counter) and every time I've read the original statsd docks I've been baffled by our implementation is so contradictory to them.

I even read the nodejs source and couldn't understand the behaviour I was actually seeing. I've just realised we //actually// use pystatsd even though it's always just referred to as 'statsd' and this implementation is totally different to the original on a couple of important counts (pun intended).

  1. Original definition of set is that it counts 'unique' things. That is, if you do:
statsd.set('uniques', 1234);
statsd.set('uniques', 1234);
statsd.set('uniques', 5678);

and nothing else in one time window, you should see 2 as the value recorded to graphite/backend. See https://github.com/etsy/statsd/blob/master/backends/graphite.js#L147 in original implementation - it takes set.values().length as the count which is https://github.com/etsy/statsd/blob/master/lib/set.js#L23-L29.

values: function() {
    var values = [];
    for (var value in this.store) {
      values.push(value);
    }
    return values;
  }

In other words for each //key// in the object, push it to values array, then the total number of these (distinct) keys is counted.

  1. Often you actually want an absolute count of something not just an averaged rate which statsd counters give. Your set implementation actually does that (incorrectly). Original statsd provides that ability by enabling option to flush_counts on counters: https://github.com/etsy/statsd/blob/master/backends/graphite.js#L111-L114

Overall, we end up using set('foo', 1) for all out "absolute" counters, but today I finally worked out why that works for us despite every bit of statsd related documentation describing sets totally differently.

Honestly I'm not even sure what to request at this point - we have tons of code now relying on this broken/inconsistent behaviour, and have never had any real need for the original version of set so changing it would actually be pretty bad for us.

But perhaps you could put something prominent in the README that explains this to the next poor soul who tries to make sense of it all...

Thanks for great project none the less!

banks commented 10 years ago

Apologies... The version that exhibits the above bugs is infact an internal fork of this project which added the broken set behaviour. So this is... invalid. Carry on!