tim-group / java-statsd-client

a java statsd client library
Other
275 stars 139 forks source link

Add support for sets #5

Closed danieltwagner closed 10 years ago

danieltwagner commented 11 years ago

StatsD supports sets and reports the size of these unique sets to graphite. (https://github.com/etsy/statsd/pull/152) This patch adds client-side support for adding elements to named StatsD sets.

scarytom commented 11 years ago

Thanks for this. Looks good to merge, which I'll do when I'm in a position to push a new release to maven central.

danieltwagner commented 11 years ago

Great! Please do let me know when that happens so that I can switch back to your maven artifact :)

scarytom commented 11 years ago

I've had another look this morning, and I do think there are a couple of issues with your changes.

Firstly, the documentation suggests that sets hold numerical values like this:

uniques:765|s

So, I'd expect the API to take ints rather than Strings.

Further, I'm curious as to why you've baked the new support for multi-metric packets into the API by using a varargs signature. For now, I'd like to be consistent and just support single-metric packets, then enhance the entire API to support multi-metric packets in a consistent way. There is the problem of exceeding the MTU size to consider when doing this.

In summary, for now I'd like your signature of:

void addSetElements(String aspect, String... elements);

to be more like this

void recordSetValue(String aspect, int value);
danieltwagner commented 11 years ago

I agree that the multi-metric packets are not quite in line with the rest. I decided to add it because particularly for sets you are likely to add many elements in one go. Good point about MTU size, and you are probably right that it might be worth implementing multi-metric in a consistent way.

Regarding ints rather than Strings, the documentation does suggest that, however, it is not the case. I've discovered that after looking through the sets patch on stastd. I'm using the code in production and it works like you would imagine, with non-numeric values.

If you prefer I can change the signature to be just a single String. It would be interesting to have a discussion on how to do aggregation for multi-metric packets in general. It seems like you would need to do automatic aggregation over a (configurable) window of maybe a second or two by default, sending packets earlier when they would hit the MTU limit.

scarytom commented 11 years ago

I think you are confusing undocumented (incidental) behaviour and documented (supported) behaviour. The documentation clearly states:

StatsD supports counting unique occurences [sic] of events between flushes, using a Set to store all occuring [sic] events.

I am not denying that you have successfully analysed the statsd code-base and uncovered a convenient loosening of the documented API, nor that you have subsequently built production code that relies on that quirk. I am, however, uncomfortable about encouraging this use, as it maybe silently (and legitimately) culled in a future statsd release.

statsd describe their product as:

A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services (e.g., Graphite).

Fundamentally, statsd was created to aggregate numerical event streams before publishing to metrics visualisation tools like Graphite. As such it makes no sense to pass it string values, which cannot be interpreted by Graphite. To this end, I think that this Java API should encourage correct usage through appropriately typed method signatures, in this case: void recordSetValue(String aspect, int value);

I agree that it would be interesting to have a discussion on how to do aggregation for multi-metric packets in general, as there are a number of potential approaches. I might favour giving explicit flow-control to the user by providing an API with a flush method, as well as offering an auto-flush version operating as you suggest.

danieltwagner commented 11 years ago

After typing a response to the contrary, I re-read the original pull request for statsd and now see what you mean. The patch was intended to

[...] graph the number of active logged in users on the website. Maintaining that state across application servers to manually update gauges is non-trivial. We send a message to statsd containing the id of the user making a request. statsd store all unique values between flushes, and sends the number of elements in the set to graphite.

Indeed, a user ID could be seen as being purely numeric and given the example and general use of statsd, it seems that that was in fact the intended use, and I found what you call a "convenient loosening of the documented API". In the light of this, I should probably drop the current implementation and add a simple one-liner that does just what you were asking for. A second patch could then deal with aggregation.

scottbessler commented 11 years ago

From reading the documentation, and looking at the code, it would seem to be incorrect to assume that etsy/statsd meant to require a user "id" to be numeric. I think Daniel has this right.

For further proof, check out the etsy/statsd python example, which clearly shows string values: https://github.com/etsy/statsd/blob/master/examples/python_example.py

danieltwagner commented 11 years ago

I know this has been dormant for a while but I did modify my pull request to include only single elements and renamed the method to recordSetValue. As discussed previously, the ability to have elements in sets that are not numbers is crucial for me and as Scott found also part of the python code as well as the documentation. I believe this should now be able to be merged in. Do you agree? :)

danieltwagner commented 10 years ago

Any update on this?

JensRantil commented 10 years ago

Any update on this?

scarytom commented 10 years ago

Firstly, sorry for letting this lie dormant for so long.

I've had another read through the statsd source code and the documentation, and I'm in agreement with @scottbessler that you were right @danieltwagner in your original interpretation that string values for sets are supported.

I'm merging this pull request now, and will aim to release a new version of the java-statsd-client later today, or early next week.