Open lahsivjar opened 3 years ago
https://github.com/lahsivjar/telegraf/commit/43ea032d6e29906e393557b08b8bfbac988c8c0b
A PoC for the proposal.
@danielnelson Would be great if you can take a look at it and give some feedback
There are other options to achieve this for some other input plugins by using aggregators. However, for statsd input the plugin itself does the parsing and aggregation. Because of this, the raw data is lost.
One way to fix this would be to overhaul the statsd input plugin and create a statsd parser that generates Telegraf metrics from the raw statsd data. For aggregations, the end user can define aggregators to produce the same cumulative effect as the current statsd input plugin.
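To make the parser idea a bit more concrete, here is a minimal sketch of parsing one basic statsd line (`name:value|type`) into a raw, unaggregated sample. `RawSample` and `parseLine` are hypothetical names for illustration, not existing Telegraf types, and sample rates and tag extensions are deliberately ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// RawSample is a hypothetical container for one unaggregated statsd sample.
type RawSample struct {
	Name  string
	Value float64
	Type  string // e.g. "c" (counter), "g" (gauge), "ms" (timer)
}

// parseLine parses a basic statsd line of the form "name:value|type".
// Sample rates ("|@0.1") and tag extensions are ignored in this sketch.
func parseLine(line string) (RawSample, error) {
	name, rest, ok := strings.Cut(line, ":")
	if !ok || name == "" {
		return RawSample{}, fmt.Errorf("missing name or value in %q", line)
	}
	parts := strings.Split(rest, "|")
	if len(parts) < 2 || parts[1] == "" {
		return RawSample{}, fmt.Errorf("missing type in %q", line)
	}
	value, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return RawSample{}, fmt.Errorf("bad value in %q: %w", line, err)
	}
	return RawSample{Name: name, Value: value, Type: parts[1]}, nil
}

func main() {
	s, err := parseLine("api_response_time_ms:320|ms")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

Each sample would then flow to whatever aggregator the user configures, instead of being aggregated inside the input plugin.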
@danielnelson WDYT?
@lahsivjar Would you be willing to turn your POC into an actual PR (a draft one if you think it isn't ready to be merged as-is)? I think with an actual PR it'd be much more likely to get feedback on it.
Personally, I'd love to see this, whether we get histogram support baked into the statsd plugin alongside all of its existing built-in aggregation, or we simply add a statsd parser that can be used with socket_listener to pass raw data to ad-hoc aggregators manually, or both.
@philomory Thanks for the ping, I have been away from this project for quite some time. I have created a PR from the PoC commit to start the conversation. I hope the approach is not completely outdated by now 🤞
I'm thinking about this feature myself because it makes me wonder how to get accurate percentiles if we were to scale out telegraf horizontally.
E.g. we could run multiple telegraf replicas, each maintaining a p90 of, say, a latency measurement from statsd. But a p90 of the p90s across n replicas is kind of a meaningless value.
But if each replica exposes histogram buckets, then you can compute statistically meaningful percentiles, specifically in the case of input: statsd, output: prometheus flows.
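A small sketch of why buckets scale out where percentiles don't: cumulative bucket counts from different replicas can simply be added, and the sum equals the histogram of the combined data. The bucket bounds and sample values are made up for illustration, and `bucketCounts` and `merge` are hypothetical helpers.

```go
package main

import "fmt"

// bucketCounts returns cumulative counts: counts[i] is the number of
// samples less than or equal to bounds[i].
func bucketCounts(bounds, samples []float64) []uint64 {
	counts := make([]uint64, len(bounds))
	for _, s := range samples {
		for i, b := range bounds {
			if s <= b {
				counts[i]++
			}
		}
	}
	return counts
}

// merge adds two per-replica count slices element-wise.
// Both replicas must use the same bucket bounds.
func merge(a, b []uint64) []uint64 {
	out := make([]uint64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func main() {
	bounds := []float64{100, 250, 500, 1000} // illustrative upper bounds in ms
	replicaA := []float64{40, 90, 120, 480}
	replicaB := []float64{300, 700, 950}

	merged := merge(bucketCounts(bounds, replicaA), bucketCounts(bounds, replicaB))

	// Histogram computed as if a single instance had seen every sample.
	all := append(append([]float64{}, replicaA...), replicaB...)
	combined := bucketCounts(bounds, all)

	fmt.Println(merged, combined) // both slices hold the same counts
}
```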
I'd also love this feature. Without this feature as far as I know it's not possible to "aggregate" percentile values from multiple series.
Meaning that each unique combination of metric field values produces a separate series in Prometheus. Right now you can get a percentile value for each series, but you can't combine them to get an aggregate value (using the existing percentiles feature of the statsd input).
Example:
api_response_time_ms{endpoint="list_books", server="server1"}
api_response_time_ms{endpoint="add_book", server="server2"}
You'd need the "bucket" data to be able to use histogram_quantile() to get an overall percentile value for api_response_time_ms.
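To illustrate what the bucket data buys you, here is a rough sketch of estimating a quantile from summed cumulative bucket counts using linear interpolation inside a bucket, in the spirit of histogram_quantile(). It is not a reimplementation of the Prometheus function: `estimateQuantile` is a hypothetical helper, the counts are invented, and it assumes non-empty inputs with the first bucket starting at 0.

```go
package main

import "fmt"

// estimateQuantile estimates quantile q (0 < q <= 1) from cumulative bucket
// counts. It assumes samples are spread evenly inside each bucket, that the
// first bucket starts at 0, and that bounds and cumulative are non-empty
// and have the same length.
func estimateQuantile(q float64, bounds []float64, cumulative []uint64) float64 {
	total := float64(cumulative[len(cumulative)-1])
	rank := q * total
	lower, prev := 0.0, 0.0
	for i, b := range bounds {
		c := float64(cumulative[i])
		if c >= rank && c > prev {
			// Interpolate linearly between the bucket's lower and upper bound.
			return lower + (b-lower)*(rank-prev)/(c-prev)
		}
		lower, prev = b, c
	}
	return bounds[len(bounds)-1]
}

func main() {
	bounds := []float64{100, 250, 500, 1000}
	// Invented cumulative counts for the two example series.
	listBooks := []uint64{2, 3, 4, 4} // endpoint="list_books"
	addBook := []uint64{0, 0, 1, 3}   // endpoint="add_book"

	// Summing per-bucket counts gives one histogram for the whole metric.
	overall := make([]uint64, len(bounds))
	for i := range bounds {
		overall[i] = listBooks[i] + addBook[i]
	}
	fmt.Println(estimateQuantile(0.5, bounds, overall))
}
```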
Feature Request
A way to generate Prometheus style histogram metrics from statsd input plugin (something similar to https://github.com/atlassian/gostatsd#timer-histograms-experimental-feature)
Proposal:
The statsd input plugin receives raw data from clients, so it should be possible to maintain counters for a user-defined set of le buckets (similar to Prometheus). The le labels can be added as tags.
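A minimal sketch of what such counters could look like, assuming cumulative buckets with an implicit +Inf bucket as in Prometheus. `Histogram`, `NewHistogram`, `Observe`, and `LeTags` are hypothetical names for illustration; this is not the code from the PoC commit.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// Histogram keeps cumulative counters for user-defined upper bounds.
type Histogram struct {
	bounds []float64 // ascending upper bounds, with +Inf appended last
	counts []uint64  // counts[i] is the number of observations <= bounds[i]
	sum    float64   // running sum of all observed values
}

// NewHistogram copies the user-defined bounds and appends the +Inf bucket.
// The bounds are assumed to be sorted in ascending order.
func NewHistogram(bounds []float64) *Histogram {
	b := append(append([]float64{}, bounds...), math.Inf(1))
	return &Histogram{bounds: b, counts: make([]uint64, len(b))}
}

// Observe records one raw value in every bucket whose bound covers it.
func (h *Histogram) Observe(v float64) {
	h.sum += v
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
}

// LeTags maps each "le" tag value to its cumulative count.
func (h *Histogram) LeTags() map[string]uint64 {
	out := make(map[string]uint64, len(h.bounds))
	for i, b := range h.bounds {
		le := "+Inf"
		if !math.IsInf(b, 1) {
			le = strconv.FormatFloat(b, 'f', -1, 64)
		}
		out[le] = h.counts[i]
	}
	return out
}

func main() {
	h := NewHistogram([]float64{100, 250, 500})
	for _, v := range []float64{40, 120, 900} {
		h.Observe(v)
	}
	fmt.Println(h.LeTags(), h.sum)
}
```

Each entry of the returned map would become one metric carrying an le tag, which is the shape histogram_quantile() expects on the Prometheus side.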
Current behavior:
It is not possible to generate Prometheus-style histogram metrics from the statsd input plugin.
Desired behavior:
Generate Prometheus-style histogram metrics from the statsd input plugin.
Use case:
This will add flexibility when converting statsd data to Prometheus-style data.