Closed crflanigan closed 1 year ago
What output plugin are you using? Are you also using the prometheus serializer?
The prometheus serializers recently got the ability for the user to specify what fields should be counters or gauges.
Hi @powersj,
In this case we're using outputs.stackdriver
to send the metrics.
Something that we found recently was that most of the Windows PerfMon input plugins are showing that they are untyped
. I think in the past you mentioned that all metrics from a Telegraf perspective are untyped
. We didn't notice this though because we were sending everything from the agent to a Telegraf server proxying the metrics to Google, using the opentelemetry
plugins as a go-between. When you send using those plugins they type
the metrics, but if you send directly using outputs.stackdriver
directly to Google the metrics are untyped
, thus being sent both as counter
and untyped
, per Google's request. This is a problem because this means that some metrics will be in a different metrics path than others, as well as double ingesting the untyped
metrics.
We discussed using a data_format
serializer in the past, but found that it wasn't feasable with the current outputs.stackdriver
design, and modifying the behavior in the plugin wasn't desirable because we don't want to impact legacy users of the plugin either. At the time I was just using the regex processor plugin to normalize metrics, but it doesn't account for type. I was thinking it would be cool to have a processor
which attaches a type
to all metrics before it's flushed to the output plugin. I'm also wondering if I could use starlark
to do this, though I've never played with it before.
What do you think?
In this case we're using outputs.stackdriver to send the metrics.
I thought it would be this one, but I didn't want to assume in case you are using something else now ;)
I was thinking it would be cool to have a processor which attaches a type to all metrics before it's flushed to the output plugin.
Let me talk to Sven about this since we just had a discussion about this when we added https://github.com/influxdata/telegraf/pull/13874
The thought was that the only time someone is producing Prometheus style metrics is really when they use the prometheus outputs which also in turn use the Prometheus serializer. The Stackdriver out, being the special plugin that it is, breaks that assumption with how you are now using it.
Got it!
Thanks buddy!
After discussing, we are considering that this would be a new option to the stackdriver output that mirrors what we did in #13874:
## This overrides the metric-type of the Telegraf metric. Globbing is allowed.
# [[outputs.stackdriver]]
# metric_counter = []
# metric_gauge = []
We only support gauge and counters today in the plugin so we would only add these two for now as well. The user would provide a metric name, with globs allowed, and metrics matching those would have their type set.
When sending via the metric name format of "official", we also send a counter today. By adding the above options it would essentially let you avoid the unknown metric, and allow you to specify gauges.
Thoughts?
I think that sounds great!
@crflanigan I have created #14017 as a draft with the new config options.
These config options take metric names, not field names, and can take wild cards as well. Could you give it a try and let me know if it resolves the cases where you have unknown metrics?
Thanks!
Testing now!
This looks good!
One thing we noticed though.
When we send metrics to Google now from Stackdriver all of our metrics end in _gauge
, which we have had some complaints about, but figured it was just the way it was.
This new way you're sending doesn't include the _gauge
which is desirable.
Example:
win_cpu_Percent_Processor_Time
vs
win_cpu_Percent_Processor_Time_gauge
It looks like this code is responsible for this:
func (s *Stackdriver) generateMetricName(m telegraf.Metric, key string) string {
// ...
var kind string
switch m.Type() {
case telegraf.Gauge:
kind = "gauge"
case telegraf.Untyped:
kind = "unknown"
case telegraf.Counter:
kind = "counter"
case telegraf.Histogram:
kind = "histogram"
default:
kind = ""
}
return path.Join(s.MetricTypePrefix, name, kind)
}
I realize this would be a separate issue, but would that be easy to change to drop the _gauge
?
Anyway, I think this looks great, and after it's merged I could probably add this to the Telegraf server configuration so that those metrics match what's being sent here.
There are two naming schema options, I am assuming you are using "official" given we added that based on your issue in #13303. For the official naming I would expect something formed using the following format:
metricTypePrefix/namespace_metricname_fieldname/type
For example in my own test cases using the official naming method produces:
custom.googleapis.com/test_mem_g_value/gauge
I realize this would be a separate issue, but would that be easy to change to drop the _gauge?
Hmm the code you linked to is not appending with an underscore. It is joining values with paths, meaning a forward slash like the items above.
Hmm the code you linked to is not appending with an underscore. It is joining values with paths, meaning a forward slash like the items above.
OK, just a guess. I wonder if it's when we pass the metrics through OpenTelemetry plugin inputs/outputs that's causing it.
Either way, this is working as far as I can tell, though I needed to add the regex processor again to fix an issue with Windows PerfMon input plugin metrics having a /
and a .
Example:
[[inputs.win_perf_counters]]
alias = "pc_store_win_physicaldiskmetrics"
interval = "1m"
[[inputs.win_perf_counters.object]]
ObjectName = "PhysicalDisk"
Instances = ["_Total"]
Counters = ["Avg. Disk sec/Read", "Avg. Disk sec/Transfer", "Avg. Disk sec/Write"]
I guess something for people to keep in mind.
Thanks buddy!
I wonder if it's when we pass the metrics through OpenTelemetry plugin inputs/outputs that's causing it.
Ah yes, I forget you have this in your setup. I do believe the otel2influx library will use the types, like gauge, etc. as the field name, which is then how you get it in your metrics. See metrics notes as well.
Either way, this is working as far as I can tell,
Thanks for testing it out!
You bet!
Thanks for the quick fix!
Use Case
For metrics that are being sent to a Prometheus sink, metrics should be typed similar to how opentelemetry.go handles it here.
It would be great if there was a processor plugin which could be optionally used to type the metrics.
Expected behavior
Metrics receive a type.
Actual behavior
This doesn't exist as far as I'm aware.
Additional info
It appears that Windows PerfMon metrics are untyped, while other metrics like
inputs.cpu
appear to be typed. In testing we found that when we send metrics through a Telegraf server via theopentelemetry
inputs/outputs plugins it types them correctly and we're hoping for a processor plugin which can emulate this behavior when we aren't sending to a Telegraf server or intermediate proxy.