influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Internal plugin additional fields #8607

Open adam0war opened 3 years ago

adam0war commented 3 years ago

Proposal:

Add metrics_dropped and metrics_filtered fields to input plugins.
Add metrics_dropped, metrics_filtered, and metrics_gathered (processed/aggregated) fields to processor and aggregator plugins.

Current behavior:

metrics_dropped and metrics_filtered are only available for output plugins. For processor and aggregator plugins, the only available field is internal_process_errors.

Desired behavior:

Being able to see how many metrics have been filtered out or dropped at the input level, and how many metrics have been gathered (processed/aggregated), filtered, or dropped at the processor/aggregator level.

Use case:

Improving Telegraf self-monitoring and adding performance measurements. This could be very useful with the exec input plugin or the regexp processor plugin.

ssoroka commented 3 years ago

Sounds reasonable. I think the source of this is that the output is really the only place where metrics were "dropped", which was due to slow outputs. FYI, processor filters aren't used to drop metrics (they aren't removed from the pipeline); the metrics just travel on to the next processor/aggregator/output, so I don't think metrics_filtered makes any sense for processors/aggregators. Metrics aren't dropped on inputs either, so metrics_dropped doesn't really make sense there.

what we're left with is:

processors historically haven't generated metrics, though that's possible now with the execd and starlark plugins. Not sure if this would be interesting.
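
To illustrate why "dropped" is naturally an output-side counter, here is a minimal Go sketch of a bounded buffer in front of a slow output. The type `boundedBuffer` and its fields are hypothetical names made up for this example; this is not Telegraf's actual buffer code, only an assumption about the general behavior (when the buffer is full, the oldest metric is discarded and counted).

```go
package main

import "fmt"

// boundedBuffer is a hypothetical fixed-capacity queue used for illustration.
// It is NOT Telegraf's implementation.
type boundedBuffer struct {
	items   []string
	limit   int
	written int // metrics accepted into the buffer
	dropped int // metrics discarded because the buffer was full
}

// newBoundedBuffer creates a buffer; limits below 1 are raised to 1.
func newBoundedBuffer(limit int) *boundedBuffer {
	if limit < 1 {
		limit = 1
	}
	return &boundedBuffer{limit: limit}
}

// Add appends a metric. If the buffer is already full, the oldest
// metric is discarded first and the drop is counted.
func (b *boundedBuffer) Add(m string) {
	if len(b.items) == b.limit {
		b.items = b.items[1:]
		b.dropped++
	}
	b.items = append(b.items, m)
	b.written++
}

func main() {
	// The output never flushes here, so the buffer fills and starts dropping.
	buf := newBoundedBuffer(2)
	for _, m := range []string{"cpu", "mem", "disk", "net"} {
		buf.Add(m)
	}
	fmt.Println(buf.items, buf.dropped) // prints "[disk net] 2"
}
```

In this sketch a drop can only happen where metrics queue up behind something slow, which matches the point above that inputs and processors have no equivalent place to lose metrics.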

adam0war commented 3 years ago

Thanks for clarifying. In this case, adding metrics_gathered and metrics_filtered across all plugins would do the job. It would allow getting the gathered-to-filtered ratio for each plugin, e.g. how many metrics have been gathered (processed?) or filtered out by a particular processor plugin. It would also cover the case of intentionally removed metrics.
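
A rough Go sketch of the gathered/filtered idea, assuming each plugin keeps two counters. The names `pluginStats`, `collect`, and `filteredRatio` are hypothetical and do not exist in Telegraf; the predicate stands in for whatever filter the plugin is configured with.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginStats holds the two hypothetical per-plugin counters.
type pluginStats struct {
	gathered int // every metric the plugin saw
	filtered int // metrics rejected by the plugin's filter
}

// filteredRatio returns filtered/gathered, or 0 when nothing was gathered.
func (s pluginStats) filteredRatio() float64 {
	if s.gathered == 0 {
		return 0
	}
	return float64(s.filtered) / float64(s.gathered)
}

// collect counts every name as gathered and every name rejected by keep
// as filtered; it returns the names that passed along with the counters.
func collect(names []string, keep func(string) bool) ([]string, pluginStats) {
	var stats pluginStats
	var kept []string
	for _, n := range names {
		stats.gathered++
		if !keep(n) {
			stats.filtered++
			continue
		}
		kept = append(kept, n)
	}
	return kept, stats
}

func main() {
	names := []string{"cpu_user", "cpu_idle", "mem_free", "disk_used"}
	onlyCPU := func(n string) bool { return strings.HasPrefix(n, "cpu_") }
	kept, stats := collect(names, onlyCPU)
	fmt.Printf("kept=%v gathered=%d filtered=%d ratio=%.2f\n",
		kept, stats.gathered, stats.filtered, stats.filteredRatio())
	// prints "kept=[cpu_user cpu_idle] gathered=4 filtered=2 ratio=0.50"
}
```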

Could you please explain the difference between internal_write_metrics_dropped and internal_write_errors?

sjwang90 commented 2 years ago

Adding information about configurations: https://github.com/influxdata/telegraf/issues/8730#issuecomment-977961493

LastConfigUpdate: date that will be saved as a string in InfluxDB (since it has no "date" data type); not sure how handy a string is to work with when building alert rules from Grafana/Kapacitor
IsUpdated: boolean that shows whether the config endpoint was reachable
Message: text field with the error itself, whatever it is (timeout, not reachable, unauthorized)