sensu-plugins / sensu-plugin

A framework for writing Sensu plugins & handlers with Ruby.
http://sensuapp.org
MIT License

influxdb metric format generates multiple measurements #197

Open cwjohnston opened 5 years ago

cwjohnston commented 5 years ago

Expected Behavior

Metrics collected by a single execution of a check plugin using --metric_format influxdb should be rolled up into a single Influx measurement.

Current Behavior

Metrics collected by a single execution generate multiple Influx measurements, recorded with the same time but varying key fields.

Context

Recently I've been working with the metrics-aggregate.rb check plugin from the sensu-plugins-sensu collection. In doing so, I've learned a bit about how this library is generating measurements for influxdb line protocol -- and I think we need to make a change.

After installing the plugin collection, I ran the following to collect measurements of check results being reported into aggregates. In this example I have four checks feeding results into one aggregate named "procs":

/opt/sensu/embedded/bin/metrics-aggregate.rb --metric_format influxdb
sensu.aggregates,aggregate=procs clients=1 1545929125
sensu.aggregates,aggregate=procs checks=4 1545929125
sensu.aggregates,aggregate=procs ok=4 1545929125
sensu.aggregates,aggregate=procs warning=0 1545929125
sensu.aggregates,aggregate=procs critical=0 1545929125
sensu.aggregates,aggregate=procs unknown=0 1545929125
sensu.aggregates,aggregate=procs total=4 1545929125
sensu.aggregates,aggregate=procs stale=0 1545929125

After traversing my Sensu pipeline, this output is accepted by InfluxDB and recorded. I can query it using the influx CLI:

> SELECT * FROM "sensu.aggregates"
name: sensu.aggregates
time                aggregate checks clients critical critical_1 host                 ok ok_1 stale stale_1 total total_1 unknown unknown_1 value warning warning_1
----                --------- ------ ------- -------- ---------- ----                 -- ---- ----- ------- ----- ------- ------- --------- ----- ------- ---------
1545929125000000000 procs     4                                  sensu-enterprise-poc                                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                                                       0             0
1545929125000000000 procs                             0          sensu-enterprise-poc                                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                                             0         0             
1545929125000000000 procs                                        sensu-enterprise-poc    4                                                  0             
1545929125000000000 procs            1                           sensu-enterprise-poc                                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc               0                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                             4                         0             
1545929125000000000 procs                                        sensu-enterprise-poc               0                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                             4                         0             
1545929125000000000 procs                             0          sensu-enterprise-poc                                                       0             
1545929125000000000 procs     4                                  sensu-enterprise-poc                                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                                                       0             0
1545929125000000000 procs            1                           sensu-enterprise-poc                                                       0             
1545929125000000000 procs                                        sensu-enterprise-poc                                             0         0             
1545929125000000000 procs                                        sensu-enterprise-poc    4                                                  0             

You can see that although the time is the same for each of these measurements, each one has a single value for one dimension of the aggregate being measured. That is to say, one line has a value of 4 for the ok key field, and another may have a value of 1 for the clients field, but none of the measurements have values for multiple key fields.
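To make the expected behavior concrete, here is a minimal Ruby sketch of rolling the single-field lines up into one line per series and timestamp. The roll_up helper is hypothetical and not part of sensu-plugin. It also assumes that tags and field values contain no escaped spaces, which holds for the output above but not for line protocol in general.

```ruby
# Hypothetical helper, not part of sensu-plugin: merge single-field
# line protocol lines that share a series key and timestamp.
def roll_up(lines)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    # Naive split: assumes no escaped spaces in tags or field values.
    series, fields, timestamp = line.strip.split(' ')
    grouped[[series, timestamp]] << fields
  end
  # Emit one line per (series, timestamp) pair with all fields joined.
  grouped.map do |(series, timestamp), fields|
    "#{series} #{fields.join(',')} #{timestamp}"
  end
end

lines = [
  'sensu.aggregates,aggregate=procs clients=1 1545929125',
  'sensu.aggregates,aggregate=procs checks=4 1545929125',
  'sensu.aggregates,aggregate=procs ok=4 1545929125',
  'sensu.aggregates,aggregate=procs total=4 1545929125'
]
puts roll_up(lines)
# prints: sensu.aggregates,aggregate=procs clients=1,checks=4,ok=4,total=4 1545929125
```

With the fields combined like this, ok and total are recorded together at the same timestamp, which is what a query doing math between the two needs.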

On the face of it this looks less than ideal, but in reality I think this makes the measurements rather useless. The effect becomes obvious when one attempts to perform basic math across these multiple measurements.

In my case I want to use these measurements to provide a value to a single stat pane in a Grafana dashboard. In theory, this should allow me to use an Influx query like this one to return the percentage of ok checks:

SELECT (ok / total) * 100 AS "calculated_percentage" FROM "sensu.aggregates" WHERE "aggregate" = 'procs'

But this query returns an empty response. I believe this is because of a known limitation in InfluxDB which prevents mathematics across measurements.

If I manually insert a measurement which has values for both ok and total key fields, the query works as expected:

> insert sensu.aggregates,aggregate=procs ok=3,warning=1,critical=0,unknown=0,stale=0,total=4 1545144411148540258
> SELECT (ok / total) * 100 AS "calculated_percentage" FROM "sensu.aggregates" WHERE "aggregate" = 'procs'
name: sensu.aggregates
time                calculated_percentage
----                ---------------------
1545144411148540258 75

Because of the limitation the current approach creates, I've had to send these measurements to Graphite instead, where I was able to use the asPercent function across the recorded measurements to get the needed single stat.

portertech commented 5 years ago

Rough ideas:

  1. CLI flag to aggregate metrics under a single influxdb measurement
  2. CLI flag to "intelligently" group metrics into multiple influxdb measurements

portertech commented 5 years ago

  1. separate plugin method to create and output a single influxdb measurement (thanks @cwjohnston)
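
As a rough illustration of the "separate plugin method" idea, a standalone helper could build one line protocol entry from a hash of fields. The name influxdb_point and its signature are assumptions made for this sketch, not an existing sensu-plugin method, and it does no escaping or string quoting, so it only suits simple numeric fields like the ones in this issue.

```ruby
# Hypothetical standalone helper; the name and signature are assumptions,
# not an existing sensu-plugin method.
def influxdb_point(measurement, tags, fields, timestamp = Time.now.to_i)
  # No escaping is done here: keys and values must not contain
  # spaces, commas, or equals signs.
  tag_set = tags.map { |key, value| "#{key}=#{value}" }.join(',')
  field_set = fields.map { |key, value| "#{key}=#{value}" }.join(',')
  series = tag_set.empty? ? measurement : "#{measurement},#{tag_set}"
  "#{series} #{field_set} #{timestamp}"
end

puts influxdb_point(
  'sensu.aggregates',
  { aggregate: 'procs' },
  { ok: 3, warning: 1, critical: 0, unknown: 0, stale: 0, total: 4 },
  1545929125
)
# prints: sensu.aggregates,aggregate=procs ok=3,warning=1,critical=0,unknown=0,stale=0,total=4 1545929125
```

A plugin such as metrics-aggregate.rb could then collect its values into one hash and emit a single line per aggregate, instead of one line per field.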