mdlayher / apcupsd_exporter

Prometheus exporter that exposes metrics from apcupsd's NIS. MIT Licensed.
MIT License
153 stars 65 forks source link

Add apcupsd_ups_status and apcupsd_ups_info metrics #8

Closed jangrewe closed 4 years ago

jangrewe commented 5 years ago

This adds multiple apcupsd_ups_status metrics, each containing the label status with all possible values the STATUS: line can contain. (Label values taken from https://www.systutorials.com/docs/linux/man/8-apcaccess/)

This also add the apcupsd_ups_info metric and moves all the redundant info labels from the other metrics to this one (as suggested by you in #6).

And lastly this PR adds an example Grafana dashboard and the respective screenhot.

Added bonus: a Dockerfile (image size: 8MB), with the image available at https://hub.docker.com/r/jangrewe/apcupsd-exporter

jangrewe commented 5 years ago

@mdlayher Any chance you could have a look at this PR? I'm sorry that it got a bit out of hand, but i think it's worth it! 😄

mdlayher commented 5 years ago

Thank you for the PR and I appreciate the effort you've put in, but this PR has a lot of unrelated changes that make it difficult to review quickly.

My free time is rather limited this week. I'll try to get to this early next week.

candlerb commented 5 years ago

Just a note to say that I have tested from the @jangrewe repo, and it works for me:

apcupsd_apcupsd_nominal_power_watts 390
apcupsd_battery_charge_percent 100
apcupsd_battery_cumulative_time_on_seconds_total 0
apcupsd_battery_nominal_volts 12
apcupsd_battery_number_transfers_total 0
apcupsd_battery_time_left_seconds 4602
apcupsd_battery_time_on_seconds 0
apcupsd_battery_volts 13.4
apcupsd_line_nominal_volts 230
apcupsd_line_volts 240
apcupsd_ups_info{hostname="nuc2",model="Back-UPS XS 700U",ups_name="bx700"} 1
apcupsd_ups_load_percent 3
apcupsd_ups_status{status="BOOST"} 0
apcupsd_ups_status{status="CAL"} 0
apcupsd_ups_status{status="COMMLOST"} 0
apcupsd_ups_status{status="LOWBATT"} 0
apcupsd_ups_status{status="NOBATT"} 0
apcupsd_ups_status{status="ONBATT"} 0
apcupsd_ups_status{status="ONLINE"} 1
apcupsd_ups_status{status="OVERLOAD"} 0
apcupsd_ups_status{status="REPLACEBATT"} 0
apcupsd_ups_status{status="SHUTTING DOWN"} 0
apcupsd_ups_status{status="SLAVE"} 0
apcupsd_ups_status{status="SLAVEDOWN"} 0
apcupsd_ups_status{status="TRIM"} 0

I am ambivalent, however, about the loss of metadata from the individual metrics.

Including metadata in every metric is how the official snmp_exporter works. Example:

# HELP ifHCInOctets The total number of octets received on the interface, including framing characters - 1.3.6.1.2.1.31.1.1.1.
6
# TYPE ifHCInOctets counter
ifHCInOctets{ifAlias="",ifDescr="bridge",ifIndex="14",ifName="bridge"} 6.343343949e+09
ifHCInOctets{ifAlias="",ifDescr="ether1",ifIndex="16",ifName="ether1"} 2.4808641625e+10
ifHCInOctets{ifAlias="",ifDescr="ether2",ifIndex="2",ifName="ether2"} 0
ifHCInOctets{ifAlias="",ifDescr="ether3",ifIndex="3",ifName="ether3"} 6.083273145e+09
ifHCInOctets{ifAlias="",ifDescr="ether4",ifIndex="4",ifName="ether4"} 6.79593826e+08
ifHCInOctets{ifAlias="",ifDescr="ether5",ifIndex="5",ifName="ether5"} 2.91243382e+08
ifHCInOctets{ifAlias="",ifDescr="pppoe-out1",ifIndex="17",ifName="pppoe-out1"} 2.4255138238e+10
...
# HELP ifHCOutOctets The total number of octets transmitted out of the interface, including framing characters - 1.3.6.1.2.1.31.1.1.1.10
# TYPE ifHCOutOctets counter
ifHCOutOctets{ifAlias="",ifDescr="bridge",ifIndex="14",ifName="bridge"} 2.5131203193e+10
ifHCOutOctets{ifAlias="",ifDescr="ether1",ifIndex="16",ifName="ether1"} 5.096503509e+09
ifHCOutOctets{ifAlias="",ifDescr="ether2",ifIndex="2",ifName="ether2"} 0
ifHCOutOctets{ifAlias="",ifDescr="ether3",ifIndex="3",ifName="ether3"} 1.0343428154e+10
ifHCOutOctets{ifAlias="",ifDescr="ether4",ifIndex="4",ifName="ether4"} 1.4271999034e+10
ifHCOutOctets{ifAlias="",ifDescr="ether5",ifIndex="5",ifName="ether5"} 4.11163912e+09
ifHCOutOctets{ifAlias="",ifDescr="pppoe-out1",ifIndex="17",ifName="pppoe-out1"} 4.764235347e+09

This in turn makes it very easy to use in grafana dashboards, with all of the graph lines labelled with the available metadata, and easy to filter on any particular piece of metadata.

If they changed it to work the way proposed here, then it would look more like this:

ifIndex{ifAlias="",ifDescr="bridge",ifIndex="14",ifName="bridge"} 1
ifIndex{ifAlias="",ifDescr="ether1",ifIndex="16",ifName="ether1"} 1
ifIndex{ifAlias="",ifDescr="ether2",ifIndex="2",ifName="ether2"} 1
ifIndex{ifAlias="",ifDescr="ether3",ifIndex="3",ifName="ether3"} 1
ifIndex{ifAlias="",ifDescr="ether4",ifIndex="4",ifName="ether4"} 1
ifIndex{ifAlias="",ifDescr="ether5",ifIndex="5",ifName="ether5"} 1
ifIndex{ifAlias="",ifDescr="pppoe-out1",ifIndex="17",ifName="pppoe-out1"} 1
...
ifHCInOctets{ifIndex="14"} 6.343343949e+09
ifHCInOctets{ifIndex="16"} 2.4808641625e+10
ifHCInOctets{ifIndex="2"} 0
ifHCInOctets{ifIndex="3"} 6.083273145e+09
ifHCInOctets{ifIndex="4"} 6.79593826e+08
ifHCInOctets{ifIndex="5"} 2.91243382e+08
ifHCInOctets{ifIndex="17"} 2.4255138238e+10
...
ifHCOutOctets{ifIndex="14"} 2.5131203193e+10
ifHCOutOctets{ifIndex="16"} 5.096503509e+09
ifHCOutOctets{ifIndex="2"} 0
ifHCOutOctets{ifIndex="3"} 1.0343428154e+10
ifHCOutOctets{ifIndex="4"} 1.4271999034e+10
ifHCOutOctets{ifIndex="5"} 4.11163912e+09
ifHCOutOctets{ifIndex="17",ifName="pppoe-out1"} 4.764235347e+09

This has some advantages with regards to continuity of timeseries if metadata changes, but seems to be a lot harder to work with in grafana.

Let's consider the simpler case of apcupsd with this proposed patch applied. Suppose you have multiple UPSes, and are showing all the "time remaining" in a single graph with a separate line for each UPS. Once prometheus has added its own scraping labels, you would get metrics like this:

apcupsd_ups_info{job="apcupsd",instance="192.0.2.1",hostname="nuc1",model="Back-UPS XS 700U",ups_name="bx700-1"} 1
apcupsd_battery_time_left_seconds{job="apcupsd",instance="192.0.2.1",status="ONLINE"} 4062
...
apcupsd_ups_info{job="apcupsd",instance="192.0.2.2",hostname="nuc2",model="Back-UPS XS 700U",ups_name="bx700-2"} 1
apcupsd_battery_time_left_seconds{job="apcupsd",instance="192.0.2.2",status="ONLINE"} 3950

If you want your graphs labelled with the UPS name or hostname, you need to invoke some prometheus join magic. According to this article I think that adding the ups_name to the apcupsd_battery_time_left_seconds metric might look something like this:

apcupsd_battery_time_left_seconds * on (instance, job) group_left(ups_name) apcupsd_ups_info

I don't pretend to understand this, and it may be wrong. In any case it would have to be copy-pasted into every grafana query in the dashboard.

Similarly: suppose you want a drop-down menu to select between UPSes. If you don't mind just showing the 'instance' value this is straightforward:

But if you want the dropdown to select on 'ups_name', it gets tricky.

candlerb commented 5 years ago

I am ambivalent, however, about the loss of metadata from the individual metrics.

Including metadata in every metric is how the official snmp_exporter works.

For reference, discussion about it at https://github.com/prometheus/snmp_exporter/issues/180 and https://github.com/prometheus/snmp_exporter/pull/339

mdlayher commented 4 years ago

I ended up fixing things up, thanks for your time.

crockk commented 3 years ago

Can this be revisited? It would be very useful to have status info in the metrics, or as a label in apcupsd_info. I made an issue pertaining to it: https://github.com/mdlayher/apcupsd_exporter/issues/19

candlerb commented 3 years ago

FWIW, I changed to https://github.com/damomurf/apcupsd-exporter a while ago.

crockk commented 3 years ago

Fair enough, but I need both status metrics and the metrics that this exporter provides... Both exporters use the same command line tool so it seems a bit silly to be installing two exporters for the same tool 😅