Open jcpunk opened 10 months ago
@SuperQ I need a second opinion here. What does the Big Book of Prometheus Best Practices say about this sort of thing? Should we go with three different metric names, or a single cv_state metric with the state carried in a label?
FWIW: https://github.com/prometheus-community/systemd_exporter/tree/main provides:
systemd_unit_state{name="sysinit.target",state="activating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="active",type="target"} 1
systemd_unit_state{name="sysinit.target",state="deactivating",type="target"} 0
systemd_unit_state{name="sysinit.target",state="failed",type="target"} 0
systemd_unit_state{name="sysinit.target",state="inactive",type="target"} 0
There are essentially three ways we can go about this. For example, if a CacheVault is degraded, we could expose:
cv_optimal{controller="0",cvidx="1"} 0
cv_degraded{controller="0",cvidx="1"} 1
cv_failed{controller="0",cvidx="1"} 0
or
cv_state{controller="0",cvidx="1",state="optimal"} 0
cv_state{controller="0",cvidx="1",state="degraded"} 1
cv_state{controller="0",cvidx="1",state="failed"} 0
or merely
cv_state{controller="0",cvidx="1",state="degraded"} 1
The first two methods are largely the same, although I would argue that the second is slightly more user-friendly: the contents of the state label can be used verbatim in Grafana dashboards with a very simple query.
The third method can leave the old series visible for up to 5 minutes after a state change, because of Prometheus' default lookback window and because a series effectively disappears when its state label changes.
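For illustration, the second method (one cv_state series per possible state, with exactly one set to 1) could be generated roughly as follows. This is only a sketch: the helper name cv_state_lines, the CV_STATES tuple, and the assumption that the three states are exactly optimal, degraded and failed are mine, not taken from the PR code.

```python
# Hypothetical helper for illustration; names and the state list are assumptions.
CV_STATES = ("optimal", "degraded", "failed")


def cv_state_lines(controller, cvidx, current_state, states=CV_STATES):
    """Return one exposition line per known state; only the current state is 1."""
    lines = []
    for state in states:
        value = 1 if state == current_state else 0
        lines.append(
            f'cv_state{{controller="{controller}",cvidx="{cvidx}",state="{state}"}} {value}'
        )
    return lines


if __name__ == "__main__":
    # Emits the three cv_state series for a degraded CacheVault.
    print("\n".join(cv_state_lines("0", "1", "degraded")))
```

Because every state is always emitted, no series appears or disappears when the state changes; only the values flip between 0 and 1. Note that in this sketch a state outside the known list would yield all zeros, so a real implementation would likely want to handle that case explicitly.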
Updated to try to use example output 2.
Any further thoughts on this?
Waiting for input / review from @SuperQ
Adds metrics for the CacheVault status.
Hardware tested:
LSI MegaRAID SAS-3 3108 [Invader] (rev 02)