Closed k0ste closed 1 year ago
Ping @bitfehler
Hi @k0ste, sorry and thanks for the ping. Thanks for the patch! In general, this is a welcome addition. One issue though: I'd much prefer the metric to simply not be there if it is unsupported. Could this be done, or is there a reason why the additional metric is needed? For example, is it possible for the ipmi_dcmi_power_measurement_state
to change in the lifetime of a PSU? From what I understand it just depends on the hardware and never changes? If so, I'd prefer using the result of the availability check to simply drop the power consumption metric completely if not available.
Hi, @bitfehler! When I was imagine, how's better to handle this, I'm also thinked to this side, but:
0 != Nan
ipmi_dcmi_power_consumption_watts == 0
as query, and this queries should be rewritten (to unless()
, I guess)ipmi_dcmi_power_consumption_watts
, when the measurement is zero, this will not help to debug: why the power metrics was absent or lowered:
ipmi_dcmi_power_measurement_state == 1
& ipmi_dcmi_power_consumption_watts == 0
ipmi_dcmi_power_measurement_state == 0
& ipmi_dcmi_power_consumption_watts == 0
Thanks!
Zero watts is also a metric, i.e. 0 != Nan
Absolutely!
3. If we drop the
ipmi_dcmi_power_consumption_watts
, when the measurement is zero
I was actually thinking more like this: do perform the test for active/not available that you added, if "active" output metric regardless of value, if not available drop metric. Shouldn't that do?
I was actually thinking more like this: do perform the test for active/not available that you added, if "active" output metric regardless of value, if not available drop metric. Shouldn't that do?
The 3rd point āļø exactly about this idea š
I am not sure I understand. Your 3rd point says
If we drop the ipmi_dcmi_power_consumption_watts, when the measurement is zero
but my suggestion is to only drop it if the new test you added in this PR says not available (i.e. when ipmi_dcmi_power_measurement_state
would be 0).
As I understand it, with your current patch, the decision table would be:
ipmi_dcmi_power_consumption_watts |
ipmi_dcmi_power_measurement_state |
Deduction |
---|---|---|
0 |
0 |
BMC/PSU don't support it |
0 |
1 |
something is amiss, investigation needed |
>0 | 0 |
impossible |
>0 | 1 |
metric as intended |
Or am I missing something?
My suggestion would simply be to translate this into the following decision table:
ipmi_dcmi_power_consumption_watts |
Deduction |
---|---|
N/A | BMC/PSU don't support it |
0 | something is amiss, investigation needed |
> 0 | metric as intended |
Note the absence of an "impossible" state, hence I think it would make for a better solution. Sorry if I am misunderstanding s/t, maybe you can elaborate on your third point?
Note the absence of an "impossible" state, hence I think it would make for a better solution. Sorry if I am misunderstanding s/t, maybe you can elaborate on your third point?
You understanding is totally correctly. (But) If we translate "my table" to "your table" - this will be breaking changes in current how ipmi_exporter
performed, where ipmi_dcmi_power_consumption_watts
is always present
If the project okay for small breaking changes, I can prepare the release 1.7 with notes to CHANGELOG
š
@k0ste a breaking change is totally fine. can you update the patch accordingly? thanks a lot!
@bitfehler updated! The checks:
Host with Power Measurement
root@host# curl -Ss "http://localhost:9291/ipmi?module=default&target=10.249.0.9" | grep -i power
# HELP ipmi_chassis_power_state Current power state (1=on, 0=off).
# TYPE ipmi_chassis_power_state gauge
ipmi_chassis_power_state 1
# HELP ipmi_dcmi_power_consumption_watts Current power consumption in Watts (no value if Power Measurement is not avail)
# TYPE ipmi_dcmi_power_consumption_watts gauge
ipmi_dcmi_power_consumption_watts 372
ipmi_sensor_state{id="4024",name="PS1 Status",type="Power Supply"} 0
ipmi_sensor_state{id="4091",name="PS2 Status",type="Power Supply"} 0
ipmi_sensor_value{id="4024",name="PS1 Status",type="Power Supply"} NaN
ipmi_sensor_value{id="4091",name="PS2 Status",type="Power Supply"} NaN
Host without Power Measurement
root@host# curl -Ss "http://localhost:9291/ipmi?module=default&target=10.249.0.6" | grep -i power
# HELP ipmi_chassis_power_state Current power state (1=on, 0=off).
# TYPE ipmi_chassis_power_state gauge
ipmi_chassis_power_state 1
ipmi_sensor_state{id="4158",name="PS1 Status",type="Power Supply"} 0
ipmi_sensor_state{id="4225",name="PS2 Status",type="Power Supply"} 0
ipmi_sensor_value{id="4158",name="PS1 Status",type="Power Supply"} NaN
ipmi_sensor_value{id="4225",name="PS2 Status",type="Power Supply"} NaN
ping @bitfehler
Changes maden š
Added DCMI metric to definitely determine avail of Power Measurement feature
In some cases, for example with cheap Power Supply Units power consumption may be not available at all, to control this added
ipmi_dcmi_power_measurement_state
metric. The value is1
for power measure is available on, and0
otherwise.The main point for this changes is a try to knowledge: the measurements isn't accessed due PSU broken (FW/cable/etc issues) or not avail at all on system level (and system know about that)
The host with "normal PSU":
The host with "strange PSU's"