maltewhiite closed this issue 3 years ago
Hi maltewhiite,
the identifiers are not defined by check_ipmi_sensor but by the system/mainboard/BMC vendor. The same applies to the criticality, which some vendors set quite adventurously.
You might gain additional information by looking into the BMC via IPMI, SSH, or its web interface. For many vendors, BP is an abbreviation for backplane and usually refers to a storage backplane.
Also, some vendors write presence assertions to the System Event Log (SEL) for some or even all supported but optional hardware components, which may therefore not be present at all. This happens either at Power-On Self-Test (POST) or when upgrading the system firmware (UEFI) or the BMC/IPMI controller firmware, even if such hardware options were never connected to the system.
I suppose this is the case here. Thus, after verifying on the BMC/IPMI controller that there is no persisting problem, you could make the alert disappear by deleting the corresponding event(s) from the SEL and/or clearing the SEL.
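As a rough illustration of that "check first, then clear" workflow, here is a minimal Python sketch that wraps the ipmitool CLI. It assumes ipmitool is installed and uses its `sel elist` and `sel clear` subcommands; the helper names (`sel_command`, `presence_events`, `review_and_clear`) are made up for this example, and the remote password is expected in the `IPMI_PASSWORD` environment variable (ipmitool's `-E` option). Filtering on the word "presence" is only a guess at how such events are labelled, since the wording is vendor-specific.

```python
import subprocess


def sel_command(action, host=None, user=None):
    """Build an ipmitool argv list for a SEL action ("elist" or "clear").

    Without a host the local BMC interface is used; with a host the
    password is read from the IPMI_PASSWORD environment variable (-E).
    """
    if action not in ("elist", "clear"):
        raise ValueError(f"unsupported SEL action: {action}")
    cmd = ["ipmitool"]
    if host is not None:
        cmd += ["-I", "lanplus", "-H", host, "-U", user, "-E"]
    cmd += ["sel", action]
    return cmd


def presence_events(sel_text):
    """Return the SEL lines that mention "presence" (case-insensitive)."""
    return [line for line in sel_text.splitlines() if "presence" in line.lower()]


def review_and_clear(host=None, user=None):
    """Print presence-related SEL entries and clear the SEL only on request."""
    listing = subprocess.run(
        sel_command("elist", host, user),
        check=True, capture_output=True, text=True,
    ).stdout
    for line in presence_events(listing):
        print(line)
    # Clearing is irreversible, so ask for explicit confirmation first.
    if input("Clear the whole SEL? [y/N] ").strip().lower() == "y":
        subprocess.run(sel_command("clear", host, user), check=True)
```

Calling `review_and_clear()` on the affected host (or with `host=` and `user=` for a remote BMC) lists the matching entries before anything is removed. Note that `sel clear` empties the whole log, so the remaining entries should be reviewed or saved beforehand.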
If you are not content with cleaning the SEL after reboots or firmware upgrades, I'd advise checking whether the BMC/IPMI controller can be configured to never report hardware that is simply not installed as missing. I STRONGLY advise never disabling SEL monitoring in check_ipmi_sensor: the SEL is monitored by default to alert you to non-persistent errors, such as unreliable power supplies or power cabling, corrected RAM errors that might not provoke an MCE, failed components without sensor values of their own, ...
HtH and best regards, // Veit
On Thursday, 25.11.2021 at 05:52 -0800, maltewhiite wrote:
The guy who set all this up, doesn't work here anymore, so nobody knows what this NAGIOS alert means. It says IPMI Status: Critical [BP0 Presence = Critical] And the "check_ipmi_sensor" command says IPMI Status: Critical [Presence = Critical, Presence = Critical, BP0 Presence = Critical] | 'Current Power'=366 'Temp'=63.00 'Temp'=76.00 'Inlet Temp'=20.00;3.00:33.00;-7.00:37.00 'Fan1'=6120.00;840.00:;480.00: 'Fan2'=6120.00;840.00:;480.00: 'Fan3'=6240.00;840.00:;480.00: 'Fan4'=6360.00;840.00:;480.00: 'Fan5'=6240.00;840.00:;480.00: 'Fan6'=6240.00;840.00:;480.00: 'Current 1'=0.80 'Current 2'=0.80 'Voltage 1'=486.00 'Voltage 2'=486.00 'Pwr Consumption'=726.00;~:2354.00;~:2596.00 'IO Usage'=0.00;~:101.00; 'MEM Usage'=0.00;~:101.00; 'SYS Usage'=4.00;~:101.00; 'CPU Usage'=4.00;~:101.00; 'Exhaust Temp'=44.00;8.00:75.00;3.00:80.00 What is BP0?
Thanks a lot! I will forward this to our hardware team.
The guy who set all this up, doesn't work here anymore, so nobody knows what this NAGIOS alert means. It says
IPMI Status: Critical [BP0 Presence = Critical]
And the "check_ipmi_sensor" command says
IPMI Status: Critical [Presence = Critical, Presence = Critical, BP0 Presence = Critical] | 'Current Power'=366 'Temp'=63.00 'Temp'=76.00 'Inlet Temp'=20.00;3.00:33.00;-7.00:37.00 'Fan1'=6120.00;840.00:;480.00: 'Fan2'=6120.00;840.00:;480.00: 'Fan3'=6240.00;840.00:;480.00: 'Fan4'=6360.00;840.00:;480.00: 'Fan5'=6240.00;840.00:;480.00: 'Fan6'=6240.00;840.00:;480.00: 'Current 1'=0.80 'Current 2'=0.80 'Voltage 1'=486.00 'Voltage 2'=486.00 'Pwr Consumption'=726.00;~:2354.00;~:2596.00 'IO Usage'=0.00;~:101.00; 'MEM Usage'=0.00;~:101.00; 'SYS Usage'=4.00;~:101.00; 'CPU Usage'=4.00;~:101.00; 'Exhaust Temp'=44.00;8.00:75.00;3.00:80.00
Assume I know nothing about hardware. I just have a software education, and was suddenly tasked with taking over the Nagios monitoring.
What is BP0?
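For readers coming from the software side, the plugin output quoted in this thread follows the usual Nagios plugin layout: a status text, then `|`, then performance data in the form `'label'=value;warn;crit`. The sensors listed in square brackets are the ones that triggered the alert. Everything after the `|` is only measurement data for graphing. The following Python sketch splits such a line apart. The function name `parse_plugin_output` is made up for this example, and the parsing assumes the layout seen above rather than every variant check_ipmi_sensor can emit.

```python
import re


def parse_plugin_output(line):
    """Split a Nagios plugin output line into status, alerts, and perfdata.

    Returns (status, alerts, perfdata) where alerts is a list of
    (sensor, state) pairs and perfdata is a list of
    (label, value, warn_range, crit_range) tuples. A list is used because
    labels such as 'Temp' can occur more than once.
    """
    text, _, perf = line.partition(" | ")

    status_match = re.match(r"IPMI Status: (\w+)", text)
    status = status_match.group(1) if status_match else None

    alerts = []
    bracket = re.search(r"\[(.*)\]", text)
    if bracket:
        for item in bracket.group(1).split(", "):
            sensor, _, state = item.rpartition(" = ")
            alerts.append((sensor, state))

    perfdata = []
    for label, rest in re.findall(r"'([^']+)'=(\S+)", perf):
        fields = rest.split(";")
        # Pad so that missing warn/crit fields become empty strings.
        value, warn, crit = (fields + ["", ""])[:3]
        perfdata.append((label, float(value), warn, crit))

    return status, alerts, perfdata


status, alerts, perfdata = parse_plugin_output(
    "IPMI Status: Critical [BP0 Presence = Critical] | "
    "'Fan1'=6120.00;840.00:;480.00: 'Inlet Temp'=20.00;3.00:33.00;-7.00:37.00"
)
print(status)   # Critical
print(alerts)   # [('BP0 Presence', 'Critical')]
for entry in perfdata:
    print(entry)
```

In Nagios range notation, a threshold such as `840.00:` means "alert if the value drops below 840", and `3.00:33.00` means "alert if the value leaves the range 3 to 33". The fan and temperature readings in the output above are all within their ranges. Only the presence sensors are reported as critical.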