thomas-krenn / check_ipmi_sensor_v3

Monitoring plugin to check IPMI sensors
https://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin
GNU General Public License v3.0

Cloudian HSA-1512 aka QuantaGrid D51PH-1ULH monitoring #20

Closed sl0n closed 6 years ago

sl0n commented 6 years ago

Hi,

I have Cloudian HSA-1512 appliances (OEM QuantaGrid D51PH-1ULH) that I want to monitor with check_ipmi_sensor.

My test case: I pulled the power cable out of one of the PSUs.

Here is what I get:

$ ./check_ipmi_sensor.pl -H cloudian100-02 -U myuser -P mypasswd -L user
IPMI Status: Critical [PSU Redundancy = Critical, PSU2_Status = Critical, PSU2_Input = Critical] | 'Volt_P12V'=12.36;;11.40:12.60 'Volt_P1V05'=1.06;;0.99:1.11 'Volt_P1V8_AUX'=1.83;;1.71:1.89 'Volt_P3V3'=3.39;;3.13:3.47 'Volt_P3V3_AUX'=3.37;;3.13:3.47 'Volt_P3V_BAT'=3.19;;2.70:3.60 'Volt_P5V'=5.01;;4.74:5.25 'Volt_P5V_AUX'=4.99;;4.74:5.25 'Temp_PCI1_Outlet'=35.00;~:80.00;~:85.00 'Temp_CPU0_Inlet'=31.00;~:70.00;~:75.00 'Temp_CPU1_Inlet'=32.00;~:70.00;~:75.00 'Temp_CPU0'=35.00;~:88.00;~:89.00 'Temp_CPU1'=37.00;~:88.00;~:89.00 'Temp_DIMM_AB'=31.00;~:84.00;~:85.00 'Temp_DIMM_CD'=30.00;~:84.00;~:85.00 'Temp_DIMM_EF'=34.00;~:84.00;~:85.00 'Temp_DIMM_GH'=31.00;~:84.00;~:85.00 'Temp_PCI2_Outlet'=38.00;~:80.00;~:85.00 'Temp_PCH'=36.00;~:98.00;~:100.00 'Temp_VR_CPU0'=31.00;~:104.00;~:105.00 'Temp_VR_CPU1'=34.00;~:104.00;~:105.00 'Temp_VR_DIMM_AB'=34.00;~:104.00;~:105.00 'Temp_VR_DIMM_CD'=35.00;~:104.00;~:105.00 'Temp_VR_DIMM_EF'=37.00;~:104.00;~:105.00 'Temp_VR_DIMM_GH'=33.00;~:104.00;~:105.00 'Temp_BP1_1'=32.00;~:75.00;~:80.00 'Temp_BP1_2'=32.00;~:75.00;~:80.00 'Temp_BP2_1'=31.00;~:75.00;~:80.00 'Temp_BP2_2'=32.00;~:75.00;~:80.00 'Temp_BP3_1'=27.00;~:75.00;~:80.00 'Temp_BP3_2'=28.00;~:75.00;~:80.00 'Temp_SSD_BP'=24.00;~:75.00;~:80.00 'Temp_Inlet'=22.00;~:36.00;5.00:38.00 'Temp_PSU1'=31.00;~:65.00;~:66.00 'Temp_PSU2'=27.00;~:65.00;~:66.00 'Temp_SAS_EXP_0'=35.00;~:75.00;~:80.00 'Temp_SAS_EXP_1'=39.00;~:75.00;~:80.00 'Temp_SAS_EXP_OCS'=71.00;~:111.00;~:113.00 'Volt_VR_CPU0'=1.79;;1.35:1.96 'Volt_VR_CPU1'=1.79;;1.35:1.96 'Volt_VR_DIMM_AB'=1.22;;1.08:1.32 'Volt_VR_DIMM_CD'=1.22;;1.08:1.32 'Volt_VR_DIMM_EF'=1.22;;1.08:1.32 'Volt_VR_DIMM_GH'=1.22;;1.08:1.32 'Volt_SAS_EXP_0V9'=0.92;;0.81:0.99 'Volt_SAS_EXP_12V'=12.10;;10.84:13.36 'Volt_SAS_EXP_3V3'=3.32;;2.96:3.65 'Volt_SAS_EXP_VCC'=3.30;;2.96:3.61 'Airflow'=16.00 'Temp_Outlet'=40.00 'Fan_SYS0_1'=6400.00;;500.00: 'Fan_SYS0_2'=5400.00;;500.00: 'Fan_SYS1_1'=6500.00;;500.00: 'Fan_SYS1_2'=5400.00;;500.00: 'Fan_SYS2_1'=6400.00;;500.00: 'Fan_SYS2_2'=5400.00;;500.00: 'Fan_SYS3_1'=6500.00;;500.00: 'Fan_SYS3_2'=5400.00;;500.00: 'Fan_SYS4_1'=6500.00;;500.00: 'Fan_SYS4_2'=5400.00;;500.00: 'Fan_SYS5_1'=6500.00;;500.00: 'Fan_SYS5_2'=5400.00;;500.00: 'Fan_PSU1'=11500.00;;500.00: 'PSU1_Input'=168.00 'PSU2_Input'=0.00

What I expect: only the FRUs that failed should alert, i.e. in this case:

Critical [PSU Redundancy = Critical, PSU2_Status = Critical, PSU2_Input = Critical]

gschoenberger commented 6 years ago

Everything after the pipe (|) is performance data, which is used, for example, to generate graphs with RRD. The status message itself is only the part before the pipe:

$ ./check_ipmi_sensor.pl -H cloudian100-02 -U myuser -P mypasswd -L user
IPMI Status: Critical [PSU Redundancy = Critical, PSU2_Status = Critical, PSU2_Input = Critical] |

From my point of view, the output is correct once you take the performance data into account.
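To make the split concrete, here is a minimal Python sketch of how a consumer could separate the status text from the performance data. It is not part of the plugin; the function name `split_plugin_output` is made up for illustration, and it assumes one line of output with quoted labels and unit-less values, as in the output posted above.

```python
import shlex


def split_plugin_output(line):
    """Split one line of plugin output into (status_text, metrics).

    Hypothetical helper, not part of check_ipmi_sensor.
    Assumes perfdata tokens look like 'label'=value;warn;crit
    with no unit suffix on the value.
    """
    # Everything before the first pipe is the human-readable status.
    status, _, perfdata = line.partition("|")
    metrics = {}
    # shlex.split honours the single quotes around each label.
    for token in shlex.split(perfdata):
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        metrics[label] = {
            "value": float(fields[0]),
            "warn": fields[1] if len(fields) > 1 else "",
            "crit": fields[2] if len(fields) > 2 else "",
        }
    return status.strip(), metrics


sample = (
    "IPMI Status: Critical [PSU Redundancy = Critical, "
    "PSU2_Status = Critical, PSU2_Input = Critical] | "
    "'Volt_P12V'=12.36;;11.40:12.60 'Temp_CPU0'=35.00;~:88.00;~:89.00 "
    "'PSU2_Input'=0.00"
)
status, metrics = split_plugin_output(sample)
print(status)
print(metrics["PSU2_Input"]["value"])
```

Only `status` is what you would expect to see as the alert text; `metrics` is the part intended for graphing.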

sl0n commented 6 years ago

@gschoenberger OK, thanks! I was a bit confused by the output. I thought the whole string would end up in the Nagios message.

gschoenberger commented 6 years ago

No problem, great to see the plugin in the wild!