lausser / check_hpasm

A plugin (monitoring-plugin, not nagios-plugin, see also http://is.gd/PP1330) which checks the hardware health of HP Proliant Servers. (May also be used for other devices which implement the CPQHLTH mib)
http://labs.consol.de/nagios/check_hpasm/
GNU General Public License v2.0

Proliant Gen11 problem with cpu and drive status #28

Closed erSitzt closed 11 months ago

erSitzt commented 1 year ago

Hi, not sure if this is still actively maintained but anyway :)

A Gen11 Proliant reports the following:

CRITICAL - cpu 1 needs attention (failed), physical drive 1:1 is other, physical drive 1:2 is other, System: 'proliant dl380 gen11', S/N: 'CZ23250HWX', ROM: 'U54 v1.30'

This is a VMware ESXi host, and neither iLO nor VMware itself reports any issues.

We have two other identical Gen11 servers. All of them report the drives as "other", but only one reports a failed CPU. These are Intel(R) Xeon(R) Gold 6458Q CPUs, where cores can be disabled to increase the CPU base frequency: 32 cores => 3 GHz, 24 cores => 3.5 GHz, 16 cores => 4 GHz.

This one is configured with 24 cores, so 8 are disabled. The one with 16 cores does not have a CPU problem according to check_hpasm.

I have not checked the SNMP output yet, but I can provide it if needed.

erSitzt commented 8 months ago

@fragfutter thanks for the drive status fix

@lausser The CPU issue still remains with the current version

erSitzt commented 8 months ago

iLO has nothing to complain about.

lausser commented 8 months ago

When the plugin reports that a CPU has the status "failed", it is only repeating the value of the corresponding OID. If you run an snmpwalk against this iLO, you will surely see "failed" there as well.
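
To check this by hand, something like the following rough sketch could decode the raw values. It rests on several assumptions that do not come from this thread: that the CPU status lives in the cpqSeCpuStatus column of the CPQSTDEQ-MIB at `.1.3.6.1.4.1.232.1.2.2.1.1.6`, that its values map to unknown(1), ok(2), degraded(3), failed(4), disabled(5), and that the walk was produced with numeric output (for example `snmpwalk -v2c -c <community> -One <ilo-host> .1.3.6.1.4.1.232.1.2.2.1.1.6`, where community and host are placeholders). The function name `decode_cpu_status` and the sample lines are made up for illustration and are not output from the affected server.

```python
import re

# Assumption: value names of cpqSeCpuStatus as defined in the CPQSTDEQ-MIB.
CPU_STATUS = {1: "unknown", 2: "ok", 3: "degraded", 4: "failed", 5: "disabled"}

# Assumption: numeric OID of the cpqSeCpuStatus column.
CPU_STATUS_OID = ".1.3.6.1.4.1.232.1.2.2.1.1.6"

# Matches lines such as ".1.3.6.1.4.1.232.1.2.2.1.1.6.0 = INTEGER: 4".
LINE_RE = re.compile(r"^(?P<oid>[.\d]+) = INTEGER: (?P<value>\d+)$")


def decode_cpu_status(walk_output):
    """Map each CPU index found in numeric snmpwalk output to a status name."""
    result = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a plain integer row
        oid = match.group("oid")
        if not oid.startswith(CPU_STATUS_OID + "."):
            continue  # row belongs to a different column
        index = oid[len(CPU_STATUS_OID) + 1:]  # table index, kept as a string
        value = int(match.group("value"))
        result[index] = CPU_STATUS.get(value, f"unrecognized({value})")
    return result


# Hypothetical sample, not taken from the host in this issue.
sample = """\
.1.3.6.1.4.1.232.1.2.2.1.1.6.0 = INTEGER: 4
.1.3.6.1.4.1.232.1.2.2.1.1.6.1 = INTEGER: 2
"""

if __name__ == "__main__":
    for cpu, status in decode_cpu_status(sample).items():
        print(f"cpu index {cpu}: {status}")
```

If the decoded value for the affected CPU is "failed", the plugin is passing on what the agent delivers, and the mismatch with the iLO web interface would have to be resolved on the firmware side.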