lausser / check_hpasm

A plugin (monitoring-plugin, not nagios-plugin, see also http://is.gd/PP1330) which checks the hardware health of HP Proliant Servers. (May also be used for other devices which implement the CPQHLTH mib)
http://labs.consol.de/nagios/check_hpasm/
GNU General Public License v2.0

whoami returns "Storage" using SNMP to hp-snmp-agents on ProLiant BL460c Gen8 #7

Open terryburton opened 9 years ago

terryburton commented 9 years ago

With hp-snmp-agents on a ProLiant BL460c Gen8, the result of valid_response is undef due to the condition $result->{$oid} eq 'noSuchInstance', which in turn causes whoami to return Storage.

$ snmpget -v2c -c public HOST 1.3.6.1.4.1.232.2.2.4.2.0 
SNMPv2-SMI::enterprises.232.2.2.4.2.0 = ""

I verified that overriding $self->{productname} = 'ProLiant' at the end of whoami makes things work as expected.

I'm not sure which real-life situations each of the guards in valid_response protects against, so I wasn't able to come up with a patch. Sorry!


hpasmcli> show server
System        : ProLiant BL460c Gen8
Serial No.    : [ REDACTED ]
ROM version   : I31 02/10/2014
# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="https://bugs.debian.org/"
# apt-cache policy hp-snmp-agents
hp-snmp-agents:
  Installed: 10.0.0.1.23-20.

klangborste commented 7 years ago

If I have analyzed my issue correctly, I have the same problem at the moment. Hardware: ProLiant DL580 Gen8, OS: XEN 7.0.0-125380c

I got this output from check_hpasm: "OK - System: '', S/N: '', ROM: '', hardware working fine". With the verbose switch, the output starts with "snmp agent answered", "whoami: Storage", "using HP::Storage".

Is the problem in hp-snmp-agents or in check_hpasm? Is there a workaround, given that this was reported 2 years ago?
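
Until the root cause is clear, the two symptoms quoted in this comment (a Storage device type in the verbose output, or an OK line with an empty system name) can at least be detected from outside the plugin. The following is a minimal sketch under that assumption; looks_misdetected is a hypothetical helper and not part of check_hpasm.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of check_hpasm: flag plugin output that
# shows the symptoms described in this issue.
# Reads the plugin output on stdin; returns 0 if it looks misdetected.
looks_misdetected() {
    local output
    output=$(cat)
    # Symptom 1: the verbose output reports a Storage device type.
    if printf '%s\n' "$output" | grep -q 'whoami: Storage'; then
        return 0
    fi
    # Symptom 2: the status line carries an empty system name.
    if printf '%s\n' "$output" | grep -q "System: ''"; then
        return 0
    fi
    return 1
}
```

It could be used as `check_hpasm -H HOST -C public -vvv | looks_misdetected && echo "suspicious result"`. A genuine storage device would also match the first symptom, so this only makes sense for hosts known to be ProLiant servers.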

wschlich commented 3 years ago

Same issue here with two ProLiant BL460c Gen9 blade servers:

~$ /usr/lib/nagios/plugins-extra/check_hpasm -H 127.0.0.1 -C public -vvv
snmp agent answered
whoami: StorageWorks
using HP::StorageWorks
Protocol is 2c
000 seconds for walk cpqHoMibStatusArray (1 oids)
OK - overall status is ok, StorageWorks, hardware working fine
overall status is ok
~$ 

Forcing servertype to ProLiant results in this:

~$ /usr/lib/nagios/plugins-extra/check_hpasm -H 127.0.0.1 -C public -vvv --servertype proliant
snmp agent answered
whoami: StorageWorks
using HP::Proliant::SNMP
CRITICAL - snmpwalk returns no health data (cpqhlth-mib), System: 'unknown', S/N: 'unknown', ROM: 'unknown'

~$ 

It seems that the OID 1.3.6.1.4.1.232.2 is completely unknown to snmpd:

~$ snmpget -v 2c -c public localhost 1.3.6.1.4.1.232.2.2.4.2.0
SNMPv2-SMI::enterprises.232.2.2.4.2.0 = No Such Object available on this agent at this OID
~$ 

With a Gen10 blade server, it works fine (also check_hpasm works fine there):

~$ snmpget -v 2c -c public localhost 1.3.6.1.4.1.232.2.2.4.2.0
SNMPv2-SMI::enterprises.232.2.2.4.2.0 = STRING: "ProLiant BL460c Gen10"
~$ 
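
The three answers seen in this thread for the product-name OID (an empty string, "No Such Object", and a real product name) can be told apart mechanically. This is only a sketch: classify_product_name is a hypothetical helper, and it assumes the snmpget output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one line of snmpget output for the
# product-name OID 1.3.6.1.4.1.232.2.2.4.2.0.
classify_product_name() {
    local line=$1 value
    # Keep only what follows the first " = " separator.
    value=${line#* = }
    case $value in
        *"No Such Object"*)   echo "no-such-object" ;;
        *"No Such Instance"*) echo "no-such-instance" ;;
        '""'|'STRING: ""'|'') echo "empty" ;;
        STRING:*)             echo "ok" ;;
        *)                    echo "unknown" ;;
    esac
}
```

For example: `classify_product_name "$(snmpget -v 2c -c public localhost 1.3.6.1.4.1.232.2.2.4.2.0)"`. Any result other than "ok" suggests that the agent is not exposing the product name, which matches the hosts on which whoami falls back to Storage.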

Maybe this relates to the issue:

smad[48726]: AMS Linux AgentX sub-agent connecting to AgentX master
smad[48726]: No response from iLO for Hello
amsd[48723]: amsd Started . .

We'll try resetting the iLO.

wschlich commented 3 years ago

> We'll try resetting the iLO.

That didn't help.

wschlich commented 8 months ago

Restarting the systemd service smad_rev.service helped, btw.

For all the SNMP monitoring stuff to work, one seems to need the following HP-specific pieces:

  1. Installed amsd package from a matching repo at http://downloads.linux.hpe.com/ (e.g. spp-gen10)
  2. Enabled and running amsd_rev.service (systemctl enable --now amsd_rev.service)
  3. Enabled and running smad_rev.service (systemctl enable --now smad_rev.service)
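
Steps 2 and 3 can be checked with a small function. This is a sketch, assuming a systemd host and the unit names from the list above; inactive_agent_units is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: print each HPE agent unit that is not both enabled and
# running. The unit names are taken from the list above.
inactive_agent_units() {
    local unit
    for unit in amsd_rev.service smad_rev.service; do
        if ! systemctl is-enabled --quiet "$unit" ||
           ! systemctl is-active --quiet "$unit"; then
            echo "$unit"
        fi
    done
}
```

Calling `inactive_agent_units` prints nothing when both units are enabled and running. A unit it does print can be enabled and started with `systemctl enable --now`, as in the list above.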