lausser / check_nwc_health

nwc = network component. This plugin checks lots of aspects of routers, switches, WLAN controllers, firewalls, and more.
http://labs.consol.de/nagios/check_nwc_health
GNU General Public License v2.0

Cisco IMC UCS C220 M5SX - health check OK with faulty memory #323

Closed: petr-fischer closed this issue 9 months ago

petr-fischer commented 9 months ago

Hello, we have a faulty memory module (DIMM) on a Cisco IMC UCS C220 M5SX, but mode "hardware-health" returns an OK state. Bad...

Details:

./check_nwc_health --mode hardware-health  --hostname X.X.X.X --community XXX --authprotocol sha --privprotocol aes --username mon --authpassword XXX  --privpassword XXX --protocol 3 -vvv

[ALARMSUBSYSTEM]
ceAlarmCriticalCount: 0
ceAlarmMajorCount: 0
ceAlarmMinorCount: 0
info: no alarms

OK - environmental hardware working fine
no alarms

snmpwalk outputs:

./check_nwc_health --mode walk  --hostname X.X.X.X --community XXX --authprotocol sha --privprotocol aes --username mon --authpassword XXX  --privpassword XXX --protocol 3

snmpwalk -ObentU -X 'XXX' -A 'XXX' -a 'sha' -u 'mon' -v '3' -x 'aes' -l authPriv x.x.x.x 1.3.6.1.2.1
snmpwalk -ObentU -X 'XXX' -A 'XXX' -a 'sha' -u 'mon' -v '3' -x 'aes' -l authPriv x.x.x.x 1.3.6.1.4.1
./snmpwalk -ObentU -X 'XXX' -A 'XXX' -a 'sha' -u 'mon' -v '3' -x 'aes' -l authPriv x.x.x.x 1.3.6.1.2.1

.1.3.6.1.2.1.1.1.0 = STRING: Cisco Integrated Management Controller(Cisco IMC) UCS C220 M5SX, Firmware Version 4.2(2a), Copyright (c) 2008-2022, Cisco Systems, Inc.
.1.3.6.1.2.1.1.2.0 = OID: .1.3.6.1.4.1.9.1.2492
.1.3.6.1.2.1.1.3.0 = 4844353
.1.3.6.1.2.1.1.4.0 = STRING: sanstorage@xxx.com
.1.3.6.1.2.1.1.5.0 = STRING: s20337sxci010
.1.3.6.1.2.1.1.6.0 = STRING: DC XXX
.1.3.6.1.2.1.1.7.0 = INTEGER: 72
.1.3.6.1.2.1.1.8.0 = 347
.1.3.6.1.2.1.1.9.1.2.1 = OID: .1.3.6.1.6.3.1
.1.3.6.1.2.1.1.9.1.2.2 = OID: .1.3.6.1.6.3.10.3.1.1
.1.3.6.1.2.1.1.9.1.2.3 = OID: .1.3.6.1.6.3.11.3.1.1
.1.3.6.1.2.1.1.9.1.2.4 = OID: .1.3.6.1.6.3.15.2.1.1
.1.3.6.1.2.1.1.9.1.3.1 = STRING: The MIB module for SNMPv2 entities
.1.3.6.1.2.1.1.9.1.3.2 = STRING: The SNMP Management Architecture MIB.
.1.3.6.1.2.1.1.9.1.3.3 = STRING: The MIB for Message Processing and Dispatching.
.1.3.6.1.2.1.1.9.1.3.4 = STRING: The management information definitions for the SNMP User-based Security Model.
.1.3.6.1.2.1.1.9.1.4.1 = 347
.1.3.6.1.2.1.1.9.1.4.2 = 347
.1.3.6.1.2.1.1.9.1.4.3 = 347
.1.3.6.1.2.1.1.9.1.4.4 = 347
.1.3.6.1.2.1.2.1.0 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.1.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.2.1 = STRING: eth0
.1.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
.1.3.6.1.2.1.2.2.1.4.1 = INTEGER: 1496
.1.3.6.1.2.1.2.2.1.5.1 = Gauge32: 100000000
.1.3.6.1.2.1.2.2.1.6.1 = STRING: 0:45:1d:69:24:80
.1.3.6.1.2.1.2.2.1.7.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.8.1 = INTEGER: 1
.1.3.6.1.2.1.2.2.1.9.1 = 0
...etc etc...
./snmpwalk -ObentU -X 'XXX' -A 'XXX' -a 'sha' -u 'mon' -v '3' -x 'aes' -l authPriv x.x.x.x 1.3.6.1.4.1

.1.3.6.1.4.1.9.9.719.1.1.1.1.2.137360672 = STRING: "sys/rack-unit-1/board/memarray-1/mem-16/fault-F1706"
.1.3.6.1.4.1.9.9.719.1.1.1.1.2.4110518272 = STRING: "sys/cloud-mgmt/device-connector/fault-F1983"
.1.3.6.1.4.1.9.9.719.1.1.1.1.3.137360672 = STRING: "F1706"
.1.3.6.1.4.1.9.9.719.1.1.1.1.3.4110518272 = STRING: "F1983"
.1.3.6.1.4.1.9.9.719.1.1.1.1.4.137360672 = OID: .1.3.6.1.4.1.9.9.719.1.30.11.1
.1.3.6.1.4.1.9.9.719.1.1.1.1.4.4110518272 = OID: .0.0
.1.3.6.1.4.1.9.9.719.1.1.1.1.5.137360672 = STRING: "sys/rack-unit-1/board/memarray-1/mem-16"
.1.3.6.1.4.1.9.9.719.1.1.1.1.5.4110518272 = STRING: "sys/cloud-mgmt/device-connector"
.1.3.6.1.4.1.9.9.719.1.1.1.1.6.137360672 = INTEGER: 1
.1.3.6.1.4.1.9.9.719.1.1.1.1.6.4110518272 = INTEGER: 1
.1.3.6.1.4.1.9.9.719.1.1.1.1.7.137360672 = INTEGER: 389
.1.3.6.1.4.1.9.9.719.1.1.1.1.7.4110518272 = INTEGER: 392
.1.3.6.1.4.1.9.9.719.1.1.1.1.8.137360672 = STRING: "unknown"
.1.3.6.1.4.1.9.9.719.1.1.1.1.8.4110518272 = STRING: "unknown"
.1.3.6.1.4.1.9.9.719.1.1.1.1.9.137360672 = INTEGER: 1706
.1.3.6.1.4.1.9.9.719.1.1.1.1.9.4110518272 = INTEGER: 1983
.1.3.6.1.4.1.9.9.719.1.1.1.1.10.137360672 = Hex-STRING: 07 E7 09 13 0F 23 15 00
.1.3.6.1.4.1.9.9.719.1.1.1.1.10.4110518272 = Hex-STRING: 07 E7 09 0F 02 35 2C 00
.1.3.6.1.4.1.9.9.719.1.1.1.1.11.137360672 = STRING: "ADDDC Bank-level adaptive virtual lockstep is activated on DIMM DDR4_P2_H2_ECC. Post Package Repair will be performed on this DIMM during the next system reboot."
.1.3.6.1.4.1.9.9.719.1.1.1.1.11.4110518272 = STRING: "This device has connectivity to Cisco Intersight, but has not been claimed. To take advantage of the features of Cisco Intersight, please claim this device to your Intersight account. For help visit intersight.com/help."
.1.3.6.1.4.1.9.9.719.1.1.1.1.12.137360672 = INTEGER: 6
.1.3.6.1.4.1.9.9.719.1.1.1.1.12.4110518272 = INTEGER: 6
.1.3.6.1.4.1.9.9.719.1.1.1.1.13.137360672 = Counter64: 137360672
.1.3.6.1.4.1.9.9.719.1.1.1.1.13.4110518272 = Counter64: 4110518272
.1.3.6.1.4.1.9.9.719.1.1.1.1.14.137360672 = Hex-STRING: 07 E7 09 13 0F 23 15 00
.1.3.6.1.4.1.9.9.719.1.1.1.1.14.4110518272 = Hex-STRING: 07 E7 09 0F 02 35 2C 00
.1.3.6.1.4.1.9.9.719.1.1.1.1.15.137360672 = INTEGER: 0
.1.3.6.1.4.1.9.9.719.1.1.1.1.15.4110518272 = INTEGER:
... etc etc ...
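The memory fault is visible over SNMP, just not in the alarm counters the plugin printed. As a rough illustration (not part of check_nwc_health), the sketch below groups the walk lines under .1.3.6.1.4.1.9.9.719.1.1.1.1 by row index (last OID component) and column number (the component before it). Assumptions: the column meanings (5 = affected object, 10 = creation timestamp, 11 = description) are read off the output above rather than checked against the Cisco MIB, and the 8-byte Hex-STRING timestamps are taken to use the SNMPv2-TC DateAndTime encoding, which their pattern fits. The function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Row OIDs of the fault table as they appear in the walk above:
# <FAULT_ENTRY><column>.<row index>   (taken from the output, assumption)
FAULT_ENTRY = ".1.3.6.1.4.1.9.9.719.1.1.1.1."

# Matches "<oid> = <TYPE>: <value>" as well as "<oid> = <value>" (-On output)
LINE_RE = re.compile(r"^(\.[\d.]+) = (?:([\w-]+):\s*)?(.*)$")


def parse_fault_rows(walk_text):
    """Group fault-table values as {row index: {column number: value}}."""
    rows = defaultdict(dict)
    for line in walk_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or not m.group(1).startswith(FAULT_ENTRY):
            continue  # not an OID line, or not part of the fault table
        column, index = m.group(1)[len(FAULT_ENTRY):].split(".", 1)
        value = m.group(3).strip()
        if m.group(2) == "STRING":
            value = value.strip('"')  # drop the quotes snmpwalk adds
        rows[index][int(column)] = value
    return dict(rows)


def decode_date_and_time(hex_string):
    """Decode an 8-byte DateAndTime such as '07 E7 09 13 0F 23 15 00'.

    Layout: 2-byte year, then month, day, hour, minute, second,
    deci-seconds. The 8-byte form carries no timezone.
    """
    octets = bytes.fromhex(hex_string)
    year = int.from_bytes(octets[0:2], "big")
    month, day, hour, minute, sec, decisec = octets[2:8]
    return datetime(year, month, day, hour, minute, sec, decisec * 100_000)


if __name__ == "__main__":
    # A few lines copied from the walk above.
    sample = """\
    .1.3.6.1.4.1.9.9.719.1.1.1.1.5.137360672 = STRING: "sys/rack-unit-1/board/memarray-1/mem-16"
    .1.3.6.1.4.1.9.9.719.1.1.1.1.10.137360672 = Hex-STRING: 07 E7 09 13 0F 23 15 00
    .1.3.6.1.4.1.9.9.719.1.1.1.1.11.137360672 = STRING: "ADDDC Bank-level adaptive virtual lockstep is activated on DIMM DDR4_P2_H2_ECC. Post Package Repair will be performed on this DIMM during the next system reboot."
    """
    for index, cols in parse_fault_rows(sample).items():
        created = decode_date_and_time(cols[10])
        print(index, cols[5], created.isoformat(), cols[11])
```

With these lines the memory fault's timestamp decodes to 2023-09-19T15:35:21. Fed the full walk, the same code would also list the unclaimed-Intersight notice, so a real check would likely need some filtering by fault code or severity.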
lausser commented 9 months ago

Hi, nwc stands for network component. Although some modes also work for other SNMP-speaking devices, only the MIBs found in our customers' network devices are implemented. From https://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-c-series-integrated-management-controller/index.html I see that this IMC is a monitoring platform itself. Didn't it raise an alarm on its own?