Open DanielFroehlich opened 2 years ago
Sounds good. If you have the OIDs and the related information, ping me and I will integrate this into the monitoring. Feel free to use inf31 for the tests; here are a bunch of examples:
$ ssh inf31.coe.muc.redhat.com
Warning: Permanently added 'inf31.coe.muc.redhat.com' (ED25519) to the list of known hosts.
Last login: Wed Aug 17 19:31:31 2022 from 10.39.192.87
[root@inf31 ~]# sudo su - mon
Last login: Wed Aug 17 19:31:59 BST 2022 on pts/0
OMD[mon@inf31]:~$ snmp[tab tab]
snmpbulkget snmp_exporter snmpps snmptop snmpvacm
snmpbulkwalk snmpget snmpset snmptranslate snmpwalk
snmpconf snmpgetnext snmpstatus snmptrap
snmpd snmpinform snmptable snmptrapd
snmpdelta snmpnetstat snmptest snmptrap_logger.sh
snmpdf snmpping snmptls snmpusm
OMD[mon@inf31]:~$ history |grep snmp
# You will find a bunch of commands...
Here, for example, is an OID check of the APC:
OMD[mon@inf31]:~$ ~/lib/monitoring-plugins/check_snmp -H 10.32.104.52 -o 1.3.6.1.4.1.318.1.1.26.10.2.2.1.8.1 -m ALL
SNMP OK - 204 | PowerNet-MIB::hardware.26.10.2.2.1.8.1=204
OMD[mon@inf31]:~$
Let me know if I can help.
Here are some OIDs for storm3:
OMD[mon@inf31]:~$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.700.20.1.8
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.1 = STRING: "System Board Inlet Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.2 = STRING: "System Board Exhaust Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.3 = STRING: "CPU1 Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.4 = STRING: "CPU2 Temp"
The current readings are on these OIDs:
$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.700.20.1.6
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.1 = INTEGER: 180
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.2 = INTEGER: 250
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.3 = INTEGER: 550
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.4 = INTEGER: 450
(values are in tenths of a degree Celsius, so 180 means 18.0 °C)
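To read the two walks side by side, the probe names and readings can be joined on the last OID component (the probe index). This is a minimal sketch, assuming both walks were saved to files first and that the readings are in tenths of a degree Celsius; the function name `probe_table` is made up for illustration.

```shell
# Join probe names (column .8) with readings (column .6) by probe index.
# $1: file with the saved name walk, $2: file with the saved reading walk.
# probe_table is a hypothetical helper, not part of net-snmp or OMD.
probe_table() {
    awk '
        {
            # The last OID component is the probe index.
            n = split($1, parts, ".")
            idx = parts[n]
        }
        FNR == NR {
            # First file: keep the quoted text after "STRING: ".
            name = $0
            sub(/^.*STRING: "/, "", name)
            sub(/"$/, "", name)
            names[idx] = name
            next
        }
        {
            # Second file: readings, assumed to be tenths of a degree Celsius.
            printf "%s: %.1f C\n", names[idx], $NF / 10
        }
    ' "$1" "$2"
}
```

Usage would be something like saving the two walks above with `> names.txt` and `> readings.txt`, then running `probe_table names.txt readings.txt`.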
Btw, the "early warning sign" of AC failure seems to be an Inlet Temp above 21 °C. I suggest we start with a tight limit and adjust it as we learn.
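A rough sketch of what such a limit could look like as a Nagios-style check, assuming the raw reading is in tenths of a degree Celsius and is non-negative. The function name `inlet_status` and the critical threshold of 25 used in the usage note are assumptions for illustration; only the 21 comes from the observation above.

```shell
# Classify a raw inlet reading against warning/critical limits given in whole degrees C.
# Usage: inlet_status RAW WARN_C CRIT_C
# inlet_status is a hypothetical helper; exit codes follow the plugin convention
# 0 = OK, 1 = WARNING, 2 = CRITICAL.
inlet_status() {
    raw=$1
    # Compare in tenths so everything stays in integer arithmetic.
    warn=$(($2 * 10))
    crit=$(($3 * 10))
    # Render e.g. 215 as 21.5 (assumes a non-negative reading).
    temp="$((raw / 10)).$((raw % 10))"
    if [ "$raw" -gt "$crit" ]; then
        echo "CRITICAL - Inlet Temp ${temp} C"
        return 2
    elif [ "$raw" -gt "$warn" ]; then
        echo "WARNING - Inlet Temp ${temp} C"
        return 1
    fi
    echo "OK - Inlet Temp ${temp} C"
    return 0
}
```

For example, `inlet_status 215 21 25` reports a warning. If the limit is set directly on `check_snmp` instead, the thresholds would have to be given in the raw unit (210 rather than 21).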
We can also add power consumption for the Dell servers:
OMD[mon@inf31]:~/etc/naemon/conf.d/services$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.600.60
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.1.1.1 = INTEGER: 1
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.2.1.1 = INTEGER: 1
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.3.1.1 = INTEGER: 0
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.4.1.1 = INTEGER: 2
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.5.1.1 = INTEGER: 3
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.6.1.1 = STRING: "System Power Consumption data"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.7.1.1 = INTEGER: 19369932
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.8.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.9.1.1 = INTEGER: 761
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.10.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.11.1.1 = STRING: "20220920070802.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.12.1.1 = INTEGER: 33
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.13.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.14.1.1 = STRING: "20220920070802.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.15.1.1 = INTEGER: 371
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.16.1.1 = INTEGER: 660
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.17.1.1 = INTEGER: 3
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.18.1.1 = INTEGER: 0
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.19.1.1 = INTEGER: 520
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.20.1.1 = INTEGER: 429
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.21.1.1 = INTEGER: 139
The relevant columns are:
.60.1.9.1.1 => powerUsagePeakWatts
.60.1.16.1.1 => powerUsageMaxPotentialPower
.60.1.20.1.1 => powerUsageInstantaneousHeadroom
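To pull just those three columns out of the walk, the output can be filtered by OID suffix. A minimal sketch, assuming the `snmpwalk` output format shown above; the function name `power_summary` is hypothetical.

```shell
# Print name=value for the three powerUsageTable columns listed above.
# Reads snmpwalk output on stdin; power_summary is a hypothetical helper.
power_summary() {
    awk '
        # Each line looks like: <oid> = INTEGER: <value>, so the value is the last field.
        $1 ~ /\.600\.60\.1\.9\.1\.1$/  { print "powerUsagePeakWatts=" $NF }
        $1 ~ /\.600\.60\.1\.16\.1\.1$/ { print "powerUsageMaxPotentialPower=" $NF }
        $1 ~ /\.600\.60\.1\.20\.1\.1$/ { print "powerUsageInstantaneousHeadroom=" $NF }
    '
}
```

It can be fed straight from the walk, e.g. `snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.600.60 | power_summary`.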
It should be possible to expose the air intake temperatures via iDRAC SNMP. Add these as services in Thruk to improve early warnings on over-temperature events.