stormshift / support

This repo should serve as a central source for reporting issues with stormshift
GNU General Public License v3.0

add storm1-6 air intake temp to thruk #103

Open DanielFroehlich opened 2 years ago

DanielFroehlich commented 2 years ago

It should be possible to read the air intake temperatures from the iDRAC via SNMP. Add these as services to thruk to improve early warning of over-temperature events.

rbo commented 2 years ago

Sounds good. Once you have the OIDs and the related information, ping me and I will integrate this into the monitoring. Feel free to use inf31 for the tests; here are a few examples:

$ ssh inf31.coe.muc.redhat.com
Warning: Permanently added 'inf31.coe.muc.redhat.com' (ED25519) to the list of known hosts.
Last login: Wed Aug 17 19:31:31 2022 from 10.39.192.87
[root@inf31 ~]# sudo su - mon
Last login: Wed Aug 17 19:31:59 BST 2022 on pts/0
OMD[mon@inf31]:~$ snmp[tab tab]
snmpbulkget         snmp_exporter       snmpps              snmptop             snmpvacm
snmpbulkwalk        snmpget             snmpset             snmptranslate       snmpwalk
snmpconf            snmpgetnext         snmpstatus          snmptrap            
snmpd               snmpinform          snmptable           snmptrapd           
snmpdelta           snmpnetstat         snmptest            snmptrap_logger.sh  
snmpdf              snmpping            snmptls             snmpusm             
OMD[mon@inf31]:~$ history |grep snmp
# You will find a bunch of commands...

Here is, for example, an OID check against the APC:

OMD[mon@inf31]:~$  ~/lib/monitoring-plugins/check_snmp -H 10.32.104.52 -o 1.3.6.1.4.1.318.1.1.26.10.2.2.1.8.1 -m ALL
SNMP OK - 204 | PowerNet-MIB::hardware.26.10.2.2.1.8.1=204 
OMD[mon@inf31]:~$ 

Let me know if I can help.

DanielFroehlich commented 2 years ago

Here are some OIDs for storm3 (the temperature probe names):

OMD[mon@inf31]:~$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.700.20.1.8
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.1 = STRING: "System Board Inlet Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.2 = STRING: "System Board Exhaust Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.3 = STRING: "CPU1 Temp"
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.8.1.4 = STRING: "CPU2 Temp"

DanielFroehlich commented 2 years ago

And the current readings are under these OIDs:

$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.700.20.1.6
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.1 = INTEGER: 180
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.2 = INTEGER: 250
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.3 = INTEGER: 550
RFC1155-SMI::enterprises.674.10892.5.4.700.20.1.6.1.4 = INTEGER: 450

(values are in tenths of a degree Celsius, so 180 means 18.0 °C)
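
Since the readings come back as tenths of a degree, a small filter makes the walk output easier to read. This is only a sketch: `tenths_to_celsius` is a hypothetical helper name, and it assumes the `= INTEGER: <n>` line format shown in the walk above.

```shell
# Hypothetical helper: read snmpwalk output on stdin and print
# "<probe index> <degrees Celsius>" for every INTEGER reading.
# Assumes the readings are in tenths of a degree Celsius.
tenths_to_celsius() {
    awk '/= INTEGER: / {
        n = split($1, oid, "[.]")   # the last OID component is the probe index
        printf "%s %.1f\n", oid[n], $NF / 10
    }'
}

# Usage (needs network access to the iDRAC):
#   snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.700.20.1.6 | tenths_to_celsius
```

With the readings above, probe 1 (System Board Inlet Temp) would be printed as `1 18.0`.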

DanielFroehlich commented 2 years ago

Btw, the "early warning sign" of an AC failure seems to be an Inlet Temp above 21 °C. I suggest we start with a very tight limit and adjust it as we learn.
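
A possible starting point for such a check, reusing the `check_snmp` plugin from the APC example and the inlet reading OID from the walk (probe index 1). This is a sketch, not the final service definition: `check_inlet_temp` is a hypothetical wrapper, the thresholds are given in tenths of a degree to match the raw SNMP value, and the critical value of 250 is just a placeholder assumption.

```shell
# Path to the plugin as used on inf31; override CHECK_SNMP if it lives elsewhere.
CHECK_SNMP=${CHECK_SNMP:-$HOME/lib/monitoring-plugins/check_snmp}

# Inlet temperature reading for probe index 1 (System Board Inlet Temp).
INLET_OID=1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.1

# Hypothetical wrapper: check_inlet_temp <host> [warn] [crit]
# Thresholds are raw values, i.e. tenths of a degree Celsius.
check_inlet_temp() {
    host=$1
    warn=${2:-210}   # 21.0 C, the suggested early-warning limit
    crit=${3:-250}   # 25.0 C, placeholder assumption, not from this thread
    "$CHECK_SNMP" -H "$host" -C public -P 1 -o "$INLET_OID" \
        -w "$warn" -c "$crit" -l "Inlet Temp"
}

# Usage (needs network access to the iDRAC):
#   check_inlet_temp 10.32.104.92
```

Keeping the thresholds in raw units avoids any conversion inside the check; the trade-off is that the service output shows 180 instead of 18.0.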

rbo commented 2 years ago

We can also add power consumption for the Dell servers:

OMD[mon@inf31]:~/etc/naemon/conf.d/services$ snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.600.60
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.1.1.1 = INTEGER: 1
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.2.1.1 = INTEGER: 1
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.3.1.1 = INTEGER: 0
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.4.1.1 = INTEGER: 2
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.5.1.1 = INTEGER: 3
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.6.1.1 = STRING: "System Power Consumption data"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.7.1.1 = INTEGER: 19369932
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.8.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.9.1.1 = INTEGER: 761
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.10.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.11.1.1 = STRING: "20220920070802.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.12.1.1 = INTEGER: 33
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.13.1.1 = STRING: "20160421015729.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.14.1.1 = STRING: "20220920070802.000000-360"
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.15.1.1 = INTEGER: 371
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.16.1.1 = INTEGER: 660
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.17.1.1 = INTEGER: 3
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.18.1.1 = INTEGER: 0
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.19.1.1 = INTEGER: 520
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.20.1.1 = INTEGER: 429
RFC1155-SMI::enterprises.674.10892.5.4.600.60.1.21.1.1 = INTEGER: 139

https://www.dell.com/support/manuals/de-de/idrac9-lifecycle-controller-v3.3-series/snmp_idrac_cmc_9.3_ref_guide/power-usage-table?guid=guid-2268ca3e-ad8d-4d42-8f58-a755f244966c&lang=en-us
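
To pick a single value out of this table for a service check, the walk output can be filtered by column number. This is a rough sketch: `power_column` is a hypothetical helper, it assumes the OID layout `...600.60.1.<column>.<chassis>.<index>` seen in the walk above, and which column to monitor should be taken from the Dell reference linked above.

```shell
# Hypothetical helper: power_column <column number>
# Reads snmpwalk output on stdin and prints the INTEGER value(s)
# found in the given table column.
power_column() {
    col=$1
    awk -v col="$col" '/= INTEGER: / {
        n = split($1, oid, "[.]")
        # The column number is the third component from the end of the OID.
        if (oid[n - 2] == col) print $NF
    }'
}

# Usage (needs network access to the iDRAC):
#   snmpwalk -v 1 -c public 10.32.104.92 1.3.6.1.4.1.674.10892.5.4.600.60 | power_column 9
```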