Open acozine opened 1 year ago
Our current Nagios-based monitoring system for rack temperature and humidity is only accessible from Windows jump hosts. It is ancient and would need upgrading to serve as a multi-purpose monitoring platform. Docs are in Google Drive.
Get alerts for any hardware failure (fan, power supply, temperature, etc.) instead of looking for orange lights in the racks. Can also monitor memory and CPU; where is it best to do this?
Possible tools and protocols:

- Currently use Dell tools or HP via SNMP, but this approach means the server needs to be up
- Could also use IPMI (iDRAC interface)
- Prometheus / Zabbix / Centreon - this is what TigerData is planning to use
- Anything that uses IPMI will need to sit on the out-of-band / private / protected network
- We currently use Nagios on this private network (using SNMP) for monitoring rack temp and humidity
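
To make the IPMI option a bit more concrete, here is a minimal sketch of how out-of-band sensor readings could be turned into alerts. It assumes the pipe-delimited text that `ipmitool sensor` prints (name, value, unit, status, then thresholds) and maps the status column onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sample readings, the function names, and the status-to-severity mapping are all illustrative assumptions, not output from our hardware or a tested plugin.

```python
# Sketch only: parse `ipmitool sensor`-style output and flag unhealthy sensors.
# Real output would come from the BMC over the out-of-band network, e.g.
#   ipmitool -I lanplus -H <bmc-host> -U <user> -P <password> sensor
# SAMPLE_OUTPUT below is made-up illustrative data, not real readings.

SAMPLE_OUTPUT = """\
Inlet Temp   | 22.000 | degrees C | ok     | na | -7.000  | 3.000   | 38.000 | 42.000 | na
Fan1 RPM     | 0.000  | RPM       | cr     | na | 480.000 | 840.000 | na     | na     | na
PS1 Status   | 0x0    | discrete  | 0x0100 | na | na      | na      | na     | na     | na
"""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from threshold-sensor status to severity:
# nc = non-critical, cr = critical, nr = non-recoverable.
SEVERITY = {"ok": OK, "nc": WARNING, "cr": CRITICAL, "nr": CRITICAL}


def parse_sensors(text):
    """Split each line on '|' and keep the first four columns."""
    readings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, value, unit, status = fields[:4]
        readings.append(
            {"name": name, "value": value, "unit": unit, "status": status}
        )
    return readings


def evaluate(readings):
    """Return (worst exit code, list of sensor names that are not ok)."""
    worst = OK
    problems = []
    for reading in readings:
        severity = SEVERITY.get(reading["status"])
        if severity is None:
            # Discrete sensors and "na" readings are ignored in this sketch.
            continue
        if severity > OK:
            problems.append(reading["name"])
        worst = max(worst, severity)
    return worst, problems


if __name__ == "__main__":
    code, bad = evaluate(parse_sensors(SAMPLE_OUTPUT))
    print(f"exit code {code}, problem sensors: {bad}")
```

Because the BMC answers even when the host OS is down, a check along these lines would not share the "server needs to be up" limitation of the in-band SNMP approach. Memory and CPU utilization would still need an agent or exporter on the host itself.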