lausser / check_hpasm

A plugin (monitoring-plugin, not nagios-plugin, see also http://is.gd/PP1330) which checks the hardware health of HP ProLiant servers. (It may also be used for other devices which implement the CPQHLTH MIB.)
http://labs.consol.de/nagios/check_hpasm/
GNU General Public License v2.0

Fan speed value type is changed in new firmware version #33

Open teanva opened 4 months ago

teanva commented 4 months ago

SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.1 = INTEGER: 12
SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.2 = INTEGER: 12
SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.3 = INTEGER: 12
SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.4 = INTEGER: 12
SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.5 = INTEGER: 12
SNMPv2-SMI::enterprises.232.6.2.6.7.1.6.0.6 = INTEGER: 14

ILO:

Fan    Location  Redundant  Status  Speed
Fan 1  System    Yes        OK      11%
Fan 2  System    Yes        OK      11%
Fan 3  System    Yes        OK      12%
Fan 4  System    Yes        OK      12%
Fan 5  System    Yes        OK      12%
Fan 6  System    Yes        OK      12%

check_hpasm: CRITICAL - fan 1 (system) needs attention, fan 2 (system) needs attention, fan 3 (system) needs attention, fan 4 (system) needs attention, fan 5 (system) needs attention, fan 6 (system) needs attention, System: 'proliant dl380 gen10'

check_hpasm -v:

fan 1 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 2
fan 1 (system) needs attention
fan 2 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 3
fan 2 (system) needs attention
fan 3 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 4
fan 3 (system) needs attention
fan 4 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 5
fan 4 (system) needs attention
fan 5 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 6
fan 5 (system) needs attention
fan 6 is present, speed is value_12, pctmax is 50%, location is system, redundance is redundant, partner is 1
fan 6 (system) needs attention
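
The `value_12` label and the "needs attention" verdict are consistent with the raw integer being looked up in the three-entry enum table (other/normal/high) and not being found there. The following is only a minimal sketch of that behaviour, not the plugin's actual code: `translate_speed` and `old_verdict` are hypothetical helper names, and the `value_N` fallback is an assumption inferred from the verbose output above.

```perl
use strict;
use warnings;

# Enum table as it appears (commented out) in the proposed fix.
my %fan_speed_enum = (1 => 'other', 2 => 'normal', 3 => 'high');

# Hypothetical helper: translate a raw SNMP integer into a label.
# ASSUMPTION: unknown integers become "value_N", as the verbose output suggests.
sub translate_speed {
    my ($raw) = @_;
    return $fan_speed_enum{$raw} // "value_$raw";
}

# Hypothetical helper mirroring the old string comparisons:
# 'high' is critical, anything other than 'normal' "needs attention".
sub old_verdict {
    my ($raw) = @_;
    my $speed = translate_speed($raw);
    return 'runs at high speed' if $speed eq 'high';
    return 'needs attention'    if $speed ne 'normal';
    return 'ok';
}

for my $raw (2, 3, 12) {
    printf "raw %d -> %s -> %s\n", $raw, translate_speed($raw), old_verdict($raw);
}
```

Under these assumptions, a percentage such as 12 falls through to "needs attention", which matches the CRITICAL result reported above.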

Fix:

#      cpqHeFltTolFanSpeedValue => {
#          1 => "other",
#          2 => "normal",
#          3 => "high",
#      },

sub check {
  my $self = shift;
  $self->blacklist('f', $self->{cpqHeFltTolFanIndex});
#  $self->add_info(sprintf 'fan %d is %s, speed is %s, pctmax is %s%%, '.
  $self->add_info(sprintf 'fan %d is %s, speed is %s%%, pctmax is %s%%, '.
      'location is %s, redundance is %s, partner is %s',
      $self->{cpqHeFltTolFanIndex}, $self->{cpqHeFltTolFanPresent},
      $self->{cpqHeFltTolFanSpeed}, $self->{cpqHeFltTolFanPctMax},
      $self->{cpqHeFltTolFanLocale}, $self->{cpqHeFltTolFanRedundant},
      $self->{cpqHeFltTolFanRedundantPartner});
  $self->add_extendedinfo(sprintf 'fan_%s=%d%%',
      $self->{cpqHeFltTolFanIndex}, $self->{cpqHeFltTolFanPctMax});
  if ($self->{cpqHeFltTolFanPresent} eq 'present') {
#    if ($self->{cpqHeFltTolFanSpeed} eq 'high') {
    if ($self->{cpqHeFltTolFanSpeed} > $self->{cpqHeFltTolFanPctMax}) {
      $self->add_info(sprintf 'fan %d (%s) runs at high speed',
          $self->{cpqHeFltTolFanIndex}, $self->{cpqHeFltTolFanLocale});
      $self->add_message(CRITICAL, $self->{info});
#    } elsif ($self->{cpqHeFltTolFanSpeed} ne 'normal') {
    } elsif ($self->{cpqHeFltTolFanSpeed} == 0) {
      $self->add_info(sprintf 'fan %d (%s) needs attention',
          $self->{cpqHeFltTolFanIndex}, $self->{cpqHeFltTolFanLocale});
      $self->add_message(CRITICAL, $self->{info});
    }

check_hpasm: OK - System: 'proliant dl380 gen10'

check_hpasm -v:

fan 1 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 2
fan 2 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 3
fan 3 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 4
fan 4 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 5
fan 5 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 6
fan 6 is present, speed is 12%, pctmax is 50%, location is system, redundance is redundant, partner is 1
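
The decision logic of the fix can also be tried without an SNMP agent. This is a rough standalone sketch, assuming both the speed and pctmax values are percentages as the fix treats them. `new_verdict` is a hypothetical name, and the handling of fans that are not present is not shown in the snippet above, so the sketch simply reports them as not checked.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the patched comparison.
# $speed and $pctmax are both treated as numeric percentages.
sub new_verdict {
    my ($present, $speed, $pctmax) = @_;
    # ASSUMPTION: the posted snippet does not show what happens to absent fans.
    return 'not checked'        if $present ne 'present';
    return 'runs at high speed' if $speed > $pctmax;
    return 'needs attention'    if $speed == 0;
    return 'ok';
}

# Values from the issue (12% speed, 50% pctmax) plus two illustrative edge cases.
for my $case ([12, 50], [60, 50], [0, 50]) {
    my ($speed, $pctmax) = @$case;
    printf "speed %d%%, pctmax %d%% -> %s\n",
        $speed, $pctmax, new_verdict('present', $speed, $pctmax);
}
```

With the values reported in this issue (12% against a pctmax of 50%), the sketch returns `ok`, in line with the OK result above.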

lausser commented 4 months ago

Can you do me a favour? I plan a rewrite of this plugin so that it can be maintained more easily. Can you run

snmpwalk -ObentU ... 1.3.6.1.2.1 > /tmp/hpilo.teanva.snmpwalk
snmpwalk -ObentU ... 1.3.6.1.4.1 >> /tmp/hpilo.teanva.snmpwalk

and mail the snmpwalk file to gerhard.lausser@consol.de, so I can simulate the server? I currently have no access to any HP iLO.
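
For anyone who wants to check their own dump before sending it: with the `n` and `e` output options, snmpwalk prints numeric OIDs and numeric enum values, so the fan speed column from this issue should appear under `.1.3.6.1.4.1.232.6.2.6.7.1.6`. A small sketch for pulling those values out of such lines follows. `fan_speeds` is a hypothetical helper, and reading the two trailing index components as chassis and fan is an assumption.

```perl
use strict;
use warnings;

# Column OID of the fan speed values shown in this issue
# (SNMPv2-SMI::enterprises is 1.3.6.1.4.1).
my $FAN_SPEED_OID = '1.3.6.1.4.1.232.6.2.6.7.1.6';

# Hypothetical helper: map fan index => raw speed value from snmpwalk lines.
sub fan_speeds {
    my (@lines) = @_;
    my %speed;
    for my $line (@lines) {
        # Expected shape: .<column>.<chassis>.<fan> = INTEGER: <n>
        # ASSUMPTION: the two trailing indices are chassis and fan.
        if ($line =~ /^\.?\Q$FAN_SPEED_OID\E\.(\d+)\.(\d+) = INTEGER: (\d+)/) {
            $speed{$2} = $3;
        }
    }
    return %speed;
}

my @sample = (
    '.1.3.6.1.4.1.232.6.2.6.7.1.6.0.1 = INTEGER: 12',   # from the issue
    '.1.3.6.1.4.1.232.6.2.6.7.1.6.0.6 = INTEGER: 14',   # from the issue
    '.1.3.6.1.4.1.232.6.2.6.7.1.4.0.1 = INTEGER: 3',    # made-up line from another column, only to show filtering
);

my %speed = fan_speeds(@sample);
print "fan $_: $speed{$_}\n" for sort { $a <=> $b } keys %speed;
```

Values well above 3 in this column would suggest the firmware reports a percentage, not the old three-state enum.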

lukasertl commented 1 month ago

Hi Gerhard,

are you still in need of the snmpwalk-files? I'm also hit by this issue and could provide the requested info.