Open recou opened 5 months ago
'check_nwc_health' '--mode' 'cpu-load' '--protocol' '2c' -vvvvvvvvvvv
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::Device::check_messages
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.3.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysUpTime (1.3.6.1.2.1.1.3) : 34402836
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.6.3.10.2.1.3.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Tue Feb 6 15:19:37 2024: GET: SNMP-FRAMEWORK-MIB::snmpEngineTime (1.3.6.1.6.3.10.2.1.3.0) : 344028
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::HOSTRESOURCESMIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.25.1.1
Tue Feb 6 15:19:37 2024: GET: HOST-RESOURCES-MIB::hrSystemUptime (1.3.6.1.2.1.25.1.1) :
Tue Feb 6 15:19:37 2024: I am a Linux 3.10.0-1160.15.2cpx86_64 #1 SMP Sun Nov 12 09:27:02 IST 2023 x86_64
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SYNOPTICSROOTMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::NETSCREENPRODUCTSMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::NETGEARMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::PANPRODUCTSMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::ARUBAWIREDCHASSISMIB
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::HPICFCHASSIS
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.2.1.1.2.0
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Tue Feb 6 15:19:37 2024: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2) : 1.3.6.1.4.1.2620.1.6.123.1.89
Tue Feb 6 15:19:37 2024: using CheckNwcHealth::CheckPoint
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::override_opt
Tue Feb 6 15:19:37 2024: AUTOLOAD Monitoring::GLPlugin::Commandline::override_opt
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::check_messages
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::CHECKPOINTMIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.4.1.2620.1.16.13
Tue Feb 6 15:19:37 2024: GET: CHECKPOINT-MIB::vsxVsInstalled (1.3.6.1.4.1.2620.1.16.13) :
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::CHECKPOINTMIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.4.1.2620.1.6.7.2.4
Tue Feb 6 15:19:37 2024: GET: CHECKPOINT-MIB::procUsage (1.3.6.1.4.1.2620.1.6.7.2.4) :
Tue Feb 6 15:19:37 2024: get_table returned 24 oids in 0s
Tue Feb 6 15:19:37 2024: get_matching_oids $VAR1 = { '-columns' => [ '1.3.6.1.4.1.2620.1.6.7.5' ] };
Tue Feb 6 15:19:37 2024: get_matching_oids returns 24 from 33 oids
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::CHECKPOINTMIB
Tue Feb 6 15:19:37 2024: get_snmp_table_objects default returns 4 entries
Tue Feb 6 15:19:37 2024: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::CHECKPOINTMIB
Tue Feb 6 15:19:37 2024: cache: 1.3.6.1.4.1.2620.1.6.7.2.5
Tue Feb 6 15:19:37 2024: GET: CHECKPOINT-MIB::procQueue (1.3.6.1.4.1.2620.1.6.7.2.5) :
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::set_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::check_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::add_perfdata
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::set_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::check_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::add_perfdata
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::set_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::check_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::add_perfdata
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::set_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::check_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::add_perfdata
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::set_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::check_thresholds
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::Component::CpuSubsystem::MultiProc::add_perfdata
[CPUSUBSYSTEM] procNum: 4 procUsage: 1 info: checking cpu cores
[MULTIPROC_1.0] multiProcIdleTime: 0 multiProcIndex: 1 multiProcInterrupts: 348799 multiProcSystemTime: 0 multiProcUsage: 100 multiProcUserTime: 0 info: cpu core 1 usage is 100.00%
[MULTIPROC_2.0] multiProcIdleTime: 100 multiProcIndex: 2 multiProcInterrupts: 348799 multiProcSystemTime: 0 multiProcUsage: 0 multiProcUserTime: 0 info: cpu core 2 usage is 0.00%
[MULTIPROC_3.0] multiProcIdleTime: 100 multiProcIndex: 3 multiProcInterrupts: 348799 multiProcSystemTime: 0 multiProcUsage: 0 multiProcUserTime: 0 info: cpu core 3 usage is 0.00%
[MULTIPROC_4.0] multiProcIdleTime: 100 multiProcIndex: 4 multiProcInterrupts: 348799 multiProcSystemTime: 0 multiProcUsage: 0 multiProcUserTime: 0 info: cpu core 4 usage is 0.00%
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::check_messages
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::check_messages
Tue Feb 6 15:19:37 2024: AUTOLOAD CheckNwcHealth::CheckPoint::Firewall1::nagios_exit
CRITICAL - cpu core 1 usage is 100.00%, cpu usage is 1.00%, cpu core 2 usage is 0.00%, cpu core 3 usage is 0.00%, cpu core 4 usage is 0.00%
checking cpus
cpu usage is 1.00%
checking cpu cores
cpu core 1 usage is 100.00%
cpu core 2 usage is 0.00%
cpu core 3 usage is 0.00%
You are not the only one. Can you put this code in a file mininwc.pl please?
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Net::SNMP;
use Data::Dumper;

# Initialize a hash to store the parameters
my %gparams = ();
my %goparams = ();
my %params = ();

# Set fixed parameters
$params{'-version'} = 3;
$params{'-port'} = 161;
$params{'-domain'} = "udp";
$params{'-translate'} = [
    -all => 0x0,
    -nosuchobject => 1,
    -nosuchinstance => 1,
    -endofmibview => 1,
    -unsigned => 1,
];
$params{'-timeout'} = 60;
my $contextname = undef;

# Get command line options
GetOptions(
    'hostname=s' => \$goparams{'-hostname'},
    'community=s' => \$goparams{'-community'},
    'username=s' => \$goparams{'-username'},
    'authpassword=s' => \$goparams{'-authpassword'},
    'authprotocol=s' => \$goparams{'-authprotocol'},
    'privpassword=s' => \$goparams{'-privpassword'},
    'privprotocol=s' => \$goparams{'-privprotocol'},
    'contextname=s' => \$contextname,
    'port=i' => \$goparams{'-port'},
);

# A community string switches the session from SNMPv3 to v2c
if ($params{'-version'} ne "3" or $goparams{"-community"}) {
    $params{'-version'} = "2c";
    $params{'-community'} = $goparams{"-community"};
} else {
    foreach my $param (qw(username authpassword authprotocol privpassword privprotocol)) {
        $params{'-'.$param} = $goparams{"-".$param} if $goparams{"-".$param};
    }
    if ($contextname) {
        $gparams{'-contextname'} = $contextname;
    }
}
$params{'-port'} = $goparams{"-port"} if $goparams{"-port"};
$params{'-hostname'} = $goparams{"-hostname"} if $goparams{"-hostname"};

# procNum plus multiProcUsage for core indexes 1..16
$gparams{'-varbindlist'} = [
    "1.3.6.1.4.1.2620.1.6.7.2.7.0",
];
foreach my $idx (1..16) {
    push(@{$gparams{'-varbindlist'}}, "1.3.6.1.4.1.2620.1.6.7.5.1.5.".$idx.".0");
}

my ($session, $error) = Net::SNMP->session(%params);
if ($error) {
    printf "error: %s\n", $error;
} else {
    my $result = $session->get_request(%gparams);
    printf "%s\n", Data::Dumper::Dumper($result);
}
then run it with perl mininwc.pl --hostname ... --community ... and post the output here.
This is the output:
`$VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.16.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.11.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.7.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.12.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 3, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 3, '1.3.6.1.4.1.2620.1.6.7.5.1.5.8.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.13.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.5.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.6.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 3, '1.3.6.1.4.1.2620.1.6.7.5.1.5.14.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.10.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.9.0' => 'noSuchInstance', '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 0, '1.3.6.1.4.1.2620.1.6.7.5.1.5.15.0' => 'noSuchInstance' };
`
Ah, you have 4 cpus. So you can change foreach my $idx (1..16) { to foreach my $idx (1..4) {
It's very strange that this mini-example shows plausible values, but the same OIDs in the context of the big Perl plugin only show 0, 25, 50, 100 and so on. (I already replaced the snmpwalk in the plugin with single snmpgets and it's the same.)
Can you run the above code (with 1..4), let's say 10 times in a row, like:
for i in 1 2 3 4 5 6 7 8 9 10; do perl mininwc.pl --hostname ... --community ...; sleep 0.1; done
and the same, but together with check_nwc_health:
for i in 1 2 3 4 5 6 7 8 9 10; do perl mininwc.pl --hostname ... --community ...; check_nwc_health --mode cpu-load --hostname ... --community ...; sleep 0.1; done
I appreciate your help.
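The Dumper output above mixes real readings with noSuchInstance placeholders for core indexes that do not exist on this box. A minimal sketch of how such a result hash could be reduced to the populated cores; `populated_cores` is a hypothetical helper (not part of check_nwc_health), and the sample values are copied from the output above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only multiProcUsage entries that carry a numeric
# value and return them as a hash ref of core index => value.
sub populated_cores {
    my ($result) = @_;
    my $prefix = '1.3.6.1.4.1.2620.1.6.7.5.1.5.';
    my %cores;
    foreach my $oid (keys %{$result}) {
        # Instances look like <prefix><idx>.0
        next unless $oid =~ /^\Q$prefix\E(\d+)\.0$/;
        my $idx = $1;
        my $value = $result->{$oid};
        # Skips the 'noSuchInstance' strings
        next unless defined $value && $value =~ /^\d+$/;
        $cores{$idx} = $value;
    }
    return \%cores;
}

my $sample = {
    '1.3.6.1.4.1.2620.1.6.7.2.7.0'     => 4,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 0,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 3,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 3,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 3,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.5.0' => 'noSuchInstance',
};
my $cores = populated_cores($sample);
printf "core %d: %d%%\n", $_, $cores->{$_} for sort { $a <=> $b } keys %{$cores};
```

The number of surviving keys should match procNum (1.3.6.1.4.1.2620.1.6.7.2.7.0), which is 4 here.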
I'm glad to contribute. The problem is (as always) that the bug is not persistent. But the mini Perl script gives the same result as check_nwc_health:
OK - cpu usage is 10.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 4.00%, cpu core 3 usage is 15.00%, cpu core 4 usage is 11.00% | 'cpu_usage'=10%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=4%;80;90;0;100 'cpu_core_3_usage'=15%;80;90;0;100 'cpu_core_4_usage'=11%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 4, '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 11, '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 15, '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 0 };
OK - cpu usage is 10.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 4.00%, cpu core 3 usage is 15.00%, cpu core 4 usage is 11.00% | 'cpu_usage'=10%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=4%;80;90;0;100 'cpu_core_3_usage'=15%;80;90;0;100 'cpu_core_4_usage'=11%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 0, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 11, '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 15 };
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu core 4 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=100%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 100, '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 100, '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 0, '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 100 };
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu core 4 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=100%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 100, '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 100, '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 0, '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 100 };
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu core 4 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=100%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 100, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 100, '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 0, '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 100, '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4 };
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu core 4 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=100%;80;90;0;100 $VAR1 = { '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0' => 0, '1.3.6.1.4.1.2620.1.6.7.5.1.5.3.0' => 4, '1.3.6.1.4.1.2620.1.6.7.5.1.5.4.0' => 7, '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0' => 5, '1.3.6.1.4.1.2620.1.6.7.2.7.0' => 4 };
When you let it run 100 times, can you estimate whether there are more good results than bad ones? And for how many cycles is the bad result returned before there are usable metrics? If there is some pattern, I can try to add a loop to check_nwc_health which repeats the query until there are non-100 results.
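Such a retry loop might look roughly like the sketch below. It is only an illustration: `fetch_until_plausible` is a made-up name, the fetch step is passed in as a code ref so the sketch runs without a device, and "no core reads exactly 100" is the crude criterion from this comment. A core that is genuinely at 100% would be retried as well, so the number of tries has to be capped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $fetch (a code ref returning an array ref of
# per-core percentages) until a sample contains no 100% reading, or until
# $max_tries samples have been taken. Returns the last sample and a flag
# telling whether it looked usable.
sub fetch_until_plausible {
    my ($fetch, $max_tries, $delay) = @_;
    my $sample;
    foreach my $try (1 .. $max_tries) {
        $sample = $fetch->();
        my $suspicious = grep { $_ == 100 } @{$sample};
        return ($sample, 1) unless $suspicious;
        # Fractional sleep between tries, skipped after the last one
        select(undef, undef, undef, $delay) if $delay && $try < $max_tries;
    }
    return ($sample, 0);
}

# Usage with canned samples taken from the outputs in this thread.
my @canned = ([100, 0, 100, 100], [100, 0, 100, 100], [0, 4, 15, 11]);
my ($cores, $ok) = fetch_until_plausible(sub { shift @canned }, 5, 0);
printf "usable=%d values=%s\n", $ok, join(" ", @{$cores});
```

In the plugin the code ref would wrap the actual SNMP query; when the flag comes back 0 the caller still has to decide whether to report the last sample or fall back to something else.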
There doesn't seem to be an obvious pattern:
Except for 10:29 it all looks like crap. On the command line there was at least a 50:50 ratio.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Net::SNMP;

# Initialize a hash to store the parameters
my %gparams = ();
my %goparams = ();
my %params = ();

# Set fixed parameters
$params{'-version'} = 3;
$params{'-port'} = 161;
$params{'-domain'} = "udp";
$params{'-translate'} = [
    -all => 0x0,
    -nosuchobject => 1,
    -nosuchinstance => 1,
    -endofmibview => 1,
    -unsigned => 1,
];
$params{'-timeout'} = 60;
my $contextname = undef;

# Get command line options
GetOptions(
    'hostname=s' => \$goparams{'-hostname'},
    'community=s' => \$goparams{'-community'},
    'username=s' => \$goparams{'-username'},
    'authpassword=s' => \$goparams{'-authpassword'},
    'authprotocol=s' => \$goparams{'-authprotocol'},
    'privpassword=s' => \$goparams{'-privpassword'},
    'privprotocol=s' => \$goparams{'-privprotocol'},
    'contextname=s' => \$contextname,
    'port=i' => \$goparams{'-port'},
);

# A community string switches the session from SNMPv3 to v2c
if ($params{'-version'} ne "3" or $goparams{"-community"}) {
    $params{'-version'} = "2c";
    $params{'-community'} = $goparams{"-community"};
} else {
    foreach my $param (qw(username authpassword authprotocol privpassword privprotocol)) {
        $params{'-'.$param} = $goparams{"-".$param} if $goparams{"-".$param};
    }
    if ($contextname) {
        $gparams{'-contextname'} = $contextname;
    }
}
$params{'-port'} = $goparams{"-port"} if $goparams{"-port"};
$params{'-hostname'} = $goparams{"-hostname"} if $goparams{"-hostname"};

my ($session, $error) = Net::SNMP->session(%params);
if ($error) {
    printf "error: %s\n", $error;
} else {
    $gparams{'-varbindlist'} = [
        "1.3.6.1.4.1.2620.1.6.7.2.7.0",
    ];
    my $result = $session->get_request(%gparams);
    my $num_cpus = $result->{"1.3.6.1.4.1.2620.1.6.7.2.7.0"};
    printf "found %d cpus\n", $num_cpus;
    # multiProcUsage
    %gparams = ( "-columns" => [ "1.3.6.1.4.1.2620.1.6.7.5.1.5", ]);
    $result = $session->get_entries(%gparams);
    my @mpus = map { $result->{$_} }
        sort { (split /\./, $a)[-2] <=> (split /\./, $b)[-2] }
        grep { /^1\.3\.6\.1\.4\.1\.2620\.1\.6\.7\.5\.1\.5\./ }
        keys %$result;
    # hrProcessorLoad
    %gparams = ( "-columns" => [ "1.3.6.1.2.1.25.3.3.1.2", ]);
    $result = $session->get_entries(%gparams);
    my @hpls = map { $result->{$_} }
        sort { (split /\./, $a)[-1] <=> (split /\./, $b)[-1] }
        grep { /^1\.3\.6\.1\.2\.1\.25\.3\.3\.1\.2\./ }
        keys %$result;
    printf "multiProcUsage %s\n", join(" ", map { sprintf "%03d", $_; } @mpus);
    printf "hrProcessorLoad %s\n", join(" ", map { sprintf "%03d", $_; } @hpls);
}
There are also cpu load metrics from the HOST-RESOURCES-MIB. Let's compare them. Can you replace the old mininwc.pl with the above code? And run it in a loop with short intervals please.
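A note on the two sort expressions in the script: multiProcUsage instances end in "<core>.0", so the core number is the second-to-last OID component ([-2]), while hrProcessorLoad instances end in the device index itself ([-1]). The standalone sketch below shows the same ordering logic without a device; `values_by_index` is a hypothetical helper, and the core 10 entry and the hrDeviceIndex values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the values under $prefix ordered numerically by
# the OID component at position $pos (-1 = last, -2 = second to last).
sub values_by_index {
    my ($result, $prefix, $pos) = @_;
    my @oids = grep { index($_, $prefix . '.') == 0 } keys %{$result};
    my @sorted = sort {
        (split /\./, $a)[$pos] <=> (split /\./, $b)[$pos]
    } @oids;
    return map { $result->{$_} } @sorted;
}

# multiProcUsage: core number sits at position -2
my %mpu = (
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.10.0' => 30,   # made-up core 10
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.2.0'  => 4,
    '1.3.6.1.4.1.2620.1.6.7.5.1.5.1.0'  => 0,
);
# hrProcessorLoad: device index sits at position -1
my %hpl = (
    '1.3.6.1.2.1.25.3.3.1.2.196609' => 3,   # made-up hrDeviceIndex values
    '1.3.6.1.2.1.25.3.3.1.2.196608' => 1,
);
print join(" ", values_by_index(\%mpu, '1.3.6.1.4.1.2620.1.6.7.5.1.5', -2)), "\n";
print join(" ", values_by_index(\%hpl, '1.3.6.1.2.1.25.3.3.1.2', -1)), "\n";
```

The numeric comparison (`<=>`) matters once there are ten or more cores; a plain string sort would place core 10 before core 2.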
Except for 10:29 it all looks like crap. On the command line there was at least a 50:50 ratio.
That was because i've filtered the output for a loop of 100 iterations.
looks the same
found 4 cpus
multiProcUsage 000 004 004 006
hrProcessorLoad 001 003 003 003
OK - cpu usage is 3.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 4.00%, cpu core 3 usage is 4.00%, cpu core 4 usage is 6.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=4%;80;90;0;100 'cpu_core_3_usage'=4%;80;90;0;100 'cpu_core_4_usage'=6%;80;90;0;100
found 4 cpus
multiProcUsage 000 007 009 007
hrProcessorLoad 001 003 003 003
OK - cpu usage is 7.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 7.00%, cpu core 3 usage is 9.00%, cpu core 4 usage is 7.00% | 'cpu_usage'=7%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=7%;80;90;0;100 'cpu_core_3_usage'=9%;80;90;0;100 'cpu_core_4_usage'=7%;80;90;0;100
found 4 cpus
multiProcUsage 000 007 009 007
hrProcessorLoad 001 003 004 004
OK - cpu usage is 7.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 7.00%, cpu core 3 usage is 9.00%, cpu core 4 usage is 7.00% | 'cpu_usage'=7%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=7%;80;90;0;100 'cpu_core_3_usage'=9%;80;90;0;100 'cpu_core_4_usage'=7%;80;90;0;100
found 4 cpus
multiProcUsage 000 007 009 007
hrProcessorLoad 001 003 004 004
OK - cpu usage is 7.00%, cpu core 1 usage is 0.00%, cpu core 2 usage is 7.00%, cpu core 3 usage is 9.00%, cpu core 4 usage is 7.00% | 'cpu_usage'=7%;80;90;0;100 'cpu_core_1_usage'=0%;80;90;0;100 'cpu_core_2_usage'=7%;80;90;0;100 'cpu_core_3_usage'=9%;80;90;0;100 'cpu_core_4_usage'=7%;80;90;0;100
found 4 cpus
multiProcUsage 000 007 009 007
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu usage is 2.00%, cpu core 2 usage is 0.00%, cpu core 3 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=2%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=0%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 000 000
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu usage is 2.00%, cpu core 2 usage is 0.00%, cpu core 3 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=2%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=0%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 000 000
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu usage is 2.00%, cpu core 2 usage is 0.00%, cpu core 3 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=2%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=0%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 000 000
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 100 000
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 100 000
hrProcessorLoad 001 003 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu core 3 usage is 100.00%, cpu usage is 3.00%, cpu core 2 usage is 0.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=3%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=0%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
found 4 cpus
multiProcUsage 100 000 100 000
hrProcessorLoad 001 004 004 004
CRITICAL - cpu core 1 usage is 100.00%, cpu core 2 usage is 100.00%, cpu core 3 usage is 100.00%, cpu usage is 2.00%, cpu core 4 usage is 0.00% | 'cpu_usage'=2%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=100%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
Is there a way to check the validity of the hrProcessorLoad with a command line tool on a console?
It's hard to say, as the CPU load on the device is very low (<5%) and hrProcessorLoad shows the average over the last minute. I'll reach out to the firewall colleagues to create some load.
Even if the hrProcessorLoad values are not identical to the multiProcUsage values (in the cases when these are not [0, 20, 50, 100]), it would be sufficient if they are roughly as high or as low as what the CLI tool shows. In that case I would check whether we are on R81.20 and then completely ignore the (obviously broken) multiProcTable and fall back to hrProcessorLoad.
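As a rough sketch of that fallback decision (not the actual plugin code): `choose_cpu_source` is a hypothetical helper, and the release string is simply passed in, since how the plugin would detect R81.20 is not covered in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide which per-core readings to report.
# $version     - release string, e.g. "R81.20" (how it is obtained is an
#                assumption and not shown here)
# $multiproc   - array ref of multiProcUsage values
# $hrprocessor - array ref of hrProcessorLoad values
sub choose_cpu_source {
    my ($version, $multiproc, $hrprocessor) = @_;
    my $affected = defined $version && $version eq 'R81.20';
    # Only fall back when the alternative table actually returned rows
    if ($affected && $hrprocessor && @{$hrprocessor}) {
        return ('hrProcessorLoad', $hrprocessor);
    }
    return ('multiProcUsage', $multiproc);
}

my ($source, $values) = choose_cpu_source('R81.20', [100, 0, 100, 0], [1, 3, 4, 4]);
printf "%s: %s\n", $source, join(" ", @{$values});
```

An exact string match on the release keeps unaffected versions on multiProcUsage; if later releases turn out to behave the same way, the condition would need to be widened.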
I did some load tests on all 4 cores. hrProcessorLoad works as designed, with the delay, so it may provide a suitable workaround.
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 025 026 025
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 034 035 034
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 034 035 034
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 034 035 034
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 034 035 034
found 4 cpus
multiProcUsage 001 100 100 100
hrProcessorLoad 001 042 044 043
found 4 cpus
multiProcUsage 001 100 100 100
hrProcessorLoad 001 042 044 043
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 042 044 043
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 042 044 043
found 4 cpus
multiProcUsage 001 100 100 100
hrProcessorLoad 001 042 044 043
found 4 cpus
multiProcUsage 001 100 100 100
hrProcessorLoad 001 051 052 052
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 051 052 052
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 051 052 052
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 051 052 052
found 4 cpus
multiProcUsage 002 100 100 100
hrProcessorLoad 001 051 052 052
found 4 cpus
multiProcUsage 002 100 100 100
hrProcessorLoad 001 060 061 061
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 060 061 061
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 060 061 061
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 060 061 061
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 069 070 069
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 069 070 069
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 069 070 069
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 069 070 069
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 069 070 069
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 078 078 078
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 078 078 078
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 078 078 078
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 078 078 078
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 087 087 087
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 087 087 087
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 087 087 087
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 087 087 087
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 087 087 087
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 096 096 096
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 096 096 096
found 4 cpus
multiProcUsage 001 100 100 100
hrProcessorLoad 001 096 096 096
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 096 096 096
found 4 cpus
multiProcUsage 000 100 100 100
hrProcessorLoad 001 096 096 096
found 4 cpus
multiProcUsage 000 056 055 054
hrProcessorLoad 001 100 100 100
found 4 cpus
multiProcUsage 000 056 055 054
hrProcessorLoad 001 100 100 100
found 4 cpus
multiProcUsage 000 001 003 001
hrProcessorLoad 001 100 100 100
found 4 cpus
multiProcUsage 000 001 003 001
hrProcessorLoad 001 100 100 100
found 4 cpus
multiProcUsage 000 001 003 001
hrProcessorLoad 001 091 091 091
found 4 cpus
multiProcUsage 000 001 003 001
hrProcessorLoad 001 091 091 091
found 4 cpus
multiProcUsage 000 000 003 003
hrProcessorLoad 001 091 091 091
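The slow ramp of hrProcessorLoad in this output (25, 34, 42, ... 96, 100) is what one would expect from a one-minute average while the instantaneous value jumps straight to 100. The sketch below models that with a sliding window over one-second samples. The sampling interval and the 60-sample window are assumptions for illustration, not something read from the agent, and `windowed_average` is a made-up helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical model: mean of the most recent $window one-second samples.
sub windowed_average {
    my ($samples, $window) = @_;
    my @recent = @{$samples};
    @recent = @recent[-$window .. -1] if @recent > $window;
    return sum(@recent) / @recent;
}

# 60 idle seconds followed by a sustained 100% load.
my @history = (0) x 60;
foreach my $second (1 .. 90) {
    push @history, 100;
    next if $second % 15;   # report every 15 seconds only
    printf "t=%2ds instantaneous=100 averaged=%3d\n",
        $second, windowed_average(\@history, 60);
}
```

Under these assumptions the averaged value needs a full minute to reach 100 and a full minute to drop back, which matches the delay seen when the load test ended.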
Is R81.20 the latest version? I googled around for "R81.20 cpu" and found a lot of people complaining about the general quality of Check Point and also some CPU-related posts, but none of them concerned SNMP. If there is an update or hotfix (or whatever it's called), I would be very interested in whether the metrics are better then.
Yes, R81.20 is the latest version. We're currently deploying the latest hotfix/jumbo fix and the problem is still present. But I'll try future versions with your first mininwc.pl script.
Can you send me an email, so i can attach a plugin for testing to my reply? I can't attach it here. gerhard.lausser@consol.de
With the latest update for Check Point R81.20 (Take 41), check_nwc_health does not show the correct values:
'/usr/lib64/nagios/plugins/contrib/check_nwc_health' '--mode' 'cpu-load' '--protocol' '2c' -v
CRITICAL - cpu core 1 usage is 100.00%, cpu core 2 usage is 100.00%, cpu core 3 usage is 100.00%, cpu usage is 1.00%, cpu core 4 usage is 0.00%
checking cpus
cpu usage is 1.00%
checking cpu cores
cpu core 1 usage is 100.00%
cpu core 2 usage is 100.00%
cpu core 3 usage is 100.00%
cpu core 4 usage is 0.00% | 'cpu_usage'=1%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 'cpu_core_2_usage'=100%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 'cpu_core_4_usage'=0%;80;90;0;100
snmpwalk -v 2c .1.3.6.1.4.1.2620.1.6.7.5.1.5
SNMPv2-SMI::enterprises.2620.1.6.7.5.1.5.1.0 = Gauge32: 0
SNMPv2-SMI::enterprises.2620.1.6.7.5.1.5.2.0 = Gauge32: 6
SNMPv2-SMI::enterprises.2620.1.6.7.5.1.5.3.0 = Gauge32: 10
SNMPv2-SMI::enterprises.2620.1.6.7.5.1.5.4.0 = Gauge32: 9
top shows the same values as snmpwalk, and the OIDs are the same as on older versions.
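To put the two outputs side by side mechanically, something like the sketch below could be used. `parse_perfdata` is a hypothetical helper that only handles percentage values in the 'label'=value%;warn;crit;min;max form shown above. The two commands were run at different moments, so small differences would be normal; the point is only the 100 versus single-digit gap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract label => value pairs from a Nagios perfdata
# string of the form 'label'=value%;warn;crit;min;max ...
sub parse_perfdata {
    my ($perfdata) = @_;
    my %values;
    while ($perfdata =~ /'([^']+)'=(\d+(?:\.\d+)?)%/g) {
        $values{$1} = $2;
    }
    return \%values;
}

# Perfdata copied from the check_nwc_health output above
my $perfdata = "'cpu_usage'=1%;80;90;0;100 'cpu_core_1_usage'=100%;80;90;0;100 "
             . "'cpu_core_2_usage'=100%;80;90;0;100 'cpu_core_3_usage'=100%;80;90;0;100 "
             . "'cpu_core_4_usage'=0%;80;90;0;100";
# Gauge32 values for cores 1..4 copied from the snmpwalk above
my @snmpwalk = (0, 6, 10, 9);

my $plugin = parse_perfdata($perfdata);
foreach my $core (1 .. 4) {
    my $reported = $plugin->{"cpu_core_${core}_usage"};
    my $walked   = $snmpwalk[$core - 1];
    printf "core %d: plugin=%d snmpwalk=%d diff=%d\n",
        $core, $reported, $walked, $reported - $walked;
}
```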