Open Napsty opened 2 months ago
The plan to solve the bug involves verifying and correcting the parsing and handling of the criticalthreshold option within the check_logfiles.pl script and ensuring that the logic for applying this threshold is correctly implemented in the Nagios::CheckLogfiles module. The issue appears to be that the criticalthreshold is either not being parsed correctly or not being applied correctly, leading to a CRITICAL status being reported even when the number of error lines is below the threshold.
The bug is likely caused by either incorrect parsing of the criticalthreshold option from the configuration file or flawed logic in the Nagios::CheckLogfiles module that handles this threshold. The criticalthreshold option should dictate the number of error lines required to trigger a CRITICAL status, but it seems that this threshold is not being respected, causing premature CRITICAL alerts.
To address this issue, we need to:
1. Check that the criticalthreshold option is correctly parsed from the configuration file.
2. Check that the criticalthreshold is correctly applied in the Nagios::CheckLogfiles module.
3. Add debugging statements around the criticalthreshold logic.
Ensure that the criticalthreshold option is correctly parsed and passed to the Nagios::CheckLogfiles object in check_logfiles.pl.
# check_logfiles.pl
# Add debugging statement after parsing command-line options
print STDERR "Parsed criticalthreshold: $commandline{criticalthreshold}\n" if exists $commandline{criticalthreshold};
# Ensure criticalthreshold is included in the options passed to Nagios::CheckLogfiles
if (my $cl = Nagios::CheckLogfiles->new({
    ...
    options => join(',', grep { $_ }
        ...
        $commandline{criticalthreshold} ? "criticalthreshold=".$commandline{criticalthreshold} : undef,
        ...
    ),
    ...
})) {
    ...
}
Review and correct the logic within the Nagios::CheckLogfiles module to ensure that the criticalthreshold is correctly applied.
# Nagios/CheckLogfiles.pm
# Add debugging statement to trace the application of criticalthreshold
sub check_thresholds {
    my ($self, $count) = @_;
    print STDERR "Checking thresholds with count: $count and criticalthreshold: $self->{criticalthreshold}\n";
    if ($count >= $self->{criticalthreshold}) {
        return 'CRITICAL';
    } elsif ($count >= $self->{warningthreshold}) {
        return 'WARNING';
    } else {
        return 'OK';
    }
}
Add more detailed debugging statements to trace the internal states and threshold counts more precisely.
# Nagios/CheckLogfiles.pm
# Add debugging statements around threshold checks
sub analyze_logfile {
    my ($self, $logfile) = @_;
    my $count = 0;
    while (my $line = <$logfile>) {
        if ($line =~ /$self->{criticalpattern}/) {
            $count++;
        }
    }
    print STDERR "Total critical pattern matches: $count\n";
    return $self->check_thresholds($count);
}
To replicate the bug, follow these steps:
1. Create a configuration file with the criticalthreshold option set to a specific value (e.g., 10).
2. Run the check_logfiles plugin with the configuration file and a log file containing fewer error lines than the criticalthreshold.
Example configuration file (logfile_icinga.cfg):
$seekfilesdir = '/var/tmp/check_logfiles';
$protocolsdir = '/var/tmp/check_logfiles';
$scriptpath = '/usr/lib64/nagios/plugins';
@searches = (
    {
        tag => 'icinga2_client_handshake_errors',
        logfile => '/var/log/icinga2/icinga2.log',
        criticalpatterns => [
            'Client TLS handshake failed'
        ],
        options => 'noprotocol,nosticky,nosavethresholdcount,nosavestate,criticalthreshold=10,warningthreshold=5,maxage=15m',
    }
);
Command to run the plugin:
'/usr/bin/sudo' '/usr/lib64/nagios/plugins/check_logfiles' '--config' '/etc/nagios/logfile_icinga.cfg' '--tag' 'icinga2_client_handshake_errors'
By following these steps, you should be able to observe the bug and verify that the solution correctly addresses the issue.
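For the second step, a test log with a known number of matching lines makes the run easier to interpret. The following is a minimal sketch, not part of check_logfiles: the helper name write_sample_log and the use of a temporary file are assumptions for illustration, and the logfile setting in the configuration would have to point at the generated path. The sample line is copied from an Icinga2 log entry that matches the configured pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of check_logfiles): write $n lines that
# match the 'Client TLS handshake failed' pattern to the file at $path.
sub write_sample_log {
    my ($path, $n) = @_;
    open(my $fh, '>', $path) or die "cannot write $path: $!";
    for (1 .. $n) {
        print {$fh} "[2024-09-20 15:41:48 +0200] critical/ApiListener: "
                  . "Client TLS handshake failed (to [10.50.60.70]:5665): "
                  . "Operation canceled\n";
    }
    close($fh) or die "cannot close $path: $!";
    return $n;
}

# Create a temporary log file with 3 matching lines, i.e. fewer than
# criticalthreshold=10.
my (undef, $path) = tempfile(SUFFIX => '.log');
write_sample_log($path, 3);
print "wrote 3 matching lines to $path\n";
```

With a known line count in the file, the number reported by the plugin can be compared directly against the configured thresholds.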
The threshold option works correctly. The log file actually contains a large number of lines matching the pattern.
With criticalthreshold=10, only every 10th match is counted, so the resulting output "3 errors" actually means that at least 3 x 10 (threshold) matching lines were detected.
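The counting described above can be modeled with a few lines of Perl. This is only a sketch of the behaviour visible in the verbose output ("skip match and the next N" / "count this match"), not the plugin's actual code; the function name counted_events is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model (not the plugin's implementation): with a threshold
# of N, only every Nth match is counted as an event.
sub counted_events {
    my ($matches, $threshold) = @_;
    my $counted = 0;    # events that show up in the plugin output
    my $pending = 0;    # matches seen since the last counted event
    for (1 .. $matches) {
        $pending++;
        if ($pending == $threshold) {    # "count this match"
            $counted++;
            $pending = 0;
        }                                # otherwise: "skip match and the next ..."
    }
    return ($counted, $pending);
}

# 34 matching lines with criticalthreshold=10
my ($counted, $pending) = counted_events(34, 10);
print "counted=$counted pending=$pending\n";    # prints "counted=3 pending=4"
```

So 34 matching lines yield 3 counted events, with 4 matches left over that have not yet reached the next threshold.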
A verbose run shows all the matched patterns and also shows which match is counted (count this match):
[root@linux PROD ~]# /usr/lib64/nagios/plugins/check_logfiles -f /etc/nagios/logfile_icinga.cfg --tag=icinga2_client_handshake_errors -v
Fri Sep 20 15:47:45 2024: ==================== /var/log/icinga2/icinga2.log ==================
Fri Sep 20 15:47:45 2024: found seekfile /var/tmp/check_logfiles/logfile_icinga._var_log_icinga2_icinga2.log.icinga2_client_handshake_errors
Fri Sep 20 15:47:45 2024: LS lastlogfile = /var/log/icinga2/icinga2.log
Fri Sep 20 15:47:45 2024: LS lastoffset = 163782719 / lasttime = 1726839698 (Fri Sep 20 15:41:38 2024) / inode = 64781:23
Fri Sep 20 15:47:45 2024: found private state $VAR1 = {
'lastruntime' => 1726839643,
'runcount' => 61099,
'matchingpattern' => 'Client TLS handshake failed',
'logfile' => '/var/log/icinga2/icinga2.log'
};
Fri Sep 20 15:47:45 2024: the logfile grew to 164879277
Fri Sep 20 15:47:45 2024: opened logfile /var/log/icinga2/icinga2.log
Fri Sep 20 15:47:45 2024: logfile /var/log/icinga2/icinga2.log (modified Fri Sep 20 15:47:43 2024 / accessed Fri Sep 20 15:41:41 2024 / inode 23 / inode changed Fri Sep 20 15:47:43 2024)
Fri Sep 20 15:47:45 2024: relevant files: icinga2.log
Fri Sep 20 15:47:45 2024: moving to position 163782719 in /var/log/icinga2/icinga2.log
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:41:48 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 9
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:41:58 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 8
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:42:08 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 7
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:42:18 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 6
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:42:28 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 5
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:42:38 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 4
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:42:49 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 3
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:08 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 2
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:18 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 1
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:28 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: count this match
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:38 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 9
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:48 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 8
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:43:58 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 7
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:08 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 6
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:18 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 5
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:29 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 4
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:39 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 3
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:49 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 2
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:44:59 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 1
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:09 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: count this match
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:19 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 9
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:29 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 8
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:39 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 7
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:49 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 6
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:45:59 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 5
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:09 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 4
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:19 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 3
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:29 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 2
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:39 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 1
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:49 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: count this match
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:46:59 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 9
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:47:09 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 8
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:47:19 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 7
Fri Sep 20 15:47:45 2024: MATCH CRITICAL Client TLS handshake failed with [2024-09-20 15:47:30 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled
Fri Sep 20 15:47:45 2024: skip match and the next 6
Fri Sep 20 15:47:45 2024: stopped reading at position 164879277
Fri Sep 20 15:47:45 2024: keeping position 164879277 and time 1726840063 (Fri Sep 20 15:47:43 2024) for inode 64781:23 in mind
CRITICAL - (3 errors) - [2024-09-20 15:46:49 +0200] critical/ApiListener: Client TLS handshake failed (to [10.50.60.70]:5665): Operation canceled ...|'icinga2_client_handshake_errors_lines'=7560 'icinga2_client_handshake_errors_warnings'=0 'icinga2_client_handshake_errors_criticals'=3 'icinga2_client_handshake_errors_unknowns'=0
Is there a way that the output shows the actual number of errors (34) instead of 3?
Seen a weird problem today where check_logfiles correctly identifies error patterns in a log file. The config file sets multiple options, including criticalthreshold=10, yet the plugin reports a CRITICAL status when finding a number of error lines below the threshold.
Config file:
Command line usage would be:
'/usr/bin/sudo' '/usr/lib64/nagios/plugins/check_logfiles' '--config' '/etc/nagios/logfile_icinga.cfg' '--tag' 'icinga2_client_handshake_errors'
The Icinga2 alert history shows that the status of this service check already switches to CRITICAL after finding just a single error line within the run.
To my understanding, this should only be the case if 10 or more error lines were found in this run. Or am I misunderstanding something, or potentially breaking things with one of the other options?