Closed UrBnW closed 3 years ago
Really good news, so glad, many thanks @lausser π
Tests are failing
I did not look at the tests, you're right. For sure the sticky ones then need to be updated. Let me check and correct.
Please see #62 for a fix π
Oh, i just wanted to start myself. Let me check (because some tests require windows modules etc. and might fail in any case)
Arrgh, my wife is calling. I wish i would have at least during my holidays a few minutes...
No problem, let's stay tuned π Thank you !
Now i know why only the last error was counted and visible. With a sticky=600
an error1 is written
t+0m check_logfiles finds error1 -> critical "error1"
t+5m check_logfiles remembers sticky error1 -> "error1"
an error2 is written
t+10m check_logfiles finds error2, remembers error1 -> critical "error2"
t+15m check_logfiles remembers error2, remembers error1, critical "error2"
error1 meanwhile has expired. But as there is no timestamp for the single matches and they don't expire individually. You hist don't see error1 anymore in the output.
Now when i take your changes, the first error will always be counted, even if it has long expired. If there is a new error shortly before the sticky=
Now when i take your changes, the first error will always be counted, even if it has long expired. If there is a new error shortly before the sticky=, the old errors still are remembered.
Yes, the errors' counter computes / gives a view of all errors caught since last normal / OK situation.
I think it rather makes sense.
sticky=600
will allow your monitoring service to count all errors since last OK status.
Note that we only remember the last caught line's output, not all caught lines' output, so that the temporary / working file remains light, no need to duplicate and output hundreds / thousands of lines.
I will try to add timestamps to the matchlines array, so older matchlines can be dropped after a while, even when a new error line prolonges the stickyness.
Could do the trick yes if you want to have the counters giving the number of lines caught during the last X
sticky seconds, if you think it's more relevant than the current situation (since last OK status).
We could add timestamps in the matchcounters
array, line 2692
:
$self->{newstate}->{matchcounters}->{$level}->{time()} = scalar(@{$self->{matchlines}->{$level}});
And delete the too old timestamps from $self->{laststate}->{matchcounters}->{$level}
right after.
Note that #62 corrects tests in current situation, and also a small bug in CheckLogfiles.pm
which did not trigger until I worked on tests.
check_logfiles-4.0.tar.gz Every match now gets it's own timestamp. Sticky matches will expire individually, so that the numbers of warnigns/criticals shown constantly change.
Just tested your 4.0 archive, and it works as you proposed above ; the counters now give the number of lines caught during the last X
sticky seconds.
One drawback though. The temporary working file caches the content of every single line caught. Depending on the situation, we could catch hundreds / thousands of lines, leading to a very big temporary file, something we certainly don't want, for various reasons (space, performances...). Seems to be counterproductive, especially given the fact that we only output the last caught line...
This is why, in this PR, I use a matchcounters
array which only stores the counters, and get rid of the big matchlines
array :
# cat check_logfiles.test.seek
$state = {
'servicestateid' => 2,
'tag' => 'default',
'logtime' => 1610145857,
'logoffset' => 48,
'runtime' => 1610145857,
'laststicked' => 1610145857,
'serviceoutput' => 'error',
'runcount' => 4,
'matchcounters' => {
'WARNING' => 0,
'CRITICAL' => 8,
'UNKNOWN' => 0,
'OK' => 0
},
'devino' => '64512:524039',
'privatestate' => {
'logfile' => 'test',
'matchingpattern' => 'error',
'runcount' => 4,
'lastruntime' => 1610145855
}
};
1;
And, as proposed above, we could go further adding a timestamp field in this array, which would then look like this :
'matchcounters' => {
'CRITICAL' => {
'1610146819' => 4,
'1610146850' => 3
},
'WARNING' => {
'1610146819' => 2
},
'UNKNOWN' => {
'1610146850' => 7
},
'OK' => {
}
},
Seems more efficient, don't you think so ? If you're OK with this general idea, I can submit a PR over master branch to complete the current work.
Thank you again :+1:
Saving only the lΓ€Γt Match is in progressAm 09.01.2021 00:09 schrieb UrBnW notifications@github.com:
Just tested your 4.0 archive, and it works as you proposed above ; the counters now give the number of lines caught during the last X sticky seconds.
One drawback though.
The temporary working file caches the content of every single line caught.
Depending on the situation, we could catch hundreds / thousands of lines, leading to a very big temporary file, something we certainly don't want, for various reasons (space, performances...).
Seems to be counterproductive, especially given the fact that we only output the last caught line...
This is why, in this PR, I use a matchcounters array which only stores the counters, and get rid of the big matchlines array :
$state = {
'servicestateid' => 2,
'tag' => 'default',
'logtime' => 1610145857,
'logoffset' => 48,
'runtime' => 1610145857,
'laststicked' => 1610145857,
'serviceoutput' => 'error',
'runcount' => 4,
'matchcounters' => {
'WARNING' => 0,
'CRITICAL' => 8,
'UNKNOWN' => 0,
'OK' => 0
},
'devino' => '64512:524039',
'privatestate' => {
'logfile' => 'test',
'matchingpattern' => 'error',
'runcount' => 4,
'lastruntime' => 1610145855
}
};
1;
And, as proposed above, we could go further adding a timestamp field in this array, which would then look like this :
'matchcounters' => {
'CRITICAL' => {
'1610146819' => 4,
'1610146850' => 3
},
'WARNING' => {
'1610146819' => 2
},
'UNKNOWN' => {
'1610146850' => 7
},
'OK' => {
}
},
Seems more efficient, don't you think so ?
If you're OK with this general idea, I can submit a PR over master branch to complete the current work.
Thank you again π
βYou are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or unsubscribe.
The numbers of all mathces are saved, but only the text of the last match.
Just tested, that's perfect π
Perhaps we could go for the following little patch, so that empty $level."count"
elements remain in the array from one run to another (simply for a more consistent temporary file) :
@@ -2650,6 +2650,7 @@
if ($self->get_option('sticky')) {
$self->{matchlines}->{$level} = [];
if ($self->get_option("report") eq "short") {
+ $self->{matchlines}->{$level."count"} = {};
if (exists $self->{laststate}->{matchlines}->{$level."count"}) {
my @seconds = keys %{$self->{laststate}->{matchlines}->{$level."count"}};
foreach my $second (@seconds) {
@@ -2661,8 +2662,6 @@
$self->{laststate}->{matchlines}->{$level."count"}->{$second}, $level, scalar localtime $second);
}
}
- } else {
- $self->{matchlines}->{$level."count"} = {};
}
}
if (exists $self->{laststate}->{matchlines} and
Many thanks for your help on this topic π
Hi,
This PR fixes an issue where
--sticky
with--report=short
(the default) does not properly compute counters. This issue does not trigger when other--report
options are used.Below is the faulty behaviour :
The third run should not return 1 critical error, but 4 critical errors. With this PR, it now returns correct counters, and, as a cosmetic fix, properly adds the
...
in the output message :Thank you π