HOSTED-POWER opened this issue 8 years ago
Thank you for buying a support contract :-) I am sorry, I am 100% buried in customer projects and have no time to analyze this problem. You might try to fix it and send me a pull request.
Hi,
I found the solution, which is trivial once you know it!
Inside sub valdiff (line 1679) you will find:
if ($self->{$_} >= $last_values->{$_}) {
    $self->{'delta_'.$_} = $self->{$_} - $last_values->{$_};
} else {
    # probably a DB restart, and all counters were reset to zero
    $self->{'delta_'.$_} = $self->{$_};
}
I changed the first line to:
if ($self->{$_} - $last_values->{$_} > 0 ) {
    $self->{'delta_'.$_} = $self->{$_} - $last_values->{$_};
} else {
    # probably a DB restart, and all counters were reset to zero
    $self->{'delta_'.$_} = $self->{$_};
}
You already tried to fix the original issue, but that fix does not seem to work. Apparently, comparing floating-point values is dangerous.
Some references I found:
http://stackoverflow.com/questions/21714162/how-to-compare-floating-points-beyond-just-equality-in-perl
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
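As a small illustration of the general pitfall those references describe, here is a sketch (not code from check_mysql_health) showing that two doubles can print identically yet compare unequal, plus a tolerance-based comparison of the kind discussed in the StackOverflow thread. The helper name approx_equal and the 1e-9 tolerance are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;

# Perl stringifies with about 15 significant digits, so this prints "0.3" ...
print "sum prints as $sum\n";
# ... but the underlying binary doubles differ, so == is false.
print "sum == 0.3 is ", ($sum == 0.3 ? "true" : "false"), "\n";   # false

# Hypothetical tolerance-based comparison (tolerance is an assumption).
sub approx_equal {
    my ($x, $y, $tol) = @_;
    $tol //= 1e-9;
    return abs($x - $y) <= $tol;
}

print "approx_equal is ", (approx_equal($sum, 0.3) ? "true" : "false"), "\n";   # true
```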
The solution above produces the same result and fixes all my issues.
What should be done to get it into the official code? It's really giving me a hard time over here at the moment, so I would love to see it fixed upstream!
Hmm, that fix doesn't fix everything; it was a bad idea in the first place!
If some value is 2000 and one minute later it's still 2000, the else branch also kicks in, so 2000 is now interpreted as the delta. Not a good idea :|
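The 2000 → 2000 case can be reproduced in isolation. The sketch below pulls the two comparison variants from this thread into hypothetical helpers (delta_gt and delta_ge are illustration names, not plugin functions), where $current and $last stand for $self->{$_} and $last_values->{$_} in valdiff.

```perl
use strict;
use warnings;

# Variant using "- ... > 0": an unchanged counter falls into the reset branch.
sub delta_gt {
    my ($current, $last) = @_;
    return ($current - $last > 0) ? $current - $last : $current;
}

# Variant using "- ... >= 0": an unchanged counter yields a delta of 0.
sub delta_ge {
    my ($current, $last) = @_;
    return ($current - $last >= 0) ? $current - $last : $current;
}

printf "> 0 : %d\n", delta_gt(2000, 2000);   # 2000, wrongly treated as a reset
printf ">= 0: %d\n", delta_ge(2000, 2000);   # 0
printf ">= 0: %d\n", delta_ge(150, 2000);    # 150, counter really was reset
```

Only a genuinely smaller value (for example after a MySQL restart) should reach the reset branch, which is why the comparison has to accept a delta of exactly 0.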
Well, I finally seem to have fixed all issues. The code looks like this now (changed parts are marked with comments):

sub valdiff {
    my $self = shift;
    my $pparams = shift;
    my %params = %{$pparams};
    my @keys = @_;
    my $now = time;
    my $last_values = $self->load_state(%params) || eval {
        my $empty_events = {};
        foreach (@keys) {
            $empty_events->{$_} = 0;
        }
        $empty_events->{timestamp} = 0;
        if ($params{lookback}) {
            $empty_events->{lookback_history} = {};
        }
        $empty_events;
    };
    foreach (@keys) {
        if ($params{lookback}) {
            # find a last_value in the history which fits lookback best
            # and overwrite $last_values->{$_} with historic data
            if (exists $last_values->{lookback_history}->{$_}) {
                foreach my $date (sort {$a <=> $b} keys %{$last_values->{lookback_history}->{$_}}) {
                    if ($date >= ($now - $params{lookback})) {
                        $last_values->{$_} = $last_values->{lookback_history}->{$_}->{$date};
                        $last_values->{timestamp} = $date;
                        last;
                    } else {
                        delete $last_values->{lookback_history}->{$_}->{$date};
                    }
                }
            }
        }
        $last_values->{$_} = 0 if ! exists $last_values->{$_};
        # changed: the comparison also accepts a delta of 0
        if ($self->{$_} - $last_values->{$_} >= 0 ) {
            $self->{'delta_'.$_} = $self->{$_} - $last_values->{$_};
        } else {
            # probably a DB restart, and all counters were reset to zero
            $self->{'delta_'.$_} = $self->{$_};
        }
        $self->debug(sprintf "delta_%s %f", $_, $self->{'delta_'.$_});
    }
    $self->{'delta_timestamp'} = $now - $last_values->{timestamp};
    # changed: never leave delta_timestamp at 0 (or below), to avoid a division by zero
    if ($self->{'delta_timestamp'} <= 0) {
        $self->{'delta_timestamp'} = 'Infinity';
    }
    $params{save} = eval {
        my $empty_events = {};
        foreach (@keys) {
            $empty_events->{$_} = $self->{$_};
        }
        $empty_events->{timestamp} = $now;
        if ($params{lookback}) {
            $empty_events->{lookback_history} = $last_values->{lookback_history};
            foreach (@keys) {
                $empty_events->{lookback_history}->{$_}->{$now} = $self->{$_};
            }
        }
        $empty_events;
    };
    $self->save_state(%params);
}
I'm unsure whether the original check is still required, but the comparison should certainly also allow a delta of 0, because that is entirely possible.
The culprit is a division by 0 caused by delta_timestamp being 0. When this happens, the plugin crashes, and on the next run delta_timestamp is 0 again because the crash prevented the state from being saved. This makes the plugin crash indefinitely...
I have now tested this: if the state file is corrupt or unwritable, you will still get an error. However, if something else goes wrong and the time delta would be 0 or negative, delta_timestamp becomes a very large number instead of causing a crash.
During my tests this had no negative effects...
Another option would be to survive the illegal division by 0 with eval or something similar, but this does the job perfectly, and it is more central and safer with respect to future code updates.
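Both options can be compared side by side in a minimal sketch. The helpers rate_guarded and rate_eval are hypothetical; they only assume that the plugin divides a counter delta by the elapsed time (delta_timestamp). The guard uses 9**9**9, which overflows to +Inf in Perl, so dividing by it gives 0, the same effect the patch aims for with 'Infinity'. The fallback of 0 in the eval variant is also an assumption.

```perl
use strict;
use warnings;

# Option taken in the patch: never let the elapsed time be <= 0.
sub rate_guarded {
    my ($delta, $elapsed) = @_;
    $elapsed = 9**9**9 if $elapsed <= 0;   # +Inf, so the rate becomes 0
    return $delta / $elapsed;
}

# Alternative: let the division fail and trap the exception with eval.
sub rate_eval {
    my ($delta, $elapsed) = @_;
    my $rate = eval { $delta / $elapsed };
    return defined $rate ? $rate : 0;      # fallback value is an assumption
}

# Without either safeguard, Perl dies with "Illegal division by zero".
my $zero = 0;
eval { my $r = 600 / $zero; 1 } or print "unguarded: $@";

printf "%.1f %.1f\n", rate_guarded(600, 60), rate_guarded(600, 0);   # 10.0 0.0
printf "%.1f %.1f\n", rate_eval(600, 60),    rate_eval(600, 0);      # 10.0 0.0
```

The guard fixes the value once where delta_timestamp is computed, whereas the eval approach would have to wrap every division that uses it.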
Hi,
Small update: I did more tests; the latest proposed solution seems to be correct and works in all cases.
Could you review it and let me know if ok? Thanks!
Hi,
Any update on this?
In the meantime I discovered that the error is mainly due to a problem with icinga2. However, this line certainly has to be replaced:
if ($self->{$_} >= $last_values->{$_}) {
with
if ($self->{$_} - $last_values->{$_} >= 0 ) {
Hi,
It's been a while since I updated this issue.
The change proposed in my last comment really is required:
root@highperf [~]# mysqladmin ext | grep Abort
| Aborted_clients  | 11  |
| Aborted_connects | 163 |
One minute later:
root@highperf [~]# mysqladmin ext | grep Abort
| Aborted_clients  | 11  |
| Aborted_connects | 163 |
If the value is the same between two runs, it gives false positives!
On each install I need to make the change:
sed -i 's/(\$self->{\$_} >= \$last_values->{\$_})/(\$self->{\$_} - \$last_values->{\$_} >= 0)/g' /usr/lib64/nagios/plugins/check_mysql_health
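The same substitution can also be expressed as a Perl regex. The sketch below applies it to a sample line (assumed to match the valdiff source exactly, including spacing) so the result can be checked before touching the installed plugin.

```perl
use strict;
use warnings;

# Sample line as it appears in valdiff before the change (assumed spacing).
my $line = 'if ($self->{$_} >= $last_values->{$_}) {';

# Same pattern and replacement as the sed command; $ ( ) { } are escaped
# so they are matched literally and not interpolated.
(my $patched = $line) =~ s/\(\$self->\{\$_\} >= \$last_values->\{\$_\}\)/(\$self->{\$_} - \$last_values->{\$_} >= 0)/g;

print "$patched\n";   # if ($self->{$_} - $last_values->{$_} >= 0) {
```

Against the installed file, the s/// expression could be run with perl's in-place switch, e.g. perl -pi.bak -e '...', which also keeps a .bak copy of the original plugin.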
If a patch would help, please let me know and I can provide one!
PS: I see no comments. The initial problem was caused by an icinga2 bug, but the fix from my latest comment really is required on every server where I use your plugin.
I'm quite disappointed not to receive any feedback at all :(
check_mysql_health version 2.2.1
Illegal division by zero at /usr/lib/nagios/plugins/check_mysql_health line 577.
Illegal division by zero at /usr/lib/nagios/plugins/check_mysql_health line 591.
Illegal division by zero at /usr/lib/nagios/plugins/check_mysql_health line 543.
statefile /var/tmp/check_mysql_health/10.10.81.220_server::instance::tabletmpondisk_3306_information_schema is corrupt
I have a lot of issues with this plugin :(
I use it together with icinga2. The strange thing is that I usually get these issues when I restart my monitoring daemon. When they happen, they are solved by also restarting the daemon on the monitored client.
On the other hand, this plugin seems to be the only one with issues...
I really hope for a solution, since I have tons of servers attached and I'm going nuts with all these errors over and over :(
Thanks a lot for looking into this!
Kind regards,
Jo