regilero / check_nginx_status

Nagios check for nginx status report
GNU General Public License v3.0
51 stars 25 forks source link

Negative values when restarting nginx #13

Closed tatref closed 8 years ago

tatref commented 8 years ago

I used the latest version of the plugin.

How to reproduce:

  1. Run the check
  2. The tmp file created contains something like: 1456396233 12 12 12
  3. Restart nginx
  4. Re-run the check
  5. The tmp now contains 1456396486 1 1 1
  6. The output displays negative values: ReqPerSec: -7.750 ConnPerSec: -7.750

The values may be bigger, depending on the server usage. I found this out when looking at the produced graphs.

I think we should check if the tmp file contains smaller values than the actual connection count, this would indicate a restart of nginx in the meantime.

What do you think about this ?

regilero commented 8 years ago

This is definitively an issue :-) At least when the counters are below previous counters we can/should detect it, return a PENDING status (instead of OK/WARNING/CRITICAL), and reset the counters. Now with a lot of restarts and a lot of incoming connections we wont be able to detect the reset (no negative values). Maybe there's something in the status output about the restart time, I'll need to check that.

I do not have much time currently, if you want to work on it do not hesitate :-)

tatref commented 8 years ago

I can't see anything related to uptime (see docs http://nginx.org/en/docs/http/ngx_http_stub_status_module.html). What about testing if the requests counter is less than the value in the temp file?

Also, I don't think Nagios defines a PENDING status: only OK, WARNING, CRITICAL, and UNKNOWN are defined (https://nagios-plugins.org/doc/guidelines.html#AEN78)

regilero commented 8 years ago

PENDING has always been there. It's the result when you do not have enough information to send a result. Usually a UNKNOWN is used, but PENDING was the right status. You can see PENDING for example on commends where checks have not been running yet in Centreon. And checks that use 2 data to compute a diff should use PENDING if they do not have 2 results.

This is avery special case, often forgotten, and maybe they removed that status from the doc. But it's in the book :-) and it's in the source code: https://github.com/NagiosEnterprises/nagioscore/blob/master/cgi/status.c#L163

tatref commented 8 years ago

I created a pull request. I tried to use the PENDING exit code, but unfortunately, Nagios 3 does not recognize it, so I just exit OK with all values set to 0.

regilero commented 8 years ago

Ok, let's add this fix then.

regilero commented 8 years ago

And, thanks for all.