regilero / check_nginx_status

Nagios check for nginx status report
GNU General Public License v3.0

Connection refused in HTTPS #8

Open maethor opened 10 years ago

maethor commented 10 years ago

When I try to check a server over HTTPS, I get an error:

# sudo -u shinken /usr/local/shinken/libexec/check_nginx_status.pl -H paste.sysnove.net -S -d
Use of uninitialized value $o_warn_thresold in concatenation (.) or string at /usr/local/shinken/libexec/check_nginx_status.pl line 179.

Debug thresolds: 
Use of uninitialized value $o_crit_thresold in concatenation (.) or string at /usr/local/shinken/libexec/check_nginx_status.pl line 180.
Warning: () => Active: -1 ReqPerSec :-1 ConnPerSec: -1
Critical () => : Active: -1 ReqPerSec: -1 ConnPerSec : -1

DEBUG: HTTP url: 
https://paste.sysnove.net/nginx_status
DEBUG: HTTP request: 
IP used (better if it's an IP):paste.sysnove.net
GET https://paste.sysnove.net/nginx_status

NGINX CRITICAL - 500 Can't connect to paste.sysnove.net:443 (Connexion refusée)

The strange thing is that I don't even see a line in the Nginx log on my server. I don't see where the connection fails.

Do you have any idea? I'm not familiar enough with Perl :(

Your plugin is great, thanks :)

regilero commented 10 years ago

We can try several things.

maethor commented 10 years ago

If the HTTPS check usually works, do you think it may be caused by the HTTPS certificate? I'm using CAcert Class 3 Root.

I'm using Shinken, by the way, but that should be irrelevant if the plugin works in HTTP and works well in Nagios and Centreon.

regilero commented 10 years ago

Yes, Shinken is irrelevant. Plugins should be able to run from the command line without any problem.

So it looks more like a bug in the HTTPS connection management of Perl's LWP module.

So we may try some things:

On the http protocol settings here: https://github.com/regilero/check_nginx_status/blob/master/check_nginx_status.pl#L206-209

We can try to tweak the SSL settings with a few options:

my $ua = LWP::UserAgent->new(
  protocols_allowed => ['http', 'https'],
  timeout => $o_timeout
);
# Skip the certificate host name check
$ua->ssl_opts( verify_hostname => 0 );
# Or point LWP at your CA: a single bundle file, or a directory of hashed certificates
$ua->ssl_opts( SSL_ca_file => 'path_to_ca_file' );
$ua->ssl_opts( SSL_ca_path => 'path_to_ca_directory' );

details here http://search.cpan.org/~gaas/libwww-perl-6.05/lib/LWP/UserAgent.pm

The first one, verify_hostname disabled may be enough.

You may also add on this region:

$ENV{HTTPS_DEBUG} = 1;

And maybe on top you could try a:

use LWP::Protocol::https;

as it seems this part of LWP has been removed from the default distribution.

maethor commented 10 years ago

Oh, thanks !

So, just by disabling verify_hostname, it works! However, it works only in the "-H IP -s servername" form. If I use "-H servername -s servername" or even "-H servername", the plugin fails again.

Thank you again ! :)

regilero commented 10 years ago

But it's like your SSL certificate was not using the right name. Which is not the case, I think.

So it's more that Perl doesn't know how to check the validity of this name. Or maybe there's a problem with local DNS resolution. Maybe a DNS query for this domain name from the monitoring server does not return the right IP, but an IP where the SSL configuration is not the same?

I'd like to understand. I'll certainly add an option to remove name validation on ssl, but I would prefer understanding the problem :-)

maethor commented 10 years ago

Hum, no, there is no problem with local DNS resolution, and remember that the plugin works in HTTP with the "-H servername" form. I really don't understand.

Try to launch:

check_nginx_status.pl -H 212.83.188.242 -s paste.sysnove.net -S -d

and

check_nginx_status.pl -H wiki.sysnove.net -s wiki.sysnove.net -S -d

and

check_nginx_status.pl -H wiki.sysnove.net -S -d

Except the fact that you'll get a 403 error, only the first command will work.

regilero commented 10 years ago

From an external server I get a 403 every time. Verify that your nginx virtual hosts are defined on the same IP. It could be that the virtual host on port 80 listens on all IP interfaces while the one on port 443 does not.

Now you use two names, paste and wiki. Maybe only one of these names is present in the certificate, or you have a multi-name certificate and you use SNI?

maethor commented 10 years ago

I monitor from an external server, and I never get a 403. So I really think the error comes from the library. I'm on Debian wheezy, and I installed LWP from the Debian package libwww-perl.

I mixed two domains, sorry about that, but they are both on the same server with the same wildcard certificate.

maethor commented 10 years ago

I think it can't be a server-side problem, because I don't get any log.

When I use:

# sudo -u shinken /usr/local/shinken/libexec/check_nginx_status.pl -H 212.83.188.242 -s paste.sysnove.net -S -d
NGINX OK -  0.330 sec. response time, Active: 2 (Writing: 1 Reading: 0 Waiting: 1) ReqPerSec: 0.074 ConnPerSec: 0.012 ReqPerConn: 1.396|Writing=1;;;; Reading=0;;;; Waiting=1;;;; Active=2;;;; ReqPerSec=0.074074;;;; ConnPerSec=0.012346;;;; ReqPerConn=1.396458;;;;

I have an entry in the nginx access.log:

"GET /nginx_status HTTP/1.1" 200 106 "-" "libwww-perl/6.04"

When I use:

# sudo -u shinken /usr/local/shinken/libexec/check_nginx_status.pl -H paste.sysnove.net -S -d
NGINX CRITICAL - 500 Can't connect to paste.sysnove.net:443 (Connexion refusée)

I have nothing in access.log nor error.log.
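A note on the missing logs: "connection refused" is reported at the TCP level, before any TLS handshake or HTTP request, so nginx has nothing to log for such an attempt. The Perl sketch below lists every address the name resolves to and tries a plain TCP connection to port 443 on each one. It is a diagnostic under assumptions, not part of the plugin: resolve_all and try_connect are hypothetical helper names, and whether the name resolves to more than one address (for example an extra IPv6 one) is not established in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo SOCK_STREAM NI_NUMERICHOST NI_NUMERICSERV);
use IO::Socket::IP;

# Hypothetical helper: return every numeric address (IPv4 and IPv6)
# that the resolver gives for $host.
sub resolve_all {
    my ($host, $port) = @_;
    my ($err, @res) = getaddrinfo($host, $port, { socktype => SOCK_STREAM });
    die "Cannot resolve $host: $err\n" if $err;
    my (%seen, @addrs);
    for my $ai (@res) {
        my ($e, $ip) = getnameinfo($ai->{addr}, NI_NUMERICHOST | NI_NUMERICSERV);
        next if $e or $seen{$ip}++;
        push @addrs, $ip;
    }
    return @addrs;
}

# Hypothetical helper: attempt a bare TCP connection, no TLS and no HTTP.
sub try_connect {
    my ($ip, $port, $timeout) = @_;
    my $sock = IO::Socket::IP->new(
        PeerHost => $ip,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return $sock ? 'connected' : "failed: $@";
}

# Only probe when a host name is given on the command line.
if (@ARGV) {
    my ($host, $port) = @ARGV;
    $port //= 443;
    for my $ip (resolve_all($host, $port)) {
        printf "%-40s %s\n", $ip, try_connect($ip, $port, 5);
    }
}
```

Run as the monitoring user with the host name and, optionally, a port as arguments. If one address connects and another is refused, the "-H IP" form working while the "-H servername" form fails would be consistent with that, though other causes remain possible.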

regilero commented 10 years ago

Yes, thanks for the details. A wildcard certificate may be a good hint about the problem. By default, SSL negotiation works better with single-name virtual hosts, because it happens before the name-based virtual host is selected. So this may be the reason for the 500 errors.

About the logs: are you sure all your nginx virtual hosts, even the default one, have logs? Check everywhere.

But here we have a big abstraction level with LWP... and abstractions leak. Hard to debug. If you have some spare time for it, you could get a trace using the methods explained in the code sample here: http://search.cpan.org/dist/libwww-perl/lib/LWP/Debug.pm

There may be a simple way of getting debug output by adding this block to the code, before line 206:

BEGIN {
  $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0; #or 1
  $ENV{HTTPS_DEBUG} = 1;  #Add debug output
}

I'll make a change in the code for the verify_hostname feature, but it seems the LWP library made some heavy changes around SSL and I'm having difficulty writing something that works with the various versions...
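One way to cope with the version differences could look like the sketch below. It assumes that the ssl_opts method only exists from LWP 6 onward, so the call is guarded with can(). build_ua is a hypothetical helper name, not a function of check_nginx_status.pl, and the timeout value is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper, not part of check_nginx_status.pl.
sub build_ua {
    my ($timeout, $verify_hostname) = @_;
    my $ua = LWP::UserAgent->new(
        protocols_allowed => ['http', 'https'],
        timeout           => $timeout,
    );
    # ssl_opts() appeared with LWP 6; on older releases the method is
    # missing, so only call it when the object actually provides it.
    if ($ua->can('ssl_opts')) {
        $ua->ssl_opts(verify_hostname => $verify_hostname ? 1 : 0);
    }
    return $ua;
}

my $ua = build_ua(15, 0);
printf "LWP %s, verify_hostname=%s\n",
    $LWP::UserAgent::VERSION,
    $ua->can('ssl_opts') ? $ua->ssl_opts('verify_hostname') : 'n/a';
```

Guarding on the method rather than comparing version numbers keeps the check independent of how a distribution packages LWP.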

maethor commented 10 years ago

Yes, all nginx virtual hosts have logs. I searched everywhere, nothing. I think nginx doesn't log connections, just requests.

Adding these lines doesn't change anything. Unfortunately, I don't have much time right now to trace everything.

regilero commented 10 years ago

OK, thanks for the detective work. I hope you have a working configuration, at least.

maethor commented 10 years ago

Thank you for the help !

ep4sh commented 5 years ago

Had the same trouble:

sudo -u nagios /usr/lib/nagios/plugins/check_nginx_status.pl -H mail.ea34.ru -s mail.ea34.ru -d -S --disable-sslverifyhostname
NGINX CRITICAL - 500 Can't connect to mail.ea34.ru:443 (certificate verify failed)

even when I set:
my $o_disable_sslverifyhostname = 1;

The problem persists :(
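A possible explanation, not confirmed in this thread: "certificate verify failed" points at the certificate chain rather than the host name, and newer LWP::Protocol::https releases appear to keep validating the chain even when verify_hostname is 0. The sketch below separates the two settings. ssl_options is a hypothetical helper name, the CA bundle path is the usual Debian location and is only an assumption, and turning verification off entirely (SSL_verify_mode 0) removes protection against impersonation, so treat it as a last resort.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build the ssl_opts hash from two separate choices,
# the host name check and the certificate chain check.
sub ssl_options {
    my (%arg) = @_;
    my %opts = (verify_hostname => $arg{verify_hostname} ? 1 : 0);
    if (defined $arg{ca_file}) {
        # Preferred: tell the TLS layer which CA bundle to trust.
        $opts{SSL_ca_file} = $arg{ca_file};
    }
    elsif ($arg{insecure}) {
        # Last resort: 0 is SSL_VERIFY_NONE, no chain validation at all.
        $opts{SSL_verify_mode} = 0;
    }
    return %opts;
}

# Assumed path: the Debian CA bundle; adjust for your system.
my %opts = ssl_options(
    verify_hostname => 1,
    ca_file         => '/etc/ssl/certs/ca-certificates.crt',
);
my $ua = LWP::UserAgent->new(timeout => 15, ssl_opts => \%opts);
```

If the server certificate is signed by a CA missing from the system bundle, pointing SSL_ca_file at a bundle that contains it keeps both checks enabled.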