sensu-plugins / sensu-plugins-http

This plugin provides native HTTP instrumentation for monitoring and metrics collection, including: response code, JSON response, HTTP last modified, SSL expiry, and metrics via `curl`.
http://sensu-plugins.io
MIT License
30 stars 97 forks source link

Check failed to run: execution expired #82

Open danktec opened 6 years ago

danktec commented 6 years ago

We still see this issue occasionally even after upgrading to sensu 1.0, sensu-plugin-1.4.5 with http-2.5.0

Check failed to run: execution expired, [
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `write'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `output'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:29:in `block (2 levels) in <class:CLI>'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-http-2.5.0/bin/check-http.rb:242:in `rescue in run'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugins-http-2.5.0/bin/check-http.rb:237:in `run'\", 
 \"/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'\"]
 \n","status":2

This check took 15 seconds to run. Expected to have taken less than 1

command: "check-http.rb -u https://domain/path --response-code 302 -r"

majormoses commented 6 years ago

@dhdanno I will try to re-produce locally

majormoses commented 6 years ago

So far I have not been able to reproduce locally using ruby 2.3 or ruby 2.4 I am using this command:

$ time bundle exec ./bin/check-http.rb -u https://na1.cloudcruiser.com/ --response-code 302 -r
CheckHttp OK: 302, 154 bytes

real    0m0.324s
user    0m0.148s
sys 0m0.020s

I have not been able to get it to takes even 0.4 seconds, is your endpoint publicly that I can test against? Can you replicate this behavior outside of the context of sensu?

danktec commented 6 years ago

The checks we have run every 60 seconds but this issue only occurs once perhaps every few days or less... The system is running 30 of these checks to remote endpoints. Unfortunately not publicly available... not sure if a delay in server response is related... trying to find that out now... might be worth simulating.

I'll see if i can replicate

majormoses commented 6 years ago

Could it be something endpoint specific? Maybe try monitoring that endpoint and see if you can replicate. I have tried several thousand times now and have been unable to replicate anything that indicates an issue outside of the context of sensu. I need to upgrade my work env to use newer versions of this gem (I just looked and I am pinned on 0.2.1 so it's quite old) to see if I can replicate in the context of sensu.

danktec commented 6 years ago

I have simulated a slow nginx server with

tc qdisc add dev eth0 root netem delay 10000ms

And this successfully triggers the timeout... but i'm not seeing the error... the correct "CheckHttp CRITICAL: Request timed out" is returned.

I'm not sure if the endpoint being slow to respond is the real issue... It might be worth parallelizing this test into thousands of requests to see if any come back with the error.

majormoses commented 6 years ago

hmm, unfortunately it's probably gonna be pretty hard to troubleshoot this. When I have some time I can try setting up a load test against some endpoint and see if I can replicate.

dkorel-copperleaf commented 6 years ago

I've been able to reproduce this, running the latest version of the check standalone against any non-responsive endpoint. The funny thing is, when i increase the timeout it seems to return properly.

Regular:

$ /opt/sensu/embedded/bin/ruby -v /tmp/check-http.rb -u http://8.8.8.8
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]

# wait 15 seconds...

Check failed to run: execution expired, ["/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `write'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `puts'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:23:in `output'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:29:in `block (2 levels) in <class:CLI>'", "/tmp/check-http.rb:242:in `rescue in run'", "/tmp/check-http.rb:237:in `run'", "/opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/cli.rb:58:in `block in <class:CLI>'"]

200 second timeout:

$ /opt/sensu/embedded/bin/ruby -v /tmp/check-http.rb -u http://8.8.8.8 -t 200
ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]

# wait 200 seconds...

$ /opt/sensu/embedded/lib/ruby/gems/2.4.0/gems/sensu-plugin-1.4.5/lib/sensu-plugin/check/cli.rb:13: warning: instance variable @check_name not initialized
CheckHttp CRITICAL: Request error: Failed to open TCP connection to 8.8.8.8:80 (Connection timed out - connect(2) for "8.8.8.8" port 80)
majormoses commented 6 years ago

Hmm that is indeed odd, can you re-produce this without using the embedded sensu ruby?

I tried this locally and got what I would have expected:

$ bundle exec ./bin/check-http.rb -u http://8.8.8.8 -t 15
CheckHttp CRITICAL: Request timed out
danktec commented 6 years ago

I installed bundler on a fresh system, and after managing to resolve dependencies, It seemed to work fine... so i guess it must be something with the ruby installation on the system

majormoses commented 6 years ago

Interesting...

johanek commented 5 years ago

This error could occur if DNS resolution fails. Ruby Timeout.timeout() expires and obscures the reason for the failure. With default resolver config on Linux (timeout 5s, 2 attempts) and the default max_retries of 1 in Net::HTTP (Not configurable in ruby <2.5.0) you end up making 4 requests, so you need a timeout setting of >20seconds to get a name resolution error.

Is the Timeout.timeout() call at https://github.com/sensu-plugins/sensu-plugins-http/blob/master/bin/check-http.rb#L254 needed? All the Net::HTTP timeouts are being explicitly set, I think that all the "correct" Request timed out errors are coming from that.

johanek commented 5 years ago

Even without using Timeout.timeout(), a slow DNS failure will be caught by Net::HTTP's open_timeout. I'm going to raise a PR to better capture the Net::HTTP errors to at least give a clue as to where the issue may be.

But eventually it's down to the user's DNS timeout settings if name resolution failures will be visible in the check output. (options timeout:1 in resolv.conf)

majormoses commented 5 years ago

I responded to the PR, I don't think we need it but removing it is a breaking change as it acts as a high level circuit breaker and supersedes the HTTP timeouts which are all currently configured to be the same value. There are some paths forward but all roads I can think of lead to a breaking change. See the linked PR for more details.