sous-chefs / apache2

Development repository for the apache2 cookbook
https://supermarket.chef.io/cookbooks/apache2
Apache License 2.0
284 stars 548 forks

apache2 service only_if guard timeout exceeded: default.rb::line 36 #238

Closed greenreign closed 9 years ago

greenreign commented 10 years ago

Script timed out running httpd -t

 Mixlib::ShellOut::CommandTimeout
           --------------------------------
           Command timed out after 2s:
           Command exceeded allowed execution time, process terminated
           ---- Begin output of /usr/sbin/httpd -t ----
           STDOUT:
           STDERR:
           ---- End output of /usr/sbin/httpd -t ----
           Ran /usr/sbin/httpd -t returned

When I run the script from the box it returns Syntax OK, but it always takes about 5 seconds. Can you make the timeout on the only_if guard longer or configurable? See default.rb, line 36.
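The failure mode can be sketched in plain Ruby (this is an illustration, not the cookbook's actual guard code; the sleep stands in for a slow httpd -t run):

```ruby
require 'timeout'

# Minimal sketch of why a fixed 2 s limit fails when the check itself
# takes ~5 s: the guard's timer fires before the command can finish.
def run_guard(command_seconds, timeout_seconds)
  Timeout.timeout(timeout_seconds) do
    sleep(command_seconds)      # stand-in for the syntax check's runtime
    'Syntax OK'
  end
rescue Timeout::Error
  'Command timed out'
end

puts run_guard(0.1, 2)   # fast check finishes inside the limit
puts run_guard(3, 0.5)   # slow check hits the limit, as in the report
```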

svanzoest commented 10 years ago

Hi @greenreign, Thank you for your report. Can you provide a bit more input on your environment? What operating system are you on, what modules and number of vhosts are loaded? I am curious as to what would cause a syntax check to take so long to execute.

greenreign commented 10 years ago

[vagrant@default-centos-64 ~]$ cat /etc/centos-release
CentOS release 6.4 (Final)

[vagrant@default-centos-64 httpd]$ sudo httpd -S
VirtualHost configuration:
[public_ip]:80        [Host_Name](/etc/httpd/sites-enabled/ci.conf:1)
wildcard NameVirtualHosts and _default_ servers:
*:80                   is a NameVirtualHost
         default server [Host_Name] (/etc/httpd/sites-enabled/public.conf:1)
         port 80 namevhost [Host_Name] (/etc/httpd/sites-enabled/public.conf:1)
Syntax OK

[vagrant@default-centos-64 httpd]$ ls mods-enabled/
alias.conf       authz_default.load    autoindex.conf  dir.conf      log_config.load  negotiation.conf  proxy.load     status.conf
alias.load       authz_groupfile.load  autoindex.load  dir.load      logio.load       negotiation.load  rewrite.load   status.load
auth_basic.load  authz_host.load       deflate.conf    env.load      mime.conf        proxy.conf        setenvif.conf
authn_file.load  authz_user.load       deflate.load    headers.load  mime.load        proxy_http.load   setenvif.load
[vagrant@default-centos-64 httpd]$ ls mods-available/
alias.conf       authz_default.load    autoindex.conf  dir.conf      log_config.load  negotiation.conf  proxy.load     status.conf
alias.load       authz_groupfile.load  autoindex.load  dir.load      logio.load       negotiation.load  rewrite.load   status.load
auth_basic.load  authz_host.load       deflate.conf    env.load      mime.conf        proxy.conf        setenvif.conf
authn_file.load  authz_user.load       deflate.load    headers.load  mime.load        proxy_http.load   setenvif.load

greenreign commented 10 years ago

[public_ip] and [Host_Name] are a valid public IP and hostname.

greenreign commented 10 years ago

It's a basic default run of the recipe other than the virtual hosts and adding mod_proxy and mod_proxy_http. I'll admit I'm messing around with the virtual hosts and I don't understand them that well.

svanzoest commented 10 years ago

Thanks. My gut feeling says that the delay is related to the proxy setup, but I haven't written any tests for that yet.

greenreign commented 10 years ago

See a glaring issue here?

<VirtualHost sub.example.com:80 >
  ServerName sub.example.com
  <Proxy *>
    Order allow,deny
    Allow from all
  </Proxy>
  ProxyPass / http://localhost:8080/
  ProxyPassReverse / http://localhost:8080/
</VirtualHost>

svanzoest commented 10 years ago

Did you find out any more why it takes so long to do a config test?

greenreign commented 10 years ago

Thank you. I didn't find out what was causing the slow response.

To add to the details: I did not have a problem when running on Amazon Linux from AWS. It was only too slow when running on my local CentOS Vagrant VM. I was able to get around it when I changed the vhost entry from <VirtualHost sub.example.com:80> to <VirtualHost *:80>.
Perhaps it was DNS lookup?
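The DNS theory is plausible: when a <VirtualHost> directive names a hostname rather than an address or *, httpd has to resolve it during the syntax check, so resolver latency is paid at config-test time. A small stdlib-Ruby probe (illustrative only; localhost stands in for the real public hostname from the report) shows how that lookup time can be measured:

```ruby
require 'resolv'
require 'benchmark'

# Measure how long one name resolution takes. On a box with a slow or
# misconfigured resolver this can be several seconds per hostname, which
# would explain the ~5 s `httpd -t` runs described above.
elapsed = Benchmark.realtime do
  puts Resolv.getaddress('localhost')
end
puts format('lookup took %.3f s', elapsed)
```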

podwhitehawk commented 10 years ago

The code above will fix that issue. The slow response is caused by a large number of config files being tested with "httpd -t" in conjunction with a slow underlying storage system. As an example, try a slow 5400 RPM HDD of the kind typically found in notebooks: copy a lot of small files (I tested it by duplicating the RPM packages on the CentOS 6.5 DVD) while converging the apache2 cookbook at the same time.

svanzoest commented 10 years ago

@podwhitehawk Removing the timeout means it may spin for a significant amount of time and possibly pile up with no recourse in production. Are you saying that 10 seconds is not enough?

podwhitehawk commented 10 years ago

@svanzoest I think that putting a timeout on the check operation is a bad idea. You have already hit that with 2 seconds, and sooner or later you will hit it again with 10 seconds. The cookbook should check the exit status rather than terminate itself on a timeout. P.S. A simple check command like this is very stable, so it should never spin forever.
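The check-the-exit-status idea can be sketched in plain Ruby (not the cookbook's code; the true/false commands stand in for a passing and a failing httpd -t):

```ruby
require 'open3'

# Run the syntax check with no timer and decide purely on its exit status.
def config_ok?(check_command)
  _stdout, _stderr, status = Open3.capture3(check_command)
  status.success?
end

puts config_ok?('true')   # stand-in for a syntax check that passes
puts config_ok?('false')  # stand-in for a syntax check that fails
```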

svanzoest commented 10 years ago

@podwhitehawk It is more related to performance, and to having the Chef run halt, causing later recipes in the run_list not to run. There is no downside to just moving on and trying again at the next convergence. Also, keep in mind that this cookbook supports other platforms that are not Linux-based.

podwhitehawk commented 10 years ago

@svanzoest It will never halt; it is not time-dependent at all. At worst the convergence ends a minute or two later, which is not fatal. It is more important to have an accurate result than a failing cookbook. Do you agree?

svanzoest commented 10 years ago

@podwhitehawk I agree. Just need to create a test case to ensure this behavior.

podwhitehawk commented 10 years ago

@svanzoest I've already tried that piece of code with a slow notebook drive, as described before. As another case, I tried inserting a sleep ("sleep 300; httpd -t") and I could not get that process to fail.

svanzoest commented 10 years ago

@podwhitehawk we should add it as a serverspec test.

podwhitehawk commented 10 years ago

@svanzoest I'm trying to understand how to check it, but no luck. Any suggestions?

svanzoest commented 10 years ago

@podwhitehawk I would use test kitchen and update the serverspec tests in test/integration/default/serverspec to test if the chef run actually completes in a negative test.
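A negative test of this sort might look roughly like the following serverspec fragment (a sketch only: the file location, resource names, and matchers are assumptions, and it only runs inside the cookbook's Test Kitchen harness, not standalone):

```ruby
# test/integration/default/serverspec/slow_config_test_spec.rb (hypothetical)
require 'spec_helper'

# After a converge in which the config test is deliberately slowed down,
# the Chef run should still have completed and left the service healthy,
# i.e. the guard must not have aborted the run.
describe service('httpd') do
  it { should be_enabled }
  it { should be_running }
end

describe command('httpd -t') do
  its(:exit_status) { should eq 0 }
end
```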

svanzoest commented 9 years ago

The more I think about this, the more I wonder what happens when the test never completes. Do we really have a case where it does not complete in 10 seconds? Ultimately there needs to be a timeout somewhere. I do not really have the time to test this out, so feel free to reopen this if someone has an example so we can confirm what the behavior actually is when the test never completes.

docwhat commented 9 years ago

I'm seeing two things that can mitigate this problem. Doing both would be ideal.

One, httpd -t runs on every Chef run. Ideally, httpd -t should only be run when configurations change, not on every run. For example, for our in-house sshd configuration management, we create the config file(s) in the cache and run sshd -T (which is like httpd -t) on those cached files. If it passes, we copy over the new files using normal Chef resources (so nothing happens if they didn't change). Iff the copy makes a change, it triggers a restart.

There are a lot of Apache config files, and it would be tricky to do exactly the same trick with Apache, but maybe something else could be done, such as only running httpd -t when the start and graceful actions are triggered, not for all the actions on the service.
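The validate-then-deploy pattern described above can be sketched in plain Ruby (illustrative only: the paths are temporary files, and the validator here is a stand-in for something like httpd -t -f or sshd -T on the staged file):

```ruby
require 'fileutils'
require 'open3'
require 'tmpdir'

# Validate a staged config, then install it only if it passed the check
# AND differs from the live file. Only :updated should trigger a restart.
def deploy_config(staged, live, validator)
  _out, _err, status = Open3.capture3(*validator, staged)
  return :rejected unless status.success?
  return :unchanged if File.exist?(live) && FileUtils.identical?(staged, live)
  FileUtils.cp(staged, live)
  :updated
end

Dir.mktmpdir do |dir|
  staged = File.join(dir, 'staged.conf')
  live   = File.join(dir, 'live.conf')
  File.write(staged, "Listen 80\n")
  # `true` stands in for a syntax checker that accepts anything.
  puts deploy_config(staged, live, ['true'])  # first run installs the file
  puts deploy_config(staged, live, ['true'])  # second run is a no-op
end
```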

Two, httpd -t should never "just hang" (barring things like filesystem corruption or a kernel crash). If it is taking a long time, it is because disk I/O is slow, the VM is being live-migrated across the country, the system is really busy, or swap is thrashing; in those cases even 10 seconds is too short. The point being, Chef should continue to do its job, even if it takes a while. It is up to the monitoring software to alert someone that the system is very busy. If the next Chef run fails because Chef is still running from the first time, someone will get notified then.

I would suggest that if you must put in a timeout, set it really high, like 3 minutes. That way it acts as an absolute last-resort measure.

sergio-bobillier commented 7 years ago

I'm experiencing this issue when I run Test Kitchen in a Vagrant/VirtualBox instance. In my test environment Apache has quite a lot of virtual hosts configured (around 360). When running the tests on an AWS instance they pass without issue, but my local machine, and thus the VirtualBox VM, is not fast enough and the converge process fails.

/usr/sbin/httpd -t is taking 12 seconds to execute instead of just 10.

Can you make this timeout configurable? I would like to set it to 20 seconds on my local machine but keep it at 10 seconds on the AWS instances or when the recipe runs on an actual server.
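A configurable timeout could look something like the following, assuming a hypothetical node attribute (the cookbook version discussed in this thread hard-codes the value; the attribute name here is invented for illustration). Chef guard options such as :timeout are passed through to shell_out:

```ruby
# attributes/default.rb -- hypothetical attribute, not in the released cookbook
default['apache']['config_test_timeout'] = 10

# recipes/default.rb -- the guard would then read the node attribute:
# only_if "/usr/sbin/httpd -t", timeout: node['apache']['config_test_timeout']
```

An environment or role could then override the attribute to 20 on the slow Vagrant box while leaving the default in place elsewhere.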

araj1 commented 7 years ago

The issue still exists. Is there any way we can increase the default timeout?

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.