pixelcog / opsworks-vm

Virtual machine templates for AWS OpsWorks running on Ubuntu 12.04 and 14.04
MIT License

periodic timeout of 60-110 seconds when starting opsworks-agent #6

Open · muddman opened this issue 8 years ago

muddman commented 8 years ago

Thanks so much for making opsworks-vm available -- we love it!!

This probably isn't a bug with opsworks-vm, but it is behavior I only see when using opsworks-vm, so I thought I'd post it. Feel free to close it if you don't think it is appropriate to open here.

Most of the time during a vagrant up/provision we see a delay of 60-110 seconds right as the opsworks-agent is starting. The log shows 6-11 "sleep 10 sec" messages (see below). We have tracked it down to the /opt/aws/opsworks/current/bin/opsworks-agent-updater process, which is kicked off at the beginning of each run and, judging by the lockrun lockfile in the process listing below, holds the lock the agent run then waits on. If we have a vagrant ssh session open and manually kill the opsworks-agent-updater process, the run is not delayed.

Has anyone seen this or know what we can do to keep it from happening?

Here is the log with the sleeps that happen during a vagrant up/provision:

default: Running: inline script
==> default: Checking dependencies...
==> default: Reading input...
==> default: Processing 'recipes.json'...
==> default: Processing 'ops/opsworks/[REDACTED].json'...
==> default: Processing 'ops/dna/[REDACTED].json'...
==> default: Parsing deployments...
==> default: Parsing custom cookbooks...
==> default: Purging local cookbooks cache from '/opt/aws/opsworks/current/site-cookbooks'...
==> default: Generating dna.json...
==> default: Running opsworks agent...
==> default: [2015-10-16 21:54:33]  INFO [opsworks-agent(7456)]: About to run custom custom command from: /tmp/vagrant-opsworks20151016-7438-3rkgmo/dna.json
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: (locked: sleeping 10 secs)
==> default: [Fri Oct 16 21:56:33 2015] About to execute: RUBYOPT="-E utf-8" /opt/aws/opsworks/current/bin/chef-client -j /var/lib/aws/opsworks/chef/2015-10-16-21-54-33-01.json -c /var/lib/aws/opsworks/client.stage1.rb -o opsworks_custom_cookbooks::load,opsworks_custom_cookbooks::execute 2>&1
==> default: [2015-10-16T21:56:34+00:00] INFO: Starting chef-zero on port 8889 with repository at repository at /opt/aws/opsworks/current
==> default:   One version per cookbook
==> default:   data_bags at /var/lib/aws/opsworks/data/data_bags
==> default:   nodes at /var/lib/aws/opsworks/data/nodes
==> default: [2015-10-16T21:56:34+00:00] INFO: Forking chef instance to converge...
==> default: [2015-10-16T21:56:34+00:00] INFO: *** Chef 11.10.4 ***
==> default: [2015-10-16T21:56:34+00:00] INFO: Chef-client pid: 7674

Here are the processes we see running if we have a vagrant ssh session open during the run:

root      6419  6418  0 22:09 ?        00:00:00 /bin/sh -c cd /opt/aws/opsworks/current && bin/lockrun --verbose --lockfile=/var/lib/aws/opsworks/lockrun.lock -- /opt/aws/opsworks/current/bin/opsworks-agent-updater
root      6421  6419  0 22:09 ?        00:00:00 bin/lockrun --verbose --lockfile /var/lib/aws/opsworks/lockrun.lock -- /opt/aws/opsworks/current/bin/opsworks-agent-updater
root      6422  6421  7 22:09 ?        00:00:00 /opt/aws/opsworks/local/bin/ruby /opt/aws/opsworks/current/bin/opsworks-agent-updater
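
The lockfile is visible in those command lines. To confirm which process is actually holding it, something like this should work from inside the VM (assuming lsof is installed in the box):

sudo lsof /var/lib/aws/opsworks/lockrun.lock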

If we manually ssh in and run kill -9 6419, everything continues without delay.
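
Rather than hunting down the PID each time, a one-liner along these lines should take out the whole updater process tree (assuming pkill is available, which it is on stock Ubuntu):

# -f matches the full command line, so this catches the sh -c wrapper,
# lockrun, and the ruby process in one go; the bracketed [o] keeps the
# pattern from matching the sudo/pkill command line itself.
sudo pkill -9 -f '[o]psworks-agent-updater'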

muddman commented 8 years ago

This is now happening for us on every vagrant up and every vagrant deploy. It is very possible that this problem is localized to our environment, but after a vagrant up the agent appears to kick off every 5 minutes inside the VM as well as on every vagrant provision, locking the run for up to 2 minutes.

We have an extremely ugly temporary fix for anyone else who is experiencing the same issue, and we will post again if we come up with a real solution to the problem.

ops/kill-opsworks-agent.sh

#!/bin/sh
# Wait for the opsworks-agent-updater process to appear, then kill it so it
# releases the lockrun lock and the agent run can proceed without the delay.
# (The bracketed [o] keeps grep from matching its own command line.)
until ps auxww | grep [o]psworks-agent-updater
  do sleep 5
done
ps auxww | grep [o]psworks-agent-updater | awk '{print $2}' | sudo xargs kill -9 &

Vagrantfile

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu1404-opsworks"
  config.vm.provision :shell, path: "ops/kill-opsworks-agent.sh"
  config.vm.provision "opsworks", type:"shell", args:"path/to/dna.json"
end
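
An alternative we have not tried (a sketch only; adapt the path to your box) would be to stub out the updater binary so it can never take the lock in the first place. The path below is the one from the process listing above:

#!/bin/sh
# Hypothetical one-time fix: keep the original updater around as a backup,
# then replace it with a no-op stub so each run's updater exits immediately.
sudo mv /opt/aws/opsworks/current/bin/opsworks-agent-updater \
        /opt/aws/opsworks/current/bin/opsworks-agent-updater.orig
printf '#!/bin/sh\nexit 0\n' | sudo tee /opt/aws/opsworks/current/bin/opsworks-agent-updater > /dev/null
sudo chmod +x /opt/aws/opsworks/current/bin/opsworks-agent-updater
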
brutuscat commented 8 years ago

We had to fix the timeout issues like this: https://github.com/Vreasy/opsworks-vm/pull/2

felixSchl commented 8 years ago

I am experiencing the same issue, consistently. The hack provided by @muddman seems to do the trick.