hashicorp / vagrant

Vagrant is a tool for building and distributing development environments.
https://www.vagrantup.com

Vagrant up and provision hangs forever with Chef. #5687

Closed arionfx closed 9 years ago

arionfx commented 9 years ago

Hi,

I've been experiencing a strange problem with Vagrant and Chef. When I run "vagrant up" or "vagrant provision", it hangs forever after successfully changing the mode of a file through a Chef cookbook_file resource. I ran vagrant with --debug and got the output below:

==> default: [2015-05-06T21:12:45+00:00] INFO: execute[set owner on /v/qbmsmule/mule/mule-enterprise-standalone-3.4.0] ran successfully
DEBUG ssh: stdout: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] created file /tmp/mule-ee-license.lic
[2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] updated file contents /tmp/mule-ee-license.lic
[2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] mode changed to 644

 INFO interface: info: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] created file /tmp/mule-ee-license.lic
[2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] updated file contents /tmp/mule-ee-license.lic
[2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] mode changed to 644
 INFO interface: info: ==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] created file /tmp/mule-ee-license.lic
==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] updated file contents /tmp/mule-ee-license.lic
==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] mode changed to 644
==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] created file /tmp/mule-ee-license.lic
==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] updated file contents /tmp/mule-ee-license.lic
==> default: [2015-05-06T21:12:45+00:00] INFO: cookbook_file[/tmp/mule-ee-license.lic] mode changed to 644
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

I've tried Vagrant 1.7.2 and 1.6.5; both have the same problem. I use ChefDK 0.5.1 on OS X.

Any ideas?

Best regards, Xian

sethvargo commented 9 years ago

Hi @arionfx

Can you please share your Vagrantfile?

arionfx commented 9 years ago

Hi @sethvargo ,

Here it is:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = '2'

Vagrant.require_version '>= 1.5.0'

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.

  config.vm.hostname = 'qbmsmule'

  # Set the version of chef to install using the vagrant-omnibus plugin
  # NOTE: You will need to install the vagrant-omnibus plugin:
  #
  #   $ vagrant plugin install vagrant-omnibus
  #
  # if Vagrant.has_plugin?("vagrant-omnibus")
  #   config.omnibus.chef_version = 'latest'
  # end

  # Every Vagrant virtual environment requires a box to build off of.
  # If this value is a shorthand to a box in Vagrant Cloud then
  # config.vm.box_url doesn't need to be specified.
  config.vm.box = 'chef/centos-7.0'

  # Assign this VM to a host-only network IP, allowing you to access it
  # via the IP. Host-only networks can talk to the host machine as well as
  # any other machines on the same network, but cannot be accessed (through this
  # network interface) by any external networks.
  config.vm.network :forwarded_port, guest: 80, host: 8090
  config.vm.network :forwarded_port, guest: 8080, host: 9080
  config.vm.network :forwarded_port, guest: 8443, host: 8445
  config.vm.network :private_network, type: 'dhcp'

  config.ssh.username = 'root'
  config.ssh.password = 'vagrant'
  config.ssh.insert_key = true

  config.vm.provision :chef_solo do |chef|

    chef.cookbooks_path = "../"

    chef.run_list = [
      'recipe[payments_qbms_mule]',
      'recipe[payments_qbms_mule::install_license]'
    ]
  end
end

I just tested a few more times, and it can also hang as shown below.

==> default: [2015-05-06T23:02:22+00:00] INFO: Run List is [recipe[payments_qbms_mule], recipe[payments_qbms_mule::install_license]]
==> default: [2015-05-06T23:02:22+00:00] INFO: Run List expands to [payments_qbms_mule, payments_qbms_mule::install_license]
==> default: [2015-05-06T23:02:22+00:00] INFO: Starting Chef Run for qbmsmule
==> default: [2015-05-06T23:02:22+00:00] INFO: Running start handlers
==> default: [2015-05-06T23:02:22+00:00] INFO: Start handlers complete.
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

sethvargo commented 9 years ago

Hi @arionfx

Are you able to reproduce this issue with a smaller Chef cookbook (or no cookbooks at all)? It is possible that the cookbook is altering SSH keys or disabling a service that Vagrant needs to communicate with the machine.

arionfx commented 9 years ago

Hi @sethvargo

I just tested it with a cookbook containing one recipe that only runs "ls -la". It got stuck at

DEBUG ssh: stdout: downloaded metadata file looks valid...

DEBUG ssh: stdout: downloading https://opscode-omnibus-packages.s3.amazonaws.com/el/6/x86_64/chef-12.2.1-1.el6.x86_64.rpm
  to file /tmp/install.sh.10939/chef-12.2.1-1.el6.x86_64.rpm
trying wget...

DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

for a very long time. While waiting, I ran vagrant ssh into the same box and ran wget on the chef-client package directly without any problem.

I also simplified my cookbook so that it just downloads a tar package, unpacks it, and creates a directory. It got stuck at

==> default: [2015-05-07T00:46:42+00:00] INFO: directory[/v/qbmsmule/mule/mule-enterprise-standalone-3.4.0] created directory /v/qbmsmule/mule/mule-enterprise-standalone-3.4.0
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

sethvargo commented 9 years ago

What is your guest and host OS?

arionfx commented 9 years ago

Guest: CentOS 7.0
Host: OS X 10.10

ghost commented 9 years ago

Hi @sethvargo,

I am having the same issue. The Chef install via the vagrant-omnibus plugin hangs when running vagrant up. My system setup is:

Here is my Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = '2'

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.box = "opscode-ubuntu-14.04"
  config.omnibus.chef_version = :latest
  config.berkshelf.enabled = true

  config.vm.provision :chef_solo do |chef|
    chef.run_list = "recipe[build-jenkins::default]"
    chef.log_level = "debug"
  end
end

The cookbook I am using contains a single, simple recipe:

user 'test'

An interesting thing I noticed is that when the install hangs and I hit Ctrl-C, it shows that the Chef install continues to happen. However, the Chef run does not get completed. Things stop right after the Chef install.

==> default: Downloading Chef 12.4.0 for ubuntu...
DEBUG ssh: stdout: downloading https://www.chef.io/chef/metadata?v=12.4.0&prerelease=false&nightlies=false&p=ubuntu&pv=14.04&m=x86_64

 INFO interface: info: downloading https://www.chef.io/chef/metadata?v=12.4.0&prerelease=false&nightlies=false&p=ubuntu&pv=14.04&m=x86_64

 INFO interface: info: ==> default: downloading https://www.chef.io/chef/metadata?v=12.4.0&prerelease=false&nightlies=false&p=ubuntu&pv=14.04&m=x86_64
==> default: downloading https://www.chef.io/chef/metadata?v=12.4.0&prerelease=false&nightlies=false&p=ubuntu&pv=14.04&m=x86_64
DEBUG ssh: stdout:   to file /tmp/install.sh.1116/metadata.txt

 INFO interface: info:   to file /tmp/install.sh.1116/metadata.txt

 INFO interface: info: ==> default:   to file /tmp/install.sh.1116/metadata.txt
==> default:   to file /tmp/install.sh.1116/metadata.txt
DEBUG ssh: stdout: trying wget...

 INFO interface: info: trying wget...

 INFO interface: info: ==> default: trying wget...
==> default: trying wget...
DEBUG ssh: stdout: url  https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
md5     630a8752be2cb45c69b7880adb2340f1
sha256  2d66c27884658f851d43cec850b4951b4d540492be521ae16f6941be80e8b1e6

 INFO interface: info: url      https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
md5     630a8752be2cb45c69b7880adb2340f1
sha256  2d66c27884658f851d43cec850b4951b4d540492be521ae16f6941be80e8b1e6

 INFO interface: info: ==> default: url https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
==> default: md5        630a8752be2cb45c69b7880adb2340f1
==> default: sha256     2d66c27884658f851d43cec850b4951b4d540492be521ae16f6941be80e8b1e6
==> default: url        https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
==> default: md5        630a8752be2cb45c69b7880adb2340f1
==> default: sha256     2d66c27884658f851d43cec850b4951b4d540492be521ae16f6941be80e8b1e6
DEBUG ssh: stdout: downloaded metadata file looks valid...

 INFO interface: info: downloaded metadata file looks valid...

 INFO interface: info: ==> default: downloaded metadata file looks valid...
==> default: downloaded metadata file looks valid...
DEBUG ssh: stdout: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb

 INFO interface: info: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb

 INFO interface: info: ==> default: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
==> default: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/10.04/x86_64/chef_12.4.0-1_amd64.deb
DEBUG ssh: stdout:   to file /tmp/install.sh.1116/chef_12.4.0-1_amd64.deb

 INFO interface: info:   to file /tmp/install.sh.1116/chef_12.4.0-1_amd64.deb

 INFO interface: info: ==> default:   to file /tmp/install.sh.1116/chef_12.4.0-1_amd64.deb
==> default:   to file /tmp/install.sh.1116/chef_12.4.0-1_amd64.deb
DEBUG ssh: stdout: trying wget...

 INFO interface: info: trying wget...

 INFO interface: info: ==> default: trying wget...
==> default: trying wget...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

#################################################
# Where I hit Ctrl-C
#################################################
^C INFO interface: warn: Waiting for cleanup before exiting...
 INFO interface: warn: ==> default: Waiting for cleanup before exiting...
==> default: Waiting for cleanup before exiting...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: stdout: Comparing checksum with sha256sum...

 INFO interface: info: Comparing checksum with sha256sum...

 INFO interface: info: ==> default: Comparing checksum with sha256sum...
==> default: Comparing checksum with sha256sum...
DEBUG ssh: stdout: Installing Chef 12.4.0

 INFO interface: info: Installing Chef 12.4.0

 INFO interface: info: ==> default: Installing Chef 12.4.0
==> default: Installing Chef 12.4.0
DEBUG ssh: stdout: installing with dpkg...

 INFO interface: info: installing with dpkg...

 INFO interface: info: ==> default: installing with dpkg...
==> default: installing with dpkg...
DEBUG ssh: stdout: Selecting previously unselected package chef.

 INFO interface: info: Selecting previously unselected package chef.

 INFO interface: info: ==> default: Selecting previously unselected package chef.
==> default: Selecting previously unselected package chef.
DEBUG ssh: stdout: (Reading database ... 32400 files and directories currently installed.)

 INFO interface: info: (Reading database ... 32400 files and directories currently installed.)

 INFO interface: info: ==> default: (Reading database ... 32400 files and directories currently installed.)
==> default: (Reading database ... 32400 files and directories currently installed.)
DEBUG ssh: stdout: Preparing to unpack .../chef_12.4.0-1_amd64.deb ...

 INFO interface: info: Preparing to unpack .../chef_12.4.0-1_amd64.deb ...

 INFO interface: info: ==> default: Preparing to unpack .../chef_12.4.0-1_amd64.deb ...
==> default: Preparing to unpack .../chef_12.4.0-1_amd64.deb ...
DEBUG ssh: stdout: Unpacking chef (12.4.0-1) ...

 INFO interface: info: Unpacking chef (12.4.0-1) ...

 INFO interface: info: ==> default: Unpacking chef (12.4.0-1) ...
==> default: Unpacking chef (12.4.0-1) ...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: stdout: Setting up chef (12.4.0-1) ...

 INFO interface: info: Setting up chef (12.4.0-1) ...

 INFO interface: info: ==> default: Setting up chef (12.4.0-1) ...
==> default: Setting up chef (12.4.0-1) ...
DEBUG ssh: stdout: Thank you for installing Chef!

 INFO interface: info: Thank you for installing Chef!

 INFO interface: info: ==> default: Thank you for installing Chef!
==> default: Thank you for installing Chef!
DEBUG ssh: Exit status: 0
ERROR warden: Error occurred: Vagrant exited after cleanup due to external interrupt.
 INFO warden: Beginning recovery process...
 INFO warden: Calling recover: #<Vagrant::Action::Builtin::HandleForwardedPortCollisions:0x0000000280f0f0>
 INFO warden: Recovery complete.

...<remaining output is just error message lines from early exit>

Please let me know if there is any more info you need to help debug this.

Thanks, -Matthew

ghost commented 9 years ago

I've done a little more debugging and what I realized is that the "DEBUG ssh: Sending SSH keep-alive..." messages are not the vagrant process hanging (at least for me). It's actually an extremely slow download of the Debian package by the VM. When the keep-alive messages are being output, I can SSH to the VM and see the download occurring via tcpdump. After a few minutes, the package is eventually downloaded.

Out of curiosity, while I was logged into the VM via 'vagrant ssh' to observe tcpdump, I ran wget on the Chef Debian package and saw it download in about 10s. The same download was taking about 2 minutes via the omnibus plugin, so the question actually seems to be "why is wget so slow when run via the omnibus plugin?"

I don't know exactly how the omnibus plugin works, but my assumption from the debug output is that it just runs the wget command via SSH. This is essentially the same thing I did with 'vagrant ssh', so I would expect it to behave the same way. I'm wondering if there is something different about how the omnibus plugin connects to the VM over SSH versus how 'vagrant ssh' does that could cause this behavior.
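To see which step the keep-alive messages are actually covering in a long debug log, a small script can help. The following is only a sketch, not part of Vagrant: the helper name `longest_keep_alive_run` is made up for illustration, and it assumes the log contains the literal text "Sending SSH keep-alive..." as in the output above. It reports the last real output line before the longest run of keep-alives, which is the step that was slow or stuck.

```ruby
# Hypothetical helper (not part of Vagrant): find the longest run of
# keep-alive lines in a `vagrant --debug` log and the line that preceded it.
KEEP_ALIVE = 'Sending SSH keep-alive...'

# Returns [line_before_longest_run, run_length], or nil when the log
# contains no keep-alive lines at all.
def longest_keep_alive_run(lines)
  best = nil          # [preceding line, run length] of the longest run so far
  last_output = nil   # most recent non-blank line that is not a keep-alive
  run = 0             # length of the keep-alive run currently being read

  lines.each do |raw|
    line = raw.strip
    next if line.empty?

    if line.include?(KEEP_ALIVE)
      run += 1
      best = [last_output, run] if best.nil? || run > best[1]
    else
      last_output = line
      run = 0
    end
  end

  best
end
```

For example, after saving the debug output to a file (the file name here is an assumption), `longest_keep_alive_run(File.readlines('vagrant-debug.log'))` returns the preceding line and the number of keep-alives that followed it. In the log above that line would be the "trying wget..." step.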

sethvargo commented 9 years ago

Hi @arionfx and @mmachajln

Vagrant 1.7.2 shipped with built-in support for installing Chef. I am unable to reproduce this issue with the omnibus plugin or recent versions of Vagrant that auto-install Chef. It is true that the Chef installation can take some time (usually about 2 minutes in my experience).

Unfortunately we cannot provide support for third-party plugins. If you're able to reproduce this issue with Vagrant 1.7.2 or later using the built-in Chef installer, please open a new issue and include the complete Vagrantfile as well as the debug output. Thanks and sorry!
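For anyone attempting that reproduction, a minimal Vagrantfile without the vagrant-omnibus plugin might look roughly like the sketch below. It assumes the `version` and `log_level` options of Vagrant's built-in Chef provisioner. The box, cookbook path, and recipe name are copied from the Vagrantfile earlier in this thread, and the pinned Chef version is only an example.

```ruby
# Sketch only: no config.omnibus.* settings, so Vagrant's own Chef
# installer is the one that runs.
Vagrant.configure('2') do |config|
  config.vm.box = 'chef/centos-7.0'

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = '../'
    chef.version = '12.4.0'     # example version, adjust as needed
    chef.log_level = 'debug'    # more detail in the Chef run output
    chef.run_list = ['recipe[payments_qbms_mule]']
  end
end
```

Running `vagrant up --debug` against a file like this should show whether the hang still occurs once the plugin is out of the picture.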

Master-Chief-2007 commented 9 years ago

I ran into the same issue. Will post my solution, in case another person is running into it.

SETUP

$ vagrant --version
Vagrant 1.7.2

$ vagrant plugin list
rest-client (1.6.9)
vagrant-berkshelf (4.0.4)
vagrant-omnibus (1.4.1)
vagrant-proxyconf (1.5.0)
vagrant-share (1.1.3, system)

Problem: wget step was hanging.

==> default: Installing Chef 12.4.1 Omnibus package...
==> default: Downloading Chef 12.4.1 for el...
==> default: downloading https://www.chef.io/chef/metadata?v=12.4.1&prerelease=false&nightlies=false&p=el&pv=7&m=x86_64
==> default:   to file /tmp/install.sh.7376/metadata.txt
==> default: trying wget...

Suggestion: Open up another window, do vagrant ssh into the machine, and see what's going on (ps -ef)

In my situation the wget was running as stated in the log output but doing nothing.

Solution: Found that there were no PROXY settings in the machine, and I am running this behind a corporate firewall. In my hurry, I had forgotten to add the proxy configuration to the Vagrantfile! You will need to install the vagrant-proxyconf plugin for this to work.

  1. Did vagrant destroy
  2. Added the following to the Vagrantfile (replace XXXX with your corporate proxy server)
  if Vagrant.has_plugin?("vagrant-proxyconf")
    config.proxy.http     = "http://XXXX"
    config.proxy.https    = "http://XXXX"
    config.proxy.no_proxy = "localhost,127.0.0.1"
  end
  3. Did vagrant up

It worked!

Hope this is of help.

ghost commented 9 years ago

@arionfx @sethvargo I wasn't behind a proxy so my issue was different. I'm still not sure why my network connection from the local running VM was slow, but it must be a network or VM configuration on my local machine that is causing the issue. That still isn't resolved, but below is the workaround I used.

Workaround

The workaround I used was to set the chef_omnibus_url option in the provisioner section of a .kitchen.local.yml file, in combination with a webserver I already had running on my local machine.

Basically, I just put a chef install script and the chef packages on my local webserver and pointed kitchen to pull the install script and run it. Here was my .kitchen.local.yml file:

# .kitchen.local.yml
provisioner:
  require_chef_omnibus: true

  # 10.0.2.2 is the default gateway in the VM that is set by virtualbox
  # the default gateway ends up being the host machine so that's why this
  # worked for a webserver running on localhost.  Here was info that led
  # me to this:
  # https://blogs.oracle.com/fatbloke/entry/networking_in_virtualbox1

  chef_omnibus_url: http://10.0.2.2/chef/install_chef.sh

The install script located at the local URL path of /chef/install_chef.sh was this:

# http://10.0.2.2/chef/install_chef.sh
# Supports: debian systems only; could be easily modified for other OS package types

# Stop at the first failing command so dpkg never runs on a failed download
set -e

DEB=chef_12.3.0-1_amd64.deb
rm -f "$DEB"
wget "http://10.0.2.2/chef/$DEB"
dpkg -i "$DEB"

I could have used a local HTTP proxy as well for intercepting the download of the package, but I already had a webserver running so just leveraged that.

Hope this helps others too!

kenorb commented 9 years ago

I think I have the same problem. It hangs after running the provisioning script with the message:

$ VAGRANT_LOG=info vagrant --debug up
...
INFO ssh: Setting SSH_AUTH_SOCK remotely: /tmp/ssh-JLlGBlOoLe/agent.1550
...
DEBUG ssh: Sending SSH keep-alive...
load: 2.69 cmd: ruby 22937 waiting 6.40u 0.88s
load: 2.69 cmd: ruby 22937 waiting 6.40u 0.88s

A second provision on the already half-provisioned VM worked fine.

I don't use any plugins, and I don't need a proxy (the script works when executed manually). I'm using Vagrant v1.7.2 on OS X.

I'm using the following settings with API 2:

config.vm.box = "ubuntu/vivid64"
config.vm.box_version = "20150722.0.0"
config.vm.network "private_network", ip: "192.168.22.22"
config.vm.provision "shell", path: "scripts/provision.sh"
config.ssh.pty = true # Use pty for provisioning.
config.ssh.forward_agent = true # Enables agent forwarding over SSH connections.
config.ssh.forward_x11 = true # Enables X11 forwarding over SSH connections.

A workaround could be changing config.vm.boot_timeout, I guess.

Or disabling config.ssh.pty.
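As a rough sketch of what those two changes might look like in the Vagrantfile (the thread does not confirm that either one resolves the hang, and the timeout value is an arbitrary example):

```ruby
Vagrant.configure('2') do |config|
  config.vm.box = 'ubuntu/vivid64'
  config.vm.provision 'shell', path: 'scripts/provision.sh'

  # Seconds Vagrant waits for the machine to boot and become reachable
  # (300 by default). It does not bound the provisioning step itself.
  config.vm.boot_timeout = 600

  # Run provisioning commands without a pseudo-terminal (Vagrant's default).
  config.ssh.pty = false
end
```

Since boot_timeout only covers the wait for the machine to come up, disabling the pty is probably the more relevant of the two for a hang that happens during provisioning.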

I've described more details in #6086 and #8118.
