random "Forcing shutdown of VM" through provisioning process

LarsFronius commented 13 years ago

If I run "vagrant up" and have a puppet provisioner configured, the provisioning process sometimes gets abborted by the following message. I think this happens more often if I run puppet in the foreground - being done with puppet.options = ["--test"] in the Vagrantfile.

[avm01] Forcing shutdown of VM...
[avm01] Destroying VM and associated drives...
[avm01] Destroying unused networking interface...
/usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in `select': closed stream (IOError)
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in `io_select'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:73:in `available_for_read?'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:85:in `next_packet'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:169:in `block in poll_message'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:164:in `loop'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:164:in `poll_message'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:451:in `dispatch_incoming_packets'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:213:in `preprocess'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:197:in `process'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:161:in `block in loop'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:161:in `loop'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/session.rb:161:in `loop'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/net-ssh-2.1.4/lib/net/ssh/connection/channel.rb:269:in `wait'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/ssh/session.rb:55:in `sudo!'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/provisioners/puppet_server.rb:47:in `block in run_puppetd_client'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/ssh.rb:119:in `execute'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/provisioners/puppet_server.rb:46:in `run_puppetd_client'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/provisioners/puppet_server.rb:24:in `provision!'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/provision.rb:22:in `block in call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/provision.rb:20:in `each'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/provision.rb:20:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/forward_ports.rb:95:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/clear_forwarded_ports.rb:21:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/clean_machine_folder.rb:17:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/check_guest_additions.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/match_mac_address.rb:21:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/import.rb:26:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/vm/check_box.rb:23:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/warden.rb:30:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action/builder.rb:120:in `call'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action.rb:134:in `block (2 levels) in run'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/util/busy.rb:19:in `busy'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action.rb:134:in `block in run'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/environment.rb:364:in `block in lock'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/environment.rb:354:in `open'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/environment.rb:354:in `lock'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/action.rb:133:in `run'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/vm.rb:140:in `up'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/command/up.rb:13:in `block in execute'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/command/up.rb:8:in `each'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/command/up.rb:8:in `execute'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/task.rb:22:in `run'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:118:in `invoke_task'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:124:in `block in invoke_all'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:124:in `each'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:124:in `map'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:124:in `invoke_all'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/group.rb:226:in `dispatch'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:109:in `invoke'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/lib/vagrant/cli.rb:45:in `block in register'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/task.rb:22:in `run'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/invocation.rb:118:in `invoke_task'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor.rb:263:in `dispatch'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/thor-0.14.6/lib/thor/base.rb:389:in `start'
    from /usr/local/Cellar/ruby/1.9.2-p290/lib/ruby/gems/1.9.1/gems/vagrant-0.8.7/bin/vagrant:21:in `<top (required)>'
    from /usr/local/Cellar/ruby/1.9.2-p290/bin/vagrant:19:in `load'
    from /usr/local/Cellar/ruby/1.9.2-p290/bin/vagrant:19:in `<main>'

I think this doesn't fail when I only run 'vagrant provision' after I created a VM without a provisioner configured in the Vagrantfile when created with 'vagrant up'.

I run ruby 1.9.2 with vagrant 0.8.7 on a Mac OS 10.7.1 in iTerm 2 - if this is relevant for the "closed stream (IOError)" in the first ruby error line. But I have also heard from a colleague who runs Ubuntu, that he has the same issue. He uses puppetfiles from his local machine and I use a remote puppet server. From that point of view I can say that I hardly believe this is a general problem, rather than a Mac OS or iTerm specific or a problem with that I use a remote puppet server.

This is my Vagrantfile btw. Vagrant::Config.run do |config| config.vm.define :avm01 do |config| config.vm.box = "squeeze-pxe-vbox4.1.4-v3" config.vm.network "33.33.33.10" config.vm.customize do |vm| vm.memory_size = 1024 end end config.vm.provision :puppet_server do |puppet| puppet.puppet_server = "puppet.fq.dn" puppet.puppet_node = "avm01.vagrant.internal" puppet.options = ["--test"] end end

briandailey commented 13 years ago

I run into the same error, but with me it always happens during a long running process (in my case running pip install requirements on a guest).

[default] [Wed, 19 Oct 2011 05:56:39 -0700] INFO: execute[install requirements] sh(. /home/vagrant/dev-env/bin/activate && pip install -r /vagrant/requirements.txt)
 : stdout
[default] [Wed, 19 Oct 2011 05:58:41 -0700] ERROR: execute[install requirements] (main::dev line 8) has had an error
 : stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
/usr/lib/ruby/gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError)

I can re-run it without making any changes and it'll run just fine. Appears to be completely random. I'm wondering if it has something to do with available memory or network problems.

jstallings commented 13 years ago

FWIW I ran into the same issue as well on long running actions. In my case it is running chef. I as well can rerun and it will continue

[default] [Wed, 19 Oct 2011 17:02:41 +0200] INFO: Storing updated cookbooks/apache2/recipes/mod_proxy.rb in the cache.
: stdout
[default] [Wed, 19 Oct 2011 17:03:01 +0200] ERROR: Running exception handlers
: stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
/Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError)
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `io_select'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:32:in `synchronize'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:32:in `io_select'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:73:in `available_for_read?'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:85:in `next_packet'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:169:in `poll_message'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:164:in `loop'

I also have modified my ssh parameters to be as follows with no benefit:

  config.ssh.max_tries = 100
  config.ssh.timeout = 600

nickpeirson commented 12 years ago

I'm seeing this with Chef as well:

...
[default] [Thu, 27 Oct 2011 09:26:03 +0000] INFO: execute[upgrade-pear-with-pear] ran successfully
: stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
[default] Destroying unused networking interface...
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in `select': closed stream (IOError)
...

Everything seems to be going well, then it stops and tears down the VM.

mitchellh commented 12 years ago

I've seen this as well. Not sure yet how to work around this but marking as a bug.

mitchellh commented 12 years ago

Looks like I can do client side keep-alive on connections that take a long time: http://net-ssh.github.com/net-ssh/classes/Net/SSH/Connection/Session.html#M000091

Unfortunately I don't think this will work in this case because we're waiting on a command to finish... Hm.

mitchellh commented 12 years ago

Also looks like there is this keep-alive setting, but I'm not sure if I can set it for net-ssh: http://www.fettesps.com/enable-ssh-keep-alive/

dickbrouwer commented 12 years ago

Same here:

[default] Forcing shutdown of VM... [default] Destroying VM and associated drives... /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:inselect': closed stream (IOError) from /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in io_select' from /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:73:inavailable_for_read?`

Any suggestions for a temporary workaround? I'm unable to provision the VM at all as it happens every time for me (Ubuntu 11.10 if that matters, I thought I wasn't seeing it before).

gregormelhorn commented 12 years ago

same here. This is a blocker...

chalfant commented 12 years ago

Can we disable the forced shutdown and destroy during the provision step? Where would I look in the code to do this?

mitchellh commented 12 years ago

Interesting. Can anyone else that can consistently cause this problem reproduce @chalfant's results?

dickbrouwer commented 12 years ago

I can, it happens on every run (with ruby 1.9.2, where did Gregor's comment re: reuby 1.8.7+ go?)

On Thu, Jan 5, 2012 at 10:34 PM, Mitchell Hashimoto < reply@reply.github.com

wrote:

Interesting. Can anyone else that can consistently cause this problem reproduce @chalfant's results?

Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-3376704

nickpeirson commented 12 years ago

@mitchellh Was that with reference to the comment about this issue being related to ruby 1.8.7 (which now appears to be removed)?

I'm seeing this with ruby version "1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]" so not a Ruby 1.8.7 problem as far as I can tell.

mitchellh commented 12 years ago

Ah, it must have been deleted. :)

Yeah I figured this would affect every version. Sigh. I'm still not sure how to fix this since it appears to be a Net::SSH bug but Net::SSH is in maintenance mode and only accepting pull requests, and I'm not sure where to begin in there. :-\ I'll continue thinking.

gregormelhorn commented 12 years ago

yes, I was too slow in deleting. I noticed I did some changes that would cause my command generate some output from time to time. In this case, the error does not happen, though output is only shown in the end.

mitchellh commented 12 years ago

So I think I may have a way to fix this. Can anyone post a Vagrantfile/puppet recipe that can consistently show this bug? I'm having trouble repo-ing it.

gregormelhorn commented 12 years ago

I did not try this out, but something like

execute "sleep" do command "sleep 3600" end

should do it.

syelle commented 12 years ago

@mitchellh We can reproduce often (although not consistently) but it's on a chef provision that connects to our chef server. It usually fails when trying to install mysql via apt-get on Ubuntu 10.04.

dickbrouwer commented 12 years ago

For me, compiling node.js often fails (although it's part of a much larger recipe). A large pip_install requirements.txt will also do it.

On Jan 9, 2012, at 2:47 PM, Shaun reply@reply.github.com wrote:

@mitchellh We can reproduce often (although not consistently) but it's on a chef provision that connects to our chef server. It usually fails when trying to install mysql via apt-get on Ubuntu 10.04.

Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-3421884

kumar303 commented 12 years ago

This reproduces it every single time for me on Ruby 1.8.7 and vagrant 0.8.10 (I also tried with vagrant 0.9.2 and I think it happens there too).

git clone -n git://github.com/mozilla/zamboni.git
cd zamboni
git checkout 0bbef8533575875e7240c142957e8d09a797ee26
git submodule update --init --recursive
vagrant up

I am desperately trying to find a workaround for this :( As others have commented, the exception is triggered during pip install -r requirements/compiled.txt which often takes a long time.

syelle commented 12 years ago

We've worked around this by not relying on vagrant up to perform provisioning.

After vagrant up has created the box and right as it starts to provision our box with chef-client, I use ctrl+c to bail out of vagrant up. Then I run vagrant ssh to get into the box and run chef-client manually from there. This circumvents the troublesome timeout in our case since chef is not relying on vagrant to establish connections anymore.

kumar303 commented 12 years ago

Ugh. This is pretty hard to work around. I packaged up my provisioned machine which cut down on long running commands but we need our users to periodically run database migrations via puppet. These can run long.

Any ideas on how to fix the SSH timeout?

At the very least is there a way to rescue the IOError so that the command can fail without causing the VM to shut down?

kumar303 commented 12 years ago

Here are some more clues. I think it's more than just an SSH timeout. It appears that the VM is actually losing all connectivity to the network.

To workaround the problem I installed screen and set up /home/vagrant/.profile so it starts a screen session when you log in and runs a script to download everything it needs when you SSH in. I see the same kind of error where it says

Connection to 127.0.0.1 closed by remote host.

When I ssh back in to resume the screen session, I see an exception in my Python script (that was running) showing that it lost connectivity. This has happened several times now.

If this were merely some kind of SSH timeout then in theory the screen session would have continued running in the background. Also, I went nuts and reconfigured /etc/ssh/sshd_config so that it could not possibly timeout. That did not have any affect.

I can reproduce this consistently. How can I help debug the source of the problem? I'm on IRC, freenode and irc.mozilla.org as kumar

jdunck commented 12 years ago

Possibly related: https://www.virtualbox.org/ticket/6559

Though I'll note I'm using VirtualBox 4.1.4 and am still having this issue.

jdunck commented 12 years ago

I'm w/ @kumar303 in that it doesn't seem to be just an ssh issue.

I checked ulimit -a and noted that it had 1024 files-per-proc so thought that might be being hit - no, it doesn't appear so, nor OOM. Quite frustrating :-/

mattd commented 12 years ago

Is there any update on this issue? I'm seeing the same thing running a chef_client provision. I lose connection right around the time that my cookbooks have finished downloading.

mattd commented 12 years ago

After a bunch of research, my top candidate for the cause of this bug is this:

https://www.virtualbox.org/ticket/4870

It's an old ticket, but matches what we seem to be experiencing to a T.

jdunck commented 12 years ago

Supposedly fixed in 3.0.8 and I'm running 4.1.4. They closed the ticket because nobody responded to the most recent request for info, but it seems unlikely it was actually fixed. Hmmph.

dcarley commented 12 years ago

At last, I found other people with the same issue!

I too hit this with a large pip install. My suspicion is that the quantity of outbound connections chokes VB's NAT connection tracking. The only other way I've reproduced it is by really hammering DNS lookups. If the OP's provisioner is pulling in lots of packages from external sources then it could produce the same effect.

I gave up trying to register for an "Oracle account" and just put findings in a Gist. I've got some host and guest network captures kicking around too.

https://gist.github.com/1716454

mitchellh commented 12 years ago

Ah, this is super helpful!I think in the future the best thing would be to allow non-NAT interfaces to SSH.

jdunck commented 12 years ago

@mitchellh If I understand correctly, the NAT interface is always set up for vagrant to do the ssh into, and any calls to config.vm.network make additional interfaces. And since the NAT interface is eth0, it's being used for the provisioning.

Would it be reasonable as a short-term workaround to have the NAT interface been the last interface rather that the first one?

dcarley commented 12 years ago

You can try working around it with a different adapter type. I've had some success with:

config.vm.customize(["modifyvm", :id, "--nictype1", "Am79C973"])

kumar303 commented 12 years ago

Is everyone who is having trouble here using ubuntu? I am.

dickbrouwer commented 12 years ago

Me too

Sent from my iPhone

On Feb 16, 2012, at 10:26, Kumar McMillan reply@reply.github.com wrote:

Is everyone who is having trouble here using ubuntu? I am.

Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-4006266

kumar303 commented 12 years ago

Actually, heh, a better question is: Does anyone hit the symptoms of this bug in a non-ubuntu VM? In talking with others I found someone with a Cent OS vm who has never seen this.

LarsFronius commented 12 years ago

Yeah, the actual bug report I was coming up with was made, whilst provisioning a Debian VM.

dcarley commented 12 years ago

CentOS guests on Ubuntu hosts, here.

mattd commented 12 years ago

FWIW, dcarley's solution mentioned above worked for me on an Ubuntu Lucid guest box.

Why? No idea. :)

jdunck commented 12 years ago

@mattd @dcarley Where exactly are you putting the line?

I'm assuming it must be within Vagrant::Config.run do |config| / end

but no matter where I put it in my Vagrantfile, vagrant up hangs trying to talk to the VM:

[default] Booting VM... [default] Waiting for VM to boot. This can take a few minutes.

dcarley commented 12 years ago

Looks like a lot of DNS requests alone are enough to reproduce it.

I've created a Vagrant environment to reproduce it. There's also an example of how modifyvm is being used:

https://github.com/dcarley/vbox_natbug

xuru commented 12 years ago

@dcarley Great information. Thanks for putting this together!

EslamElHusseiny commented 12 years ago

any updates, i updated my vagrant version to 1.0.3 but error "/var/lib/gems/1.8/gems/net-ssh-2.2.2/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError) " still appears in step of install pips

bgreenlee commented 12 years ago

I'm seeing the same issue as @EslamElHusseiny. I get to the point where I install my pip-requirements via pip install -r and it dies. In my case: /Users/brad/.rvm/gems/ruby-1.9.3-p125/gems/net-ssh-2.2.2/lib/net/ssh/ruby_compat.rb:22:inselect': closed stream (IOError)`

I'm running vagrant 1.0.3 and VirtualBox 4.1.14

Update: I have the nictype fix (config.vm.customize(["modifyvm", :id, "--nictype1", "Am79C973"]), and if I go in through the host-only networking IP, it takes longer to fail, but still fails eventually.

Update 2: Not sure of the relevance, but it always seems to fail while installing python-memcached via pip. Seems related to https://github.com/pypa/pip/issues/205

skottler commented 12 years ago

I can reproduce this bug over and over. It also happens to me when installing python-memcached and the workaround does not help mitigate the effects of the issue.

mikeywaites commented 12 years ago

@bgreenlee im seeing the exact same problem and always as it tries to install python-memcache. did you find a fix?

bgreenlee commented 12 years ago

@mikeywaites My workaround was to download python-memcached manually and run pip install directly on the tar file.

jtopper commented 12 years ago

I've seen this happening on a CentOS 6 box, running a sizeable puppet run from a shell provisioner. Prior to the stacktrace being thrown, the output from my "vagrant up" run pauses, and a shell opened using "vagrant ssh" also gets stuck, and then usually thrown out with a "connection closed by remote host".

My VM has a host-only network interface configured, and a straight ssh connection to that address continues to work without trouble throughout.

I'm happy to provide you with a copy of my setup if that would help.

jtopper commented 12 years ago

This isn't happening repeatably enough for me to be able to have a good run at debugging, but on the occasions when it happens, I'm gathering what data I can. This morning, my "vagrant ssh" initiated sessions all got stuck whilst doing a "vagrant provision". I was able to get in via ssh to my host-only network interface and attach strace to the sshd process. I saw this when attempting to make a new "vagrant ssh" initiated connection:

[pid  5586] getsockopt(3, SOL_IP, IP_OPTIONS, "", [0]) = 0
[pid  5586] getpeername(3, {sa_family=AF_INET, sin_port=htons(54185), sin_addr=inet_addr("10.0.2.2")}, [16]) = 0
[pid  5586] getpeername(3, {sa_family=AF_INET, sin_port=htons(54185), sin_addr=inet_addr("10.0.2.2")}, [16]) = 0
[pid  5586] getsockname(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.0.2.15")}, [16]) = 0
[pid  5586] open("/etc/hosts.allow", O_RDONLY) = 4
[pid  5586] fstat(4, {st_mode=S_IFREG|0644, st_size=370, ...}) = 0
[pid  5586] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdfe8965000
[pid  5586] read(4, "#\n# hosts.allow\tThis file contai"..., 4096) = 370
[pid  5586] read(4, "", 4096)           = 0
[pid  5586] close(4)                    = 0
[pid  5586] munmap(0x7fdfe8965000, 4096) = 0
[pid  5586] open("/etc/hosts.deny", O_RDONLY) = 4
[pid  5586] fstat(4, {st_mode=S_IFREG|0644, st_size=460, ...}) = 0
[pid  5586] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdfe8965000
[pid  5586] read(4, "#\n# hosts.deny\tThis file contain"..., 4096) = 460
[pid  5586] read(4, "", 4096)           = 0
[pid  5586] close(4)                    = 0
[pid  5586] munmap(0x7fdfe8965000, 4096) = 0
[pid  5586] rt_sigaction(SIGALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid  5586] rt_sigaction(SIGALRM, {0x7fdfe8977300, [], SA_RESTORER|SA_INTERRUPT, 0x7fdfe5cea900}, NULL, 8) = 0
[pid  5586] alarm(120)                  = 0
[pid  5586] write(3, "SSH-2.0-OpenSSH_5.3\r\n", 21) = 21
[pid  5586] read(3,

The ssh connection hits sshd, and then stalls, with sshd waiting on communication from the client. This happens whether I use "vagrant ssh" or use ssh directly from the commandline, and only on the NATed ssh port.

I did have

ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r

in my ssh client config, but removing this didn't make a difference.

I have my suspicions that that is a VirtualBox bug, but I'm not sure how I'd rule that out.

dcarley commented 12 years ago

I'm certain that it is a VB bug. Assuming that all of these issues are indeed the very same thing. I could reproduce the hang of a NAT interface without any Vagrant in the picture.

@jtopper Does your Puppet run hammer the network interface a fair amount?

jtopper commented 12 years ago

@dcarley It's relatively heavy, yeah, lots of RPM downloads and such.

leth commented 12 years ago

My workaround for this is to double ctrl-c vagrant as soon as it get into the provisioning step of the up command, and then run the provisioning step manually, that way any failures don't trigger the VM deletion.

@mitchellh: Perhaps the problem could be neutralised slightly by reworking the exception handling on the up command. I can see why it makes sense to destroy the VM if the box import etc goes awry, however I don't think an error during the provisioning step of up should necessarily cause the destruction of the VM.

hashicorp / vagrant

random "Forcing shutdown of VM" through provisioning process #516