Closed LarsFronius closed 11 years ago
I run into the same error, but with me it always happens during a long running process (in my case running pip install requirements on a guest).
[default] [Wed, 19 Oct 2011 05:56:39 -0700] INFO: execute[install requirements] sh(. /home/vagrant/dev-env/bin/activate && pip install -r /vagrant/requirements.txt)
: stdout
[default] [Wed, 19 Oct 2011 05:58:41 -0700] ERROR: execute[install requirements] (main::dev line 8) has had an error
: stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
/usr/lib/ruby/gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError)
I can re-run it without making any changes and it'll run just fine. Appears to be completely random. I'm wondering if it has something to do with available memory or network problems.
FWIW I ran into the same issue as well on long running actions. In my case it is running chef. I as well can rerun and it will continue
[default] [Wed, 19 Oct 2011 17:02:41 +0200] INFO: Storing updated cookbooks/apache2/recipes/mod_proxy.rb in the cache.
: stdout
[default] [Wed, 19 Oct 2011 17:03:01 +0200] ERROR: Running exception handlers
: stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
/Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError)
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:33:in `io_select'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:32:in `synchronize'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:32:in `io_select'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:73:in `available_for_read?'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:85:in `next_packet'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:169:in `poll_message'
from /Library/Ruby/Gems/1.8/gems/net-ssh-2.1.4/lib/net/ssh/transport/session.rb:164:in `loop'
I also have modified my ssh parameters to be as follows with no benefit:
config.ssh.max_tries = 100
config.ssh.timeout = 600
I'm seeing this with Chef as well:
...
[default] [Thu, 27 Oct 2011 09:26:03 +0000] INFO: execute[upgrade-pear-with-pear] ran successfully
: stdout
[default] Forcing shutdown of VM...
[default] Destroying VM and associated drives...
[default] Destroying unused networking interface...
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in `select': closed stream (IOError)
...
Everything seems to be going well, then it stops and tears down the VM.
I've seen this as well. Not sure yet how to work around this but marking as a bug.
Looks like I can do client side keep-alive on connections that take a long time: http://net-ssh.github.com/net-ssh/classes/Net/SSH/Connection/Session.html#M000091
Unfortunately I don't think this will work in this case because we're waiting on a command to finish... Hm.
Also looks like there is this keep-alive setting, but I'm not sure if I can set it for net-ssh: http://www.fettesps.com/enable-ssh-keep-alive/
Same here:
[default] Forcing shutdown of VM... [default] Destroying VM and associated drives... /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in
select': closed stream (IOError)
from /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/ruby_compat.rb:22:in io_select' from /Users/dikbrouwer/.rvm/gems/ruby-1.9.2-p290/gems/net-ssh-2.1.4/lib/net/ssh/transport/packet_stream.rb:73:in
available_for_read?`
Any suggestions for a temporary workaround? I'm unable to provision the VM at all as it happens every time for me (Ubuntu 11.10 if that matters, I thought I wasn't seeing it before).
same here. This is a blocker...
Can we disable the forced shutdown and destroy during the provision step? Where would I look in the code to do this?
Interesting. Can anyone else that can consistently cause this problem reproduce @chalfant's results?
I can, it happens on every run (with ruby 1.9.2, where did Gregor's comment re: reuby 1.8.7+ go?)
On Thu, Jan 5, 2012 at 10:34 PM, Mitchell Hashimoto < reply@reply.github.com
wrote:
Interesting. Can anyone else that can consistently cause this problem reproduce @chalfant's results?
Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-3376704
@mitchellh Was that with reference to the comment about this issue being related to ruby 1.8.7 (which now appears to be removed)?
I'm seeing this with ruby version "1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin10.8.0]" so not a Ruby 1.8.7 problem as far as I can tell.
Ah, it must have been deleted. :)
Yeah I figured this would affect every version. Sigh. I'm still not sure how to fix this since it appears to be a Net::SSH bug but Net::SSH is in maintenance mode and only accepting pull requests, and I'm not sure where to begin in there. :-\ I'll continue thinking.
yes, I was too slow in deleting. I noticed I did some changes that would cause my command generate some output from time to time. In this case, the error does not happen, though output is only shown in the end.
So I think I may have a way to fix this. Can anyone post a Vagrantfile/puppet recipe that can consistently show this bug? I'm having trouble repo-ing it.
I did not try this out, but something like
execute "sleep" do command "sleep 3600" end
should do it.
@mitchellh We can reproduce often (although not consistently) but it's on a chef provision that connects to our chef server. It usually fails when trying to install mysql via apt-get on Ubuntu 10.04.
For me, compiling node.js often fails (although it's part of a much larger recipe). A large pip_install requirements.txt will also do it.
On Jan 9, 2012, at 2:47 PM, Shaun reply@reply.github.com wrote:
@mitchellh We can reproduce often (although not consistently) but it's on a chef provision that connects to our chef server. It usually fails when trying to install mysql via apt-get on Ubuntu 10.04.
Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-3421884
This reproduces it every single time for me on Ruby 1.8.7 and vagrant 0.8.10 (I also tried with vagrant 0.9.2 and I think it happens there too).
git clone -n git://github.com/mozilla/zamboni.git
cd zamboni
git checkout 0bbef8533575875e7240c142957e8d09a797ee26
git submodule update --init --recursive
vagrant up
I am desperately trying to find a workaround for this :( As others have commented, the exception is triggered during pip install -r requirements/compiled.txt
which often takes a long time.
We've worked around this by not relying on vagrant up to perform provisioning.
After vagrant up
has created the box and right as it starts to provision our box with chef-client, I use ctrl+c to bail out of vagrant up. Then I run vagrant ssh
to get into the box and run chef-client manually from there. This circumvents the troublesome timeout in our case since chef is not relying on vagrant to establish connections anymore.
Ugh. This is pretty hard to work around. I packaged up my provisioned machine which cut down on long running commands but we need our users to periodically run database migrations via puppet. These can run long.
Any ideas on how to fix the SSH timeout?
At the very least is there a way to rescue the IOError so that the command can fail without causing the VM to shut down?
Here are some more clues. I think it's more than just an SSH timeout. It appears that the VM is actually losing all connectivity to the network.
To workaround the problem I installed screen and set up /home/vagrant/.profile so it starts a screen session when you log in and runs a script to download everything it needs when you SSH in. I see the same kind of error where it says
Connection to 127.0.0.1 closed by remote host.
When I ssh back in to resume the screen session, I see an exception in my Python script (that was running) showing that it lost connectivity. This has happened several times now.
If this were merely some kind of SSH timeout then in theory the screen session would have continued running in the background. Also, I went nuts and reconfigured /etc/ssh/sshd_config so that it could not possibly timeout. That did not have any affect.
I can reproduce this consistently. How can I help debug the source of the problem? I'm on IRC, freenode and irc.mozilla.org as kumar
Possibly related: https://www.virtualbox.org/ticket/6559
Though I'll note I'm using VirtualBox 4.1.4 and am still having this issue.
I'm w/ @kumar303 in that it doesn't seem to be just an ssh issue.
I checked ulimit -a and noted that it had 1024 files-per-proc so thought that might be being hit - no, it doesn't appear so, nor OOM. Quite frustrating :-/
Is there any update on this issue? I'm seeing the same thing running a chef_client provision. I lose connection right around the time that my cookbooks have finished downloading.
After a bunch of research, my top candidate for the cause of this bug is this:
https://www.virtualbox.org/ticket/4870
It's an old ticket, but matches what we seem to be experiencing to a T.
Supposedly fixed in 3.0.8 and I'm running 4.1.4. They closed the ticket because nobody responded to the most recent request for info, but it seems unlikely it was actually fixed. Hmmph.
At last, I found other people with the same issue!
I too hit this with a large pip install
. My suspicion is that the quantity of outbound connections chokes VB's NAT connection tracking. The only other way I've reproduced it is by really hammering DNS lookups. If the OP's provisioner is pulling in lots of packages from external sources then it could produce the same effect.
I gave up trying to register for an "Oracle account" and just put findings in a Gist. I've got some host and guest network captures kicking around too.
Ah, this is super helpful!I think in the future the best thing would be to allow non-NAT interfaces to SSH.
@mitchellh If I understand correctly, the NAT interface is always set up for vagrant to do the ssh into, and any calls to config.vm.network make additional interfaces. And since the NAT interface is eth0, it's being used for the provisioning.
Would it be reasonable as a short-term workaround to have the NAT interface been the last interface rather that the first one?
You can try working around it with a different adapter type. I've had some success with:
config.vm.customize(["modifyvm", :id, "--nictype1", "Am79C973"])
Is everyone who is having trouble here using ubuntu? I am.
Me too
Sent from my iPhone
On Feb 16, 2012, at 10:26, Kumar McMillan reply@reply.github.com wrote:
Is everyone who is having trouble here using ubuntu? I am.
Reply to this email directly or view it on GitHub: https://github.com/mitchellh/vagrant/issues/516#issuecomment-4006266
Actually, heh, a better question is: Does anyone hit the symptoms of this bug in a non-ubuntu VM? In talking with others I found someone with a Cent OS vm who has never seen this.
Yeah, the actual bug report I was coming up with was made, whilst provisioning a Debian VM.
CentOS guests on Ubuntu hosts, here.
FWIW, dcarley's solution mentioned above worked for me on an Ubuntu Lucid guest box.
Why? No idea. :)
@mattd @dcarley Where exactly are you putting the line?
I'm assuming it must be within Vagrant::Config.run do |config| / end
but no matter where I put it in my Vagrantfile, vagrant up hangs trying to talk to the VM:
[default] Booting VM... [default] Waiting for VM to boot. This can take a few minutes.
Looks like a lot of DNS requests alone are enough to reproduce it.
I've created a Vagrant environment to reproduce it. There's also an example of how modifyvm
is being used:
@dcarley Great information. Thanks for putting this together!
any updates, i updated my vagrant version to 1.0.3 but error "/var/lib/gems/1.8/gems/net-ssh-2.2.2/lib/net/ssh/ruby_compat.rb:33:in `select': closed stream (IOError) " still appears in step of install pips
I'm seeing the same issue as @EslamElHusseiny. I get to the point where I install my pip-requirements via pip install -r
and it dies. In my case: /Users/brad/.rvm/gems/ruby-1.9.3-p125/gems/net-ssh-2.2.2/lib/net/ssh/ruby_compat.rb:22:in
select': closed stream (IOError)`
I'm running vagrant 1.0.3 and VirtualBox 4.1.14
Update: I have the nictype fix (config.vm.customize(["modifyvm", :id, "--nictype1", "Am79C973"])
, and if I go in through the host-only networking IP, it takes longer to fail, but still fails eventually.
Update 2: Not sure of the relevance, but it always seems to fail while installing python-memcached
via pip. Seems related to https://github.com/pypa/pip/issues/205
I can reproduce this bug over and over. It also happens to me when installing python-memcached and the workaround does not help mitigate the effects of the issue.
@bgreenlee im seeing the exact same problem and always as it tries to install python-memcache. did you find a fix?
@mikeywaites My workaround was to download python-memcached manually and run pip install directly on the tar file.
I've seen this happening on a CentOS 6 box, running a sizeable puppet run from a shell provisioner. Prior to the stacktrace being thrown, the output from my "vagrant up" run pauses, and a shell opened using "vagrant ssh" also gets stuck, and then usually thrown out with a "connection closed by remote host".
My VM has a host-only network interface configured, and a straight ssh connection to that address continues to work without trouble throughout.
I'm happy to provide you with a copy of my setup if that would help.
This isn't happening repeatably enough for me to be able to have a good run at debugging, but on the occasions when it happens, I'm gathering what data I can. This morning, my "vagrant ssh" initiated sessions all got stuck whilst doing a "vagrant provision". I was able to get in via ssh to my host-only network interface and attach strace to the sshd process. I saw this when attempting to make a new "vagrant ssh" initiated connection:
[pid 5586] getsockopt(3, SOL_IP, IP_OPTIONS, "", [0]) = 0
[pid 5586] getpeername(3, {sa_family=AF_INET, sin_port=htons(54185), sin_addr=inet_addr("10.0.2.2")}, [16]) = 0
[pid 5586] getpeername(3, {sa_family=AF_INET, sin_port=htons(54185), sin_addr=inet_addr("10.0.2.2")}, [16]) = 0
[pid 5586] getsockname(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.0.2.15")}, [16]) = 0
[pid 5586] open("/etc/hosts.allow", O_RDONLY) = 4
[pid 5586] fstat(4, {st_mode=S_IFREG|0644, st_size=370, ...}) = 0
[pid 5586] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdfe8965000
[pid 5586] read(4, "#\n# hosts.allow\tThis file contai"..., 4096) = 370
[pid 5586] read(4, "", 4096) = 0
[pid 5586] close(4) = 0
[pid 5586] munmap(0x7fdfe8965000, 4096) = 0
[pid 5586] open("/etc/hosts.deny", O_RDONLY) = 4
[pid 5586] fstat(4, {st_mode=S_IFREG|0644, st_size=460, ...}) = 0
[pid 5586] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdfe8965000
[pid 5586] read(4, "#\n# hosts.deny\tThis file contain"..., 4096) = 460
[pid 5586] read(4, "", 4096) = 0
[pid 5586] close(4) = 0
[pid 5586] munmap(0x7fdfe8965000, 4096) = 0
[pid 5586] rt_sigaction(SIGALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
[pid 5586] rt_sigaction(SIGALRM, {0x7fdfe8977300, [], SA_RESTORER|SA_INTERRUPT, 0x7fdfe5cea900}, NULL, 8) = 0
[pid 5586] alarm(120) = 0
[pid 5586] write(3, "SSH-2.0-OpenSSH_5.3\r\n", 21) = 21
[pid 5586] read(3,
The ssh connection hits sshd, and then stalls, with sshd waiting on communication from the client. This happens whether I use "vagrant ssh" or use ssh directly from the commandline, and only on the NATed ssh port.
I did have
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
in my ssh client config, but removing this didn't make a difference.
I have my suspicions that that is a VirtualBox bug, but I'm not sure how I'd rule that out.
I'm certain that it is a VB bug. Assuming that all of these issues are indeed the very same thing. I could reproduce the hang of a NAT interface without any Vagrant in the picture.
@jtopper Does your Puppet run hammer the network interface a fair amount?
@dcarley It's relatively heavy, yeah, lots of RPM downloads and such.
My workaround for this is to double ctrl-c vagrant as soon as it get into the provisioning step of the up
command, and then run the provisioning step manually, that way any failures don't trigger the VM deletion.
@mitchellh: Perhaps the problem could be neutralised slightly by reworking the exception handling on the up
command. I can see why it makes sense to destroy the VM if the box import etc goes awry, however I don't think an error during the provisioning step of up
should necessarily cause the destruction of the VM.
If I run "vagrant up" and have a puppet provisioner configured, the provisioning process sometimes gets abborted by the following message. I think this happens more often if I run puppet in the foreground - being done with puppet.options = ["--test"] in the Vagrantfile.
I think this doesn't fail when I only run 'vagrant provision' after I created a VM without a provisioner configured in the Vagrantfile when created with 'vagrant up'.
I run ruby 1.9.2 with vagrant 0.8.7 on a Mac OS 10.7.1 in iTerm 2 - if this is relevant for the "closed stream (IOError)" in the first ruby error line. But I have also heard from a colleague who runs Ubuntu, that he has the same issue. He uses puppetfiles from his local machine and I use a remote puppet server. From that point of view I can say that I hardly believe this is a general problem, rather than a Mac OS or iTerm specific or a problem with that I use a remote puppet server.
This is my Vagrantfile btw.
Vagrant::Config.run do |config| config.vm.define :avm01 do |config| config.vm.box = "squeeze-pxe-vbox4.1.4-v3" config.vm.network "33.33.33.10" config.vm.customize do |vm| vm.memory_size = 1024 end end config.vm.provision :puppet_server do |puppet| puppet.puppet_server = "puppet.fq.dn" puppet.puppet_node = "avm01.vagrant.internal" puppet.options = ["--test"] end end