vagrant ssh only possible after restart network

sebastian-alfers commented 13 years ago

Hey,

I cannot log into my VM after "vagrant up".

I have to start it in GUI mode, then restart my network adapter with "sudo /etc/init.d/networking restart". After this, my VM gets an IPv4 address and my Mac is able to ssh into the VM and do the provisioning.

Any idea on this?

Same issue as here: http://groups.google.com/group/vagrant-up/browse_frm/thread/e951417f59e74b9c

The box is about 5 days old!

Thank you! Seb

mitchellh commented 13 years ago

Ah, so we tried to fix this in the thread. I'm not entirely sure what the cause of this is, although it has something to do with the setup of the box. I've put a sleep in the bootup process. Please verify you have a pre-up sleep 2 in your /etc/network/interfaces file.

Otherwise, any other hints would be helpful :-\
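A minimal sketch for checking this from inside the guest, assuming Ruby is available there. The helper name `pre_up_sleep?` is hypothetical, and the stanza handling is a simplification of the rules in interfaces(5), not a full parser.

```ruby
# Returns true if the "iface <name>" stanza in the given
# /etc/network/interfaces text contains a "pre-up sleep <n>" line.
def pre_up_sleep?(interfaces_text, iface = "eth0")
  in_stanza = false
  interfaces_text.each_line do |line|
    stripped = line.strip
    next if stripped.empty? || stripped.start_with?("#")
    if stripped =~ /\Aiface\s+(\S+)/
      # A new iface stanza starts; remember whether it is the one we want.
      in_stanza = ($1 == iface)
    elsif stripped =~ /\A(auto|allow-\S+|mapping|source)\b/
      # These keywords end the current iface stanza.
      in_stanza = false
    elsif in_stanza && stripped =~ /\Apre-up\s+sleep\s+\d+/
      return true
    end
  end
  false
end

sample = <<~EOS
  auto eth0
  iface eth0 inet dhcp
  pre-up sleep 2
EOS
puts pre_up_sleep?(sample)
# On the guest itself: pre_up_sleep?(File.read("/etc/network/interfaces"))
```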

Benedict commented 13 years ago

I too am having this problem. I've tried both lucid32 & lucid64, which I downloaded today.

Before running sudo /etc/init.d/networking restart the /etc/network/interfaces looks like

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
pre-up sleep 2

After restarting networking and running vagrant reload, the file looks like

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
pre-up sleep 2
#VAGRANT-BEGIN
# The contents below are automatically generated by Vagrant.
# Please do not modify any of these contents.
auto eth1
iface eth1 inet static
      address 33.33.33.10
      netmask 255.255.255.0
#VAGRANT-END

Any ideas?

hedgehog commented 13 years ago

ssh doesn't like two hosts at the one address. I've seen this with two VM's getting the same address and SSH showing the same behavior (below).

Now it turns out SSH doesn't like two redirected port connections to the same port.

Symptom:

$ ssh vagrant@127.0.0.1 -p 2222 -i /path/to/private/key/vagrant -vvv
OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009
debug1: Reading configuration data /home/hedge/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug1: Connection established.
debug3: Not a RSA1 key file /path/to/private/key/vagrant.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /path/to/private/key/vagrant type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
^C

Now I see two connections to 127.0.0.1:2222

$ lsof -i :2222
ruby    9851 hedge   12u  IPv4 13467394      0t0  TCP localhost:55035->localhost:2222 (ESTABLISHED)
ruby    9851 hedge   13u  IPv4 13469354      0t0  TCP localhost:55098->localhost:2222 (ESTABLISHED)

Confirm that this is vagrant:

$ ps uax|grep 9851
hedge     9851  6.4  0.2 256080 47836 pts/4    Sl+  12:38   0:16 ruby /home/hedge/.rvm/gems/ruby-1.9.2-p180@thinit/bin/vagrant up

Confirm there is only one vm running:

$ ps aux|grep startvm
hedge     9873  4.9  2.6 706800 441432 ?       Sl   12:39   0:29 /usr/lib/virtualbox/VBoxHeadless --comment www --startvm 82cb3255-940b-48f6-b2c7-8ec50ae6500d --vrde config

So it seems the problem is that somewhere in vagrant two connections are being established to port 2222.

Correct?
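To make the "two connections" check repeatable, here is a rough sketch that counts established connections in `lsof -i :2222` output. The helper name `established_to_port` is hypothetical, and it assumes the column layout shown above.

```ruby
# Count ESTABLISHED TCP connections whose remote end is <something>:<port>,
# given the text printed by `lsof -i :<port>`.
def established_to_port(lsof_output, port)
  lsof_output.each_line.count do |line|
    line.include?("(ESTABLISHED)") && line =~ /->\S+:#{port}\s/
  end
end

sample = <<~EOS
  ruby    9851 hedge   12u  IPv4 13467394      0t0  TCP localhost:55035->localhost:2222 (ESTABLISHED)
  ruby    9851 hedge   13u  IPv4 13469354      0t0  TCP localhost:55098->localhost:2222 (ESTABLISHED)
EOS
puts established_to_port(sample, 2222)
# Live use on the host: established_to_port(`lsof -i :2222`, 2222)
```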

judev commented 13 years ago

Could this be some sort of timing issue with the linux networking trying to start (or get an IP) before Virtualbox has finished setting up the interface? Must admit that I don't know the internals so not sure if this is even likely. When I enable the Virtualbox GUI and login (while vagrant is still trying to connect via ssh), ifconfig reports no IPv4 address. If I then run sudo dhclient vagrant successfully connects within a couple of seconds.

mitchellh commented 13 years ago

@judev

If this was the case then switching VirtualBox versions back would fix the issue, which I'm not sure is the case (it may be, I don't know). I say this because previous versions of Vagrant worked just fine. This is still an isolated issue but annoying enough that I'd like to really figure it out, but haven't been able to yet.

hedgehog commented 13 years ago

@mitchellh, in my case switching VB back to 4.0.4 seems to have eliminated the issue. VB 4.0.10 was a problem. From memory I upgraded from 4.0.6 because I was hitting some issues. At the time I had 4.0.6 I wasn't using vagrant much.

Anyway, stepping back to VB 4.0.4 is definitely a fix for this issue in my case. We also can't rule out the Host OS. I say this simply because the packaged OSE versions of VB on lucid seem to be 4.0.4.

hedgehog commented 13 years ago

@judev, what happens if you vagrant reload that VM after you have connected to it via ssh? Are you able to ssh to it again? Run lsof -i :2222 and note the connection details of your established ssh connection. In my case I'd see two established connections to localhost:2222 after the reload, one of them being the connection from before the reload.

hedgehog commented 13 years ago

@judev, please add your failing and passing configuration details to this page: https://github.com/jedi4ever/veewee/wiki/vagrant-(veewee)-+-virtualbox-versions-test-matrix

The page has an example script that makes it easy to test (change the Ruby and gem versions to what you have). It shouldn't pollute your system if you have rvm installed.

judev commented 13 years ago

Sorry for the delay. I've tried each version of VirtualBox from 4.0.4 to 4.0.10: same problem when using the latest lucid32 box, but everything works fine using "ubuntu 11.04 server i386" from http://vagrantbox.es

@hedgehog, when I did sudo dhclient, connected over ssh, then did vagrant reload I still could not connect until doing another sudo dhclient. The previous connection did not show using lsof

Thanks for your help, am happy to say things are working really well with ubuntu 11.04.

hedgehog commented 13 years ago

@judev, do I understand correctly: lsof -i :2222 returned nothing after vagrant reload, then there was one connection after running sudo dhclient?
Or: does lsof -i :2222 show two connections after vagrant reload, and this then falls to one connection after sudo dhclient? It might help if you gave the actual commands and their outputs.

mabroor commented 13 years ago

I get the same issue.. latest version of vagrant, vbox on win7 x64 using jruby (as mentioned in the docs). Running sudo dhclient in the GUI got my puppet manifest running. Strange thing is that I had another machine with the exact same setup where I encountered this issue only once in the last week. This machine has this problem constantly...

hedgehog commented 13 years ago

@mabroor could you give the additional cmd output, in sequence, requested above?

mabroor commented 13 years ago

@hedgehog

I tried after a vagrant halt. The problem returns. Below is the output from netstat while vagrant is waiting for the vbox to boot (it is already booted):

netstat -an
 TCP    0.0.0.0:2222           0.0.0.0:0              LISTENING
 TCP    127.0.0.1:2222         127.0.0.1:54436        TIME_WAIT
 TCP    127.0.0.1:2222         127.0.0.1:54612        FIN_WAIT_2
 TCP    127.0.0.1:2222         127.0.0.1:54618        ESTABLISHED
 TCP    127.0.0.1:2222         127.0.0.1:54624        ESTABLISHED

I then log in to the vbox and run sudo dhclient and it works fine..  When vagrant has done its thing.. so connections are shown established using netstat. I am using windows so can't use the native ssh to show verbose output.
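For comparing a blocked run against a working one, a small sketch that tallies connection states for one local port from Windows `netstat -an` text. The helper name `tally_states` is hypothetical, and it assumes the four-column TCP layout shown above.

```ruby
# Tally TCP connection states for a given local port from `netstat -an` text.
# Lines that are not four-column TCP rows (headers, UDP, blanks) are skipped.
def tally_states(netstat_output, port)
  counts = Hash.new(0)
  netstat_output.each_line do |line|
    proto, local, _remote, state = line.split
    next unless proto == "TCP" && state
    next unless local.end_with?(":#{port}")
    counts[state] += 1
  end
  counts
end

sample = <<~EOS
  TCP    0.0.0.0:2222           0.0.0.0:0              LISTENING
  TCP    127.0.0.1:2222         127.0.0.1:54436        TIME_WAIT
  TCP    127.0.0.1:2222         127.0.0.1:54612        FIN_WAIT_2
  TCP    127.0.0.1:2222         127.0.0.1:54618        ESTABLISHED
  TCP    127.0.0.1:2222         127.0.0.1:54624        ESTABLISHED
EOS
p tally_states(sample, 2222)
```

Running it once while vagrant hangs and once after sudo dhclient gives two tallies that can be pasted side by side.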

grimen commented 13 years ago

Same issue but sudo /etc/init.d/networking restart didn't solve it for me. I'm trying another box now, let's hope it works.

mabroor commented 13 years ago

@grimen: try sudo dhclient Always works for me now.

hedgehog commented 13 years ago

@mabroor, is it the case that, according to netstat, there are always two est connections when you cannot connect and only one when you can connect?

mabroor commented 13 years ago

That's correct.

grimen commented 13 years ago

@mabroor Do you maybe know the OS X corresponding solution?

mabroor commented 13 years ago

@grimen the command I mentioned has to be run in the vm. I didn't know the problem existed in OSX, I had the issue on Windows 7 x64.

grimen commented 13 years ago

@mabroor Ouch, yes of course then it even makes sense. :) Problem though is that I cannot get into the vm - how did you do that?

mabroor commented 13 years ago

config.vm.boot_mode = :gui in your vagrantfile to run the vm in gui mode.
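For context, a sketch of where that line sits in a Vagrantfile, assuming the 0.8.x / 1.0.x configuration syntax (`Vagrant::Config.run`) that this thread's versions use. The box name is only an example taken from the thread.

```ruby
# Vagrantfile (Vagrant 0.8.x / 1.0.x style configuration)
Vagrant::Config.run do |config|
  config.vm.box = "lucid32"   # example box name from this thread

  # Show the VirtualBox console window instead of booting headless,
  # so you can log in and inspect networking when ssh is unreachable.
  config.vm.boot_mode = :gui
end
```

Remove the boot_mode line (or comment it out) to go back to headless mode once the guest has been inspected.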

grimen commented 13 years ago

@mabroor Thanks - will try that!

grimen commented 13 years ago

I got the GUI now but none of the proposals in this thread works for me (for "lucid32" and "lucid64" that is - those seem to be flawed as 'talifun' works). :(

mrolli commented 13 years ago

My combo shows the same issue: Mac OS X 10.7.1, Vagrant 0.8.5, Virtualbox 4.1.0, lucid64 with correct guest additions

After first boot vagrant could not connect to vm. In vm (GUI) there was no IP-address set. Did a sudo dhclient while vagrant was hanging and vagrant connected instantly after the guest finally had an IP.

Meanwhile I did vagrant reload twice and never had to do a sudo dhclient.

vasko commented 13 years ago

I'm using Mac OS X 10.7.1, Vagrant 0.8.6, VirtualBox 4.1.2, lucid32 with the 4.1.0 guest additions.

I've added the following line to my Vagrant::Config and it boots up and works fine now:

config.vm.provision :shell, :inline => "/etc/init.d/networking restart"

It's not the ideal situation, but it works without needing to go into the GUI.

UPDATE: Okay. I've run this a few times and it doesn't always work. Especially when I'm connected to the internal network without an internet connection it seems.

mikhailov commented 13 years ago

that works for me:

1) login with :gui by login/pass: vagrant/vagrant
2) modify the "/etc/rc.local" file to include the line "sh /etc/init.d/networking restart" just before "exit 0".
3) disable :gui
4) vagrant reload
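Step 2 can be scripted rather than edited by hand. This is a rough sketch, assuming Ruby is available in the guest; the helper name `add_before_exit` is hypothetical. It works on the file's text and is safe to run more than once.

```ruby
# Insert `command` on its own line just before the final "exit 0" of an
# rc.local-style script. Returns the text unchanged if the command is
# already present; appends the command if no "exit 0" line exists.
def add_before_exit(rc_local_text, command)
  lines = rc_local_text.lines
  return rc_local_text if lines.any? { |l| l.strip == command }
  idx = lines.rindex { |l| l.strip == "exit 0" }
  if idx
    lines.insert(idx, "#{command}\n")
  else
    lines << "\n" unless lines.empty? || lines.last.end_with?("\n")
    lines << "#{command}\n"
  end
  lines.join
end

sample = "#!/bin/sh -e\n# rc.local\nexit 0\n"
puts add_before_exit(sample, "sh /etc/init.d/networking restart")
# In the guest (as root), one might apply it like this:
#   path = "/etc/rc.local"
#   File.write(path, add_before_exit(File.read(path), "sh /etc/init.d/networking restart"))
```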

shingara commented 13 years ago

Is there no technique that works without hacking in GUI mode?

vasko commented 13 years ago

I've repeated the below process at least 5 times now for all scenarios.

Running vagrant up after I've started the VirtualBox application works every time.

Running vagrant up without starting the VirtualBox application fails every time, with or without the ":gui" option.

From my simple testing it seems to be an issue with running headless.

UPDATE: I've just found this article http://serverfault.com/questions/91665/virtualbox-headless-server-on-ubuntu-missing-vrdp-options. I've just installed the Extensions pack and I've had no issues since. VRDP was removed from VirtualBox 4.0 and moved into the extension pack. I believe this might also be related to this issue https://github.com/mitchellh/vagrant/issues/455.

UPDATE: I jumped the gun on this I think. I'm having trouble with lucid32 and lucid64 running without the ":gui" option.

hedgehog commented 13 years ago

Can people with this issue confirm that the following pull request fixes this issue for them?

https://github.com/mitchellh/vagrant/pull/534

ku1ik commented 13 years ago

Hey @hedgehog. I've just tried your fork and it didn't solve the issue for me unfortunately :/

hedgehog commented 13 years ago

@sickill, thanks. I think the changes are useful in speeding up the ssh connections, but they also exposed what I think is the real cause, and that is Net::SSH. I'm not sure if the problem is with Net::SSH per se, or just how it is used. Still working on a fix....

hedgehog commented 13 years ago

By replacing Net::SSH.start(...) I was able to determine that the likely ssh error is Connection timed out during banner exchange, and occurs after the connection is established (note the timeout is set in the ssh cmd):

<snip>
debug2: ssh_connect: needpriv 0
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2206.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 1000 ms remain after connect
debug3: Not a RSA1 key file /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant.
<snip>
debug1: identity file /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-2048
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-2048
Connection timed out during banner exchange

Can anyone confirm this by running the following (assuming a blocked VM)?

In a bash shell, run (setting a 1 sec timeout):

ssh -p 2206 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentitiesOnly=yes -i /home/hedge/.rvm/gems/ruby-1.9.2-p290@thvirt/gems/vagrant-0.8.7/keys/vagrant -o ControlMaster=auto -o ControlPath=~/.ssh/vagrant-multiplex-%r@%h:%p -o ConnectTimeout=1 -vvvv vagrant@127.0.0.1
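The same symptom (TCP connection established, but no banner) can also be probed without any SSH client. This is a sketch using only Ruby's standard socket library, assuming Ruby 2.0 or later for `connect_timeout`; the helper name `ssh_banner` is hypothetical and it does not use Net::SSH.

```ruby
require "socket"

# Connect to host:port and return the first line the server sends
# (normally the SSH identification string, e.g. "SSH-2.0-..."), or nil
# if the connection is established but nothing arrives within `timeout`
# seconds. Connection failures raise the usual Errno::* exceptions.
# Note: a server may send other lines before its version string, and a
# partial line without a newline would make `gets` wait longer.
def ssh_banner(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    if IO.select([sock], nil, nil, timeout)
      line = sock.gets
      line && line.chomp
    end
  end
end

# Example against the forwarded port used above:
#   p ssh_banner("127.0.0.1", 2206)
```

A nil result on a blocked VM would match "Connection timed out during banner exchange": the port forward accepts the connection, but sshd in the guest never answers.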

Possibly related Issues: chromium-os issue 20514 chromium-os issue 21739

hedgehog commented 12 years ago

Debian/Ubuntu users:

Can you try rebuilding your boxes with this workaround: https://github.com/jedi4ever/veewee/issues/159

Please report in the veewee issue if this:

resolves the issue as far as you can tell (I had a reload loop succeed 101 times)
only reduces the severity of the issue

Non Debian/Ubuntu users, there is likely a similar facility to make these changes before the first reboot in veewee's build process.

mitchellh commented 12 years ago

I'm going to go ahead and close this issue because while it is a bug with Vagrant, it is really more of a bug with net-ssh and not being robust enough to handle various edge cases of SSH. I don't see any clear way at the moment to fix this bug (which is very frustrating), but if I do I will fix it ASAP.

catditch commented 12 years ago

How about printing a warning on Vagrant startup when using headless mode?

Burgestrand commented 12 years ago

Which boxes are people using? I could not get any of the ubuntu boxes past this issue, but I tried an archlinux box (from vagrantbox.es) instead and it works flawlessly (so far!).

msabramo commented 12 years ago

Another possible cause of this issue (which I just ran into): if /Applications is world-writable then VirtualBox will refuse to start the vm apparently.

ramonvanalteren commented 12 years ago

FTR I had similar problems on Mac OSX 10.7 with vagrant 1.0.2 and Virtualbox 4.1.8r75467 and a debian squeeze based box from http://puppetlabs.s3.amazonaws.com/pub/squeeze64.box

It turns out that all the connection issues in my case directly had to do with being in a bridged network setup. The bridged setup will do two interfaces eth0 on internal range (10.0.2.2 by convention I think) and eth1 which will get an ipaddress from the bridged network.

For reasons unclear to me in some cases eth1 will come up with a different macaddress causing the udev rules to rename it to eth2 and all networking scripts will subsequently fail.... => broken network => no "up" report to vagrant.

Fixed by deleting /etc/udev/rules.d/70-persistent-net.rules or removing the broken entries from there.

Because of the way udev persistent net rules work the interface will continue to receive the same name afterwards since its new mac-address is now recorded into the rules file.
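To spot a renamed interface before deleting anything, a sketch that lists which name each recorded MAC address maps to. It assumes the usual one-rule-per-line layout of 70-persistent-net.rules on Debian/Ubuntu; the helper name `persistent_net_names` and the MAC addresses in the sample are made up for illustration.

```ruby
# Map interface name => recorded MAC address from the text of
# /etc/udev/rules.d/70-persistent-net.rules. Comment lines are ignored.
def persistent_net_names(rules_text)
  mapping = {}
  rules_text.each_line do |line|
    next if line.strip.start_with?("#")
    mac  = line[/ATTR\{address\}=="([0-9a-fA-F:]+)"/, 1]
    name = line[/NAME="([^"]+)"/, 1]
    mapping[name] = mac.downcase if mac && name
  end
  mapping
end

# Hypothetical sample: eth1 was recorded with an old MAC, so the adapter
# with the new MAC came up as eth2.
sample = <<~'RULES'
  # PCI device (e1000)
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:aa:bb:01", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
  SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:aa:bb:02", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
RULES
p persistent_net_names(sample)
```

Comparing the recorded addresses with what ifconfig reports in the guest shows which entry is stale.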

ramonvanalteren commented 12 years ago

I added an options hash to the config.vm.network param which fixes the mac-address of the bridge adapter; this solves it for me...

garthk commented 12 years ago

I'm now seeing this on every attempt to vagrant up on one of my Macs. There's nothing in /etc/udev/rules.d/70-persistent-net.rules, my only interface is eth0, I have pre-up sleep 2 in /etc/network/interfaces, and adding sh /etc/init.d/networking restart before exit 0 in /etc/rc.local doesn't help. Any ideas?

UPDATE: destroying all other VMs and re-creating them fixed the problem.

ramonvanalteren commented 12 years ago

Boot with gui, log in to the console and check if there is actually a network interface up. Which one is it, which network setup are you using (hostonly, nat, bridged), and what OS is running on this VM?

jeanmonod commented 12 years ago

My experience on that issue is that it's clearly related to the Internet connection I used:

At home (wifi): Freeze on vagrant up
At office (wifi): Works great
At home (using iPhone as a proxy): Works great
And so on...

I'm not good enough in networking, to tell what the exact difference it is, but it's clearly an issue about what connection I use on my macbook...

ramonvanalteren commented 12 years ago

Wild guess: could it be that your wifi connection uses the same IP range as vagrant does by default, aka 10.0.2.0/24?

jeanmonod commented 12 years ago

No I don't think so..., here is my ip: inet 192.168.0.11 netmask 0xffffff00 broadcast 192.168.0.255. I also detected (tested multiple times) the case NOT working on a second office wifi. So it confirms that some places work and some others don't!
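The overlap question can be checked mechanically with Ruby's standard `ipaddr` library. A small sketch: the helper name `overlaps_nat_range?` is hypothetical, and 10.0.2.0/24 is the range mentioned above.

```ruby
require "ipaddr"

# True if the host's address falls inside the given NAT range.
def overlaps_nat_range?(host_ip, nat_range = "10.0.2.0/24")
  IPAddr.new(nat_range).include?(IPAddr.new(host_ip))
end

puts overlaps_nat_range?("192.168.0.11")  # false: the address quoted above
puts overlaps_nat_range?("10.0.2.15")     # true: an address inside the range
```

For 192.168.0.11 the answer is false, so a clash with that range does not explain this particular case.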

ramonvanalteren commented 12 years ago

I'm seeing these again on an intermittent basis. The problem in my case is that the primary nic (eth0) does not receive an ipaddress from the virtualbox built-in dhcp server

It is the virtualbox nat engine bug again :(

jeanmonod commented 12 years ago

After several vagrant up (about 50), it always gives the same results for each network context. So now, I'm pretty sure that this is related to the current network config... But I really don't know what the failure criteria are...

So, like you said, I can drop this in that big box called 'vbox network bug' :(

danhively commented 12 years ago

I had the same networking issue and then I remembered that I'm the paranoid sort. I have a VPN automatically start on my OS X Lion Macbook Pro. After I disconnected the VPN all worked as it should! BTW I'm using veewee and VirtualBox 4.1.12.

xmartinez commented 12 years ago

I have been having this issue with a Linux guest (lucid32.box). The NAT interface sometimes does not get a DHCP assigned address during boot up. Running sudo dhclient in :gui mode allowed me to connect to the VM.

After some digging up, I have traced the problem to an incorrect setting of the VM hardware clock. Adding the following option to Vagrantfile seems to solve the issue:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

i.e, vagrant can always connect to the VM after boot up.

As this issue is closed, I have opened a new one to address the configuration problem (see #912).

uresu commented 12 years ago

This workaround is not working for me.

e42sh commented 12 years ago

I have had the same issues with lucid32, lucid64 and a self propelled ubuntu server instance. Each one failed with ssh connection.

After trying http://vagrantbox.es/170/ I didn't see the issue anymore. What is the difference between lucid* and tim huegdons base box?

hashicorp / vagrant

vagrant ssh only possible after restart network #391