hashicorp / vagrant

Vagrant is a tool for building and distributing development environments.
https://www.vagrantup.com

vagrant ssh only possible after restart network #391

Closed sebastian-alfers closed 12 years ago

sebastian-alfers commented 13 years ago

Hey,

I cannot log into my VM after "vagrant up".

I have to start it in GUI mode and then restart the network adapter with "sudo /etc/init.d/networking restart". After this, my VM gets an IPv4 address and my Mac is able to SSH into the VM and do the provisioning.

Any idea on this?

Same issue as here: http://groups.google.com/group/vagrant-up/browse_frm/thread/e951417f59e74b9c

The box is about 5 days old!

Thank you! Seb

johnste commented 12 years ago

I'm having the same problem with lucid32+64. GUI workaround with network restart works.

rhodee commented 12 years ago

vagrant halt returns an error and vagrant up hangs. I've built a box from scratch and can verify it works. However, when I try to create a new instance of the box I have issues with the above commands.

On vagrant up my console spits out the following:

[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.

When I CTRL-C out of vagrant up and then do vagrant ssh, I can enter my box. Even though the command hangs, I can see in VirtualBox that the VM is running.

When I exit the guest and run vagrant halt, I get:

The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

shutdown -h now

The box can be halted by running vagrant suspend and then vagrant halt. Weird.

Running VirtualBox 4.1.14r7740 and vagrant 1.0.2

Thanks for any assistance you can provide.

leth commented 12 years ago

The veewee tool has a handy set of tests to check you've set everything up right. It's under the validate command.

rhodee commented 12 years ago

@leth I can confirm that I am experiencing a similar result when building from scratch (previous post). Using veewee 0.3.alpha9 to build a VM from the ubuntu-12.04-amd64 template, I can't SSH into the box.

I waited less than 5 minutes for the VM to boot.

[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

It is running in VirtualBox.

rchampourlier commented 12 years ago

I've been using this specific configuration in my Vagrantfile for some time, and it works perfectly on my MacBook under OS X Lion 10.7.3 with VirtualBox 4.1.14r77440 (from VBoxManage -v), whereas before it would fail to start up correctly more than 2 times out of 3.

First, be sure there is no conflict between your box's network and any other active virtual machine. I use host-only networks and ensure a different network is used for each machine I configure:

config.vm.network :hostonly, "10.10.10.2"

This is the trick found above in this thread:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

Should it still not work, I want to be notified faster, so I reduce the number of SSH connection attempts:

config.ssh.max_tries = 10
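
Putting the three settings together, a minimal sketch of the relevant part of a Vagrantfile (v1 syntax, as used by Vagrant 1.0.x; the box name is only an example):

Vagrant::Config.run do |config|
  # example box name; use whatever base box you have
  config.vm.box = "lucid32"
  # a distinct host-only network per machine avoids conflicts
  config.vm.network :hostonly, "10.10.10.2"
  # the rtcuseutc trick quoted above
  config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]
  # fail fast instead of retrying SSH for minutes
  config.ssh.max_tries = 10
end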

Hope it will help!

rhodee commented 12 years ago

@rchampourlier thanks for the tip. I added those to my Vagrantfile and still no luck; I updated my post above with the output. I am going to review issue #14.

flevour commented 12 years ago

Hi everybody. As this issue is over one year old, could anybody define or suggest a common test case that can be run in order to pinpoint the issue? I am experiencing similar problems and would like to help with debugging and/or trying different configurations.

aelmadho commented 12 years ago

I ran into this issue recently with Fedora 13 only; Fedora 16 does not show it. It is network related, since when I log in using the GUI, eth0 is not active.

I have disabled NetworkManager and set NM_CONTROLLED="no" in my ifcfg-eth0 file. This was a defect according to Red Hat (https://bugzilla.redhat.com/show_bug.cgi?id=597515), which is no longer maintained.
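
For reference, a sketch of the relevant Fedora/RHEL file; only the NM_CONTROLLED line is the actual change here, the rest is a typical DHCP setup:

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
NM_CONTROLLED="no"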

So I can agree that this goes back to an issue with bringing up the interfaces; it has nothing to do with SSH, though maybe other edge cases are present...

What distribution are you running? What does "dmesg" show if you log in using the GUI? And if you log in using the GUI and run /etc/init.d/network restart, what happens?

ghost commented 12 years ago

Hi, I did what @mikhailov suggests in #issuecomment-2078383 (restart the networking service in rc.local) and it worked for me.

flevour commented 12 years ago

So, what I found is: I am using https://github.com/erivello/sf2-vagrant and the lucid32 box distributed by Vagrant. I am trying the exact same configuration on two identical iMacs at my company: same hardware, same OS X version (10.6.8), same VirtualBox version (4.1.16), same Vagrant version (1.0.3). On one of the iMacs the machine boots up just fine; on the other it hangs at the SSH connection.

This makes me think it's something different in the host environment or in the interaction between host and VM.

I also tried a complete reinstall of VirtualBox and deleted the ~/.vagrant.d folder to start fresh, but I still get the error.

EDIT: I retried after a few days and now it's working. Probably a host reboot fixed the problem? Or this is something random.

schmurfy commented 12 years ago

Just got this one twice already today using the Vagrant lucid32 base box. I also got it on the first boot of this VM, just after the first "vagrant up", with VirtualBox 4.1.10.

zejn commented 12 years ago

I've used this crontab entry as a workaround:

@reboot  /sbin/ifdown eth0 ; /bin/sleep 5 ; /sbin/ifup eth0

hedgehog commented 12 years ago

@schmurfy, thanks for your work on Goliath, hope the following helps bring you up to speed on the state of play...

Somewhere in this or related issue threads is my finding that this is caused by SSH trying to complete its handshake while the OS is not ready for it, e.g. while cycles are spent in motd. The SSH connection is made, just not completed.

There are several ways to mitigate this: no motd, restarting network services, bringing the iface down and then up, etc.
There is no solution, just workarounds, and a Google project (which I can't recall right now; search my nickname + vagrant in their issues) is having this same issue, also in a VM-booting context.

Bootstrapping via VB-level commands was investigated by Mitchell and wasn't feasible due to VB issues. Bootstrapping over the serial console was likewise suggested but not completed, for good reasons that escape my memory right now.

HTH

schmurfy commented 12 years ago

@hedgehog with the removal of the router part I am not sure a lot of my code remains in goliath xD

Given that some solutions exist, as proposed here, it would be nice if the base image shipped with one. I think I will try creating my own base image with one of the proposed fixes. Thanks.

ku1ik commented 12 years ago

+1 for creating base images with any working workaround.

dramer-817 commented 12 years ago

My CentOS 6.2 32-bit box stalled during vagrant up at "Waiting for VM to boot. This can take a few minutes.". SSH to the box worked. This happened when the host was not connected to wifi/internet. So as a workaround I disabled the firewall on the guest box and it worked; also check the host firewall.
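
For anyone else trying this on a CentOS 6 guest, disabling the firewall there usually amounts to something like this (a sketch, and obviously not a recommendation for production boxes):

sudo service iptables stop
sudo chkconfig iptables off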

arthurk commented 12 years ago

I recently ran into the same problem. I'm using vagrant 1.0.3, VirtualBox 4.1.18 and the standard lucid32 box. This workaround from @xmartinez worked for me:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

ghost commented 12 years ago

It is easier to debug when you can boot the machine with the GUI. I've experienced this issue on a local Mac machine, where it is easy to boot the GUI. I also experienced it on a remote Debian server, where I had to install X11 and use X11 forwarding to display the GUI on the local Mac, debug it, and then turn the config back to no GUI.

gfreeau commented 12 years ago

If you are using the latest Ubuntu or variants like Mint, there have been changes to how Ubuntu handles DNS. Try running

sudo dpkg-reconfigure resolvconf

and choose the first option to create a symlink at /etc/resolv.conf. VirtualBox needs this file to set the DNS correctly for NAT. This should be made more obvious.
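
To verify the link is in place afterwards, check that /etc/resolv.conf is a symlink; on a stock Ubuntu 12.04 guest it normally points into /run/resolvconf:

ls -l /etc/resolv.conf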

Doing this fixed the problems for me; I didn't have to set SSH timeouts, restart networking, or use --rtcuseutc.

dstrctrng commented 12 years ago

In headless mode with eth1 as host-only networking, I also get a hanging vagrant waiting for the SSH port forward to connect. I can SSH to eth1 fine, so I think this is a problem with port forwarding or the NAT eth0. It's hard to test because I can't SSH to eth0 directly from OS X.

The fix is a simple "ifdown eth0; ifup eth0". I suspect it's some timing issue between eth0 coming up, vboxservice loading, and the port being mapped.

dstrctrng commented 12 years ago

The ifdown eth0 shows this error from DHCP:

DHCPRELEASE on eth0 to 10.0.2.2 port 67
send_packet: Network is unreachable
send_packet: please consult README file regarding broadcast address.

After an ifup, further ifdowns are successful.

dstrctrng commented 12 years ago

You don't even need the ifdown/ifup; a "dhclient eth0" will let vagrant resume.

dstrctrng commented 12 years ago

I've been reloading my vagrant over and over for an hour, with each reload taking 90 seconds. No hangs.

I don't use the "pre-up sleep 2" or any of the workarounds in this thread.

In rc.local on Ubuntu, right before the exit 0, I put "dhclient eth0". This won't disturb the network; it'll just kick eth0 in the butt and get it working again. Since it runs last, I hope it avoids whatever is hanging the ifup during network init, because that's what I saw for both the eth0 NAT and eth1 host-only interfaces on my guests: ifup still running, its child processes blocked.
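
A sketch of what the resulting /etc/rc.local looks like on Ubuntu:

#!/bin/sh -e
#
# rc.local: runs at the end of each multiuser runlevel.
# Kick eth0 one more time after everything else has started.
dhclient eth0
exit 0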

pionono commented 12 years ago

I tried restarting the networking service on boot but, for some reason, I can't access the webserver. I have to restart the networking service twice, and then it works. But I can't stop the VM with "vagrant halt" (I have to run "vagrant suspend" first) and I can't get in with "vagrant ssh" (I have to use "ssh vagrant@IP").

bigwhoop commented 12 years ago

Starting the VMs in GUI mode and then executing "sudo dhclient eth0" resumed vagrant for me, too.

mitchellh commented 12 years ago

@destructuring Awesome! I'm going to put this into the base boxes I release and hope this solves this issue. :)

mitchellh commented 12 years ago

I just uploaded new lucid32/lucid64/precise32/precise64 boxes with @destructuring's changes. Let me know if this is gone!

axsuul commented 12 years ago

None of these solutions are working for me. The only thing that does work is rebuilding the box. I noticed only one other person has commented that @destructuring's solution worked for them. Can I get a sanity check?

axsuul commented 12 years ago

I get

RTNETLINK answers: File exists

when running sudo dhclient eth0.

maxverbosity commented 12 years ago

I'm not sure this is the proper forum, but I've been running into similar problems with hangs on 'vagrant up'.

I'm posting here because I'm seeing reports of different behavior in different environments, which is what I ran into, and there seem to be multiple tickets tied to the same core issue. This seemed as good a spot as any :) The solution seems to be outside vagrant.

If you are behind a proxy (likely to be the case at work but not at home), you will need to configure the guest system with your proxy settings. Setting the http_proxy and https_proxy environment variables in /etc/bashrc worked for me; that made them system-wide and available to the SSH access required by vagrant. If you do not specify the proxy, you will receive the dreaded ifup message and your boot will hang.
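
A sketch of the lines in question, with a placeholder proxy host and port:

# appended to /etc/bashrc
export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128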

The caveat here is that if you set this and try to boot while you are not behind the configured proxy, you will receive the same message and hang on boot.

janschumann commented 12 years ago

For me this issue is not closed. I found a workflow to reproduce it. Please read (and edit/comment) https://github.com/mitchellh/vagrant/wiki/%60vagrant-up%60-hangs-at-%22Waiting-for-VM-to-boot.-This-can-take-a-few-minutes%22

JustinAzoff commented 12 years ago

I'm also getting the hang at "[default] Waiting for VM to boot. This can take a few minutes.", but I've somewhat figured out the cause. The DNS proxying is not working, causing SSH connections to take 10 seconds to be established, which causes the probe to time out. vagrant ssh and other commands seem to have a longer timeout and run OK.

Some base boxes also boot OK because they do not have UseDNS yes in /etc/ssh/sshd_config and don't run into this problem at all.
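
If you build your own base boxes, the corresponding mitigation is to turn that reverse-DNS lookup off in the guest's sshd config (a sketch; restart sshd afterwards):

# /etc/ssh/sshd_config
UseDNS no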

For me, restarting networking does not work. It seems the DNS proxy stuff just doesn't work with the version of vagrant in Ubuntu 12.10 (1.0.3) and VirtualBox 4.1.18.

JustinAzoff commented 12 years ago

Ah, somewhat figured it out:

my resolv.conf has

nameserver 127.0.1.1

The code in vagrant only checks for 127.0.0.1 when disabling the DNS proxy. That said, I fixed the regex but DNS still doesn't work in the VM. It works fine if I change the DNS server to 192.168.1.1 or 8.8.8.8, so it's not completely broken; something is just breaking the autoconfiguration.

axsuul commented 12 years ago

I've been having success with /etc/init.d/networking restart in /etc/rc.local.

JustinAzoff commented 12 years ago

I'm not sure why restarting networking works for some people; it doesn't work here.

It looks like HEAD already has the fix for the 127.0.1.1 issue, so that's good.

As for the other issue, looking at https://bugs.launchpad.net/ubuntu/+source/virtualbox/+bug/1031217, the fix is stated to be turning natdnshostresolver1 on, but the code in vagrant that is linked from that bug turns it off. I'm not sure why there is a discrepancy, but this probably has something to do with my problem.

ghost commented 11 years ago

I have just retried with a freshly downloaded official lucid32 box on a remote Debian host, and it works fine without doing anything special.

houtmanj commented 11 years ago

For me this issue was in the DNS configuration. Setting:

VBoxManage modifyvm "puppet-playground_1357644642" --natdnshostresolver1 on

fixed this for me.
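
The same setting can be put in the Vagrantfile so it survives destroy/up cycles; a sketch in the v2 provider syntax that appears later in this thread:

config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
end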

smoya commented 11 years ago

Sometimes GRUB starts in failsafe mode (when the box goes down uncleanly, in Ubuntu for example) and sets a grub timeout of -1.

Fix:

dkinzer commented 11 years ago

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

works for me, but only if I include it before configuring the VM network: config.vm.network :hostonly, "10.10.10.2"
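
In other words, a sketch of the ordering that works (v1 syntax):

# the customize call has to come before the network line
config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]
config.vm.network :hostonly, "10.10.10.2"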

StoneCypher commented 11 years ago

So, I've found a completely unrelated cause of these symptoms. I doubt many people are having the stupid problem I was, but, for the record: if your brand new laptop has VT-x turned off, then when the inner VM can't start, the way it manifests from the outside is the SSH server refusing connections (because it's not running).

And so you end up with the repeated attempts to hit 2222, all failing.

And you really can't tell the difference, from the outside, from any of these other causes.

The way to test whether you've got my problem is just to run the VM directly from the VirtualBox manager. If you get a message saying you can't boot without VT-x/AMD-V, then, well, ha ha.

Older machines: go into the BIOS and turn it on.

Newer machines: UEFI gets in your way. From Windows 8, go to the start screen and type "bios". It'll say that no apps match your search, but if you look, one setting does. Hit Settings and you'll see "advanced startup options." Go in there, and under the general tab, go to the bottom, where there's a "restart now" button under the heading "advanced startup."

When you hit that, it doesn't actually restart right away; it brings up another menu, one item of which allows you to get at your BIOS. Follow that and you'll get in.

Then go turn on whatever your BIOS calls your hypervisor hardware. (There are like six names for it, but it's usually VT-x or AMD-V.) Enable, save, and shut down.

On reboot, vagrant will be happy again.
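
As an aside, on a Linux host you can also check from a shell whether the CPU advertises the extensions at all; a nonzero count means VT-x (vmx) or AMD-V (svm) is present in hardware, though it says nothing about whether the BIOS has it enabled:

egrep -c '(vmx|svm)' /proc/cpuinfo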

keo commented 11 years ago

adding

ifdown eth0
sleep 1
ifup eth0
exit 0

to /etc/rc.local solved it. dhclient eth0 solves it too.

A weird thing is that when I built my base box image, doing apt-get install dkms before installing the VirtualBox additions made it work 100% afterwards.

jphalip commented 11 years ago

I've run into the same frustrating issue while building a CentOS base box. What completely fixed it for me was to add dhclient eth0 to /etc/rc.local, as suggested by @keo above. I wonder if this is something Vagrant itself could help with, by systematically kicking eth0 on startup...

romanzenka commented 11 years ago

I have the same issue with CentOS 6.3.

My suspicion is that the 10.0.2.2 gateway actually EXISTS on our network:

10.0.2.0     *           255.255.255.0   U     0      0   0 eth0
link-local   *           255.255.0.0     U     1002   0   0 eth0
default      10.0.2.2    0.0.0.0         UG    0      0   0 eth0

So if my networking is going through some poor random server, no wonder it takes forever for the packets to go through.

I will try to figure out how to set up the networking differently.

Edit: I resolved my issue. I needed to reconfigure the network VirtualBox uses for DHCP.

http://stackoverflow.com/questions/15512073/set-up-dhcp-server-ip-for-vagrant

I added the following code:

  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--natnet1", "192.168/16"]
  end

You can check for this issue easily, even before you start Vagrant: ping 10.0.2.2. If you get a response, you are in trouble.

jamshid commented 11 years ago

To anybody else trying to make their way through this, if you're trying one of the suggested workarounds:

config.vm.customize ["modifyvm", :id, "--rtcuseutc", "on"]

this does not work if your Vagrantfile is version 2 and starts with:

Vagrant.configure("2") do |config|

You'll get errors like:

$ vagrant destroy
/Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/kernel_v2/config/vm.rb:147:in `provider': wrong number of arguments (0 for 1) (ArgumentError)
    from /Users/jamshid/tmp/70/Vagrantfile:13:in `block in <top (required)>'
from /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/config/v2/loader.rb:37:in `call'
    ...
from /Applications/Vagrant/bin/../embedded/gems/bin/vagrant:23:in `<main>'

Instead use:

config.vm.provider :virtualbox do |vb|
  vb.customize ["modifyvm", :id, "--rtcuseutc", "on"]
end

mitchellh commented 11 years ago

Does anyone still have this issue on a non-CentOS machine? On CentOS the issue is most likely forgetting to remove the udev rules. But is anyone getting this with the precise64 box, some Ubuntu-based box, or a box where they KNOW they cleared the udev rules?
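
For anyone unsure what clearing the udev rules means when building a base box: the usual step is deleting the persistent net rules file, which otherwise pins interface names to the MAC address of the machine the box was built on (a sketch; the path is the conventional one on CentOS and Ubuntu):

rm -f /etc/udev/rules.d/70-persistent-net.rules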

janschumann commented 11 years ago

Yes. Please see https://github.com/mitchellh/vagrant/wiki/%60vagrant-up%60-hangs-at-%22Waiting-for-VM-to-boot.-This-can-take-a-few-minutes%22

mconigliaro commented 11 years ago

I found that after running yum groupinstall Desktop and rebooting a CentOS 6.4 guest, the VM could not communicate with the outside world at all. The fix for me was to disable NetworkManager and restart networking:

chkconfig NetworkManager off
service network restart

haydenk commented 11 years ago

For me, it ended up being that I had to make sure "Cable connected" was checked under the adapter settings in VirtualBox.

(screenshot: the VirtualBox network adapter settings dialog with "Cable connected" checked)

acesuares commented 11 years ago

For a problem with the same symptoms, but a different cause and solution, see https://github.com/mitchellh/vagrant/issues/1792

jackdempsey commented 11 years ago

Just a public +1 and thanks to @jphalip, whose tip fixed things up for me on a CentOS VM.