Sometimes hangs on "Waiting for VM to boot. This can take a few minutes."

jonleighton commented 13 years ago

Hi there,

Sometimes (I mean, fairly often, maybe 30-50% of the time for me) vagrant seems to hang on:

Waiting for VM to boot. This can take a few minutes.

Possibly it would finish eventually, but I have never waited a potentially infinite length of time to find out. It certainly takes longer than 'usual', compared with the times I do manage to boot successfully.

When this happens, the only thing I can do is power off the VM through VBoxManage and try again.
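
For reference, the recovery step described here could be scripted roughly as follows. This is a minimal sketch, assuming the VM name or UUID is known (for example from VBoxManage list runningvms); the function name poweroff_vm is hypothetical and not part of Vagrant or VirtualBox.

```shell
# Hypothetical helper: force power-off of a hung VirtualBox VM.
# Takes the VM name or UUID as shown by `VBoxManage list runningvms`.
poweroff_vm() {
    vm="$1"
    if [ -z "$vm" ]; then
        echo "usage: poweroff_vm <vm-name-or-uuid>" >&2
        return 2
    fi
    # `controlvm ... poweroff` acts like pulling the plug; unsaved guest state is lost.
    VBoxManage controlvm "$vm" poweroff
}
```

After the VM is powered off, vagrant up can be retried.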

Is there any way I can get more output about what it's doing in order to help debug this?

Cheers

wsc commented 13 years ago

This has happened to me for a while now and I'm not sure what's up.

pattern commented 13 years ago

I have noticed the same. It feels very "non-deterministic" in that there is a random chance the VM will just keep thrashing at 100% CPU. I found that when I didn't specify a config.vm.network in the Vagrantfile, there was a much lower chance of the VM entering this state. This makes me think it has something to do with the networking/DHCP configuration. For what it's worth, I also have config.ssh.max_tries = 100.

Were either of you using config.vm.network to specify a specific IP? If so, try commenting it out and see if that works.

tomusher commented 13 years ago

Same problem here; seems to have started when I updated my lucid32.box to fix #445.

I'm not setting config.vm.network but it does seem network configuration related - to work around it I'm using config.vm.boot_mode = :gui and when it gets stuck, manually logging in to the machine and running sudo /etc/init.d/networking restart.
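
The manual step in this workaround could be wrapped in a small guest-side check. This is only a sketch under assumptions: the helper name ensure_ipv4 is hypothetical, the ip tool is assumed to be installed in the guest, and the NAT interface is assumed to be eth0.

```shell
# Hypothetical helper, run inside the guest: restart networking only
# when the interface has no IPv4 address yet.
# Assumes the `ip` tool is installed and the NAT interface is eth0.
ensure_ipv4() {
    iface="${1:-eth0}"
    if ip -4 -o addr show dev "$iface" | grep -q 'inet '; then
        echo "$iface already has an IPv4 address"
    else
        echo "$iface has no IPv4 address, restarting networking"
        sudo /etc/init.d/networking restart
    fi
}
```

Calling ensure_ipv4 with no argument checks eth0; it does not address why the address was missing in the first place.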

ThePixelDeveloper commented 13 years ago

This happens to me too:

[mathew@thepixeldeveloper]$ vagrant up
[default] VM already created. Booting if its not already running...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- ssh: 22 => 2222 (adapter 1)
[default] Cleaning previously set shared folders...
[default] Creating shared folders metadata...
[default] Running any VM customizations...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

I can see the machine has booted with the GUI.

I also tried using the vagrant ssh command when the vagrant up command failed. SSH fails with the following error message:

[mathew@thepixeldeveloper]$ vagrant ssh
ssh_exchange_identification: Connection closed by remote host

Rebooting with sudo reboot from the GUI fixes this for me.

Gems

vagrant (0.8.6, 0.8.5)
virtualbox (0.9.2, 0.9.1, 0.9.0)

Virtualbox

Virtualbox 4.1.2r73507

AlexMikhalev commented 13 years ago

Bump. I just posted a question in the group about the same issue. My vagrant up works only once after a reboot or a reinstall of VirtualBox.

ThePixelDeveloper commented 13 years ago

Quick question, are you guys running boxes built with VeeWee?

jonleighton commented 13 years ago

I am not

AlexMikhalev commented 13 years ago

No, clean vagrant. But I think I found a solution: check your networking settings for VirtualBox (on a Mac: Command + comma, then Networking, then host-only networks). I deleted a host-only network which happened to be there and now I can restart my VMs without restarting the Mac. Folks, if you can verify it, that would be excellent.

ThePixelDeveloper commented 13 years ago

I only have one networked device listed (NAT).

@AlexMikhalev You have two networked devices because at some point you enabled the

# Assign this VM to a host only network IP, allowing you to access it
# via the IP.
config.vm.network "33.33.33.10"

option which meant Vagrant created the Host-Only interface.

However, having just the NAT interface was still no improvement for me.

AlexMikhalev commented 13 years ago

@ThePixelDeveloper I didn't mean a second adapter on the VM, but the common settings for host-only networks in the VirtualBox preferences. Basically, I removed vboxnet0 completely from my host. But in truth it didn't help.

ThePixelDeveloper commented 13 years ago

I suspect this is a VirtualBox bug. The networking interface is failing to get an IP address from the DHCP server for whatever reason. Which releases of Virtualbox are we running? I can rule out Virtualbox 4.1.2r73507 already, I'll go backwards until it's "fixed"

AlexMikhalev commented 13 years ago

I think it may be related to the issue described here: http://blog.techprognosis.com/2011/02/28/how-to-enable-dhcp-in-virtualbox-4.html

I have a feeling that the DHCP server for NAT addresses is broken, but I wasn't able to influence it with commands like:

VBoxManage dhcpserver add --netname vboxnet0 --ip 10.0.3.100 --netmask 255.255.255.0 --lowerip 10.0.3.101 --upperip 10.0.3.254 --enable

I know that is meant for internal networks, but I feel that the DHCP server for NAT doesn't work or issues incorrect IP addresses.

ThePixelDeveloper commented 13 years ago

I don't think it's broken, because if it were you wouldn't be able to get an IP address by running sudo dhclient. Let's see ...

Is this isolated to Vagrant boxes?
Does anyone have problems on the same operating system, but not built with Vagrant/VeeWee?

I have another Ubuntu server I just booted and don't have such problems (it doesn't have the VBadditions). I installed the VBadditions and still no problems there. Very very strange.

AlexMikhalev commented 13 years ago

I do not use VeeWee. In my VBox logs, the difference between a successful boot and a failed one is in these lines:

00:00:26.584 NAT: IPv6 not supported
00:00:26.622 NAT: DHCP offered IP address 10.0.2.15
00:00:26.623 NAT: DHCP offered IP address 10.0.2.15

while the log of a hung start ends at:

00:00:24.642 NAT: IPv6 not supported
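
The log comparison above could be automated with a small check. This is a sketch, assuming the log line text is exactly as quoted in this comment; the helper name dhcp_offered is hypothetical, and the path to the VM log has to be supplied because its location varies by host and VM.

```shell
# Hypothetical helper: report whether a VirtualBox VM log shows that the
# NAT DHCP server offered an address during boot.
# The log path is passed in; its location varies by host and VM.
dhcp_offered() {
    log="$1"
    if grep -q 'NAT: DHCP offered IP address' "$log"; then
        echo "offered"
    else
        echo "no offer"
        return 1
    fi
}
```

A non-zero exit status means the log contains no DHCP offer, which matches the hung case described here.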

I use the lucid32 and lucid64 base boxes and both have the same issue. This issue is not related to Vagrant specifically in my case, as I have the same problem trying to start Vagrant-generated boxes from the VirtualBox GUI: sometimes I get an IP (10.0.2.15), sometimes I don't, so I need to run sudo dhclient and get the same IP, 10.0.2.15, from the DHCP server. If I start two VMs, one with lucid32 and the other with lucid64, they both have the same IP, 10.0.2.15, after I run sudo dhclient.

AlexMikhalev commented 13 years ago

Update: I downloaded the box from http://opscode-vagrant-boxes.s3.amazonaws.com/ubuntu10.04-gems.box and see the same behaviour. I can start it the first time with vagrant up successfully and shut it down, but the next attempt to start it with vagrant up hangs forever.

ThePixelDeveloper commented 13 years ago

This issue is not related to vagrant specifically in my case as I have a same problem trying to start vagrant generated boxes from virtual box GUI

I mean you should try and install and run the operating system without using a vagrant base box.

I have another that works fine, you should try it too, then we can confirm if it's to do with Vagrant or not.

Look at this for the explanation of the 10.0.2.15 IP Address

Edit: I'm out of ideas on this one. I've built a box using VeeWee which works as expected, then seemingly fails once it's been compiled into a box and imported into Vagrant. I have no idea what Vagrant does to the image when it's packaged; maybe that is something to look into.

ThePixelDeveloper commented 13 years ago

I fixed this for me, or at least I think I did. Start the troubled machine in gui mode, login and execute the following commands as root:

rm /var/lib/dhcp3/* - Removes any existing DHCP leases

Disable automatic udev rules for network interfaces in Ubuntu

rm /etc/udev/rules.d/70-persistent-net.rules
mkdir /etc/udev/rules.d/70-persistent-net.rules

The machine now starts up and has the correct IP address.

Perhaps this has something to do with the different network adapter MAC addresses. The base box would have been built on a VirtualBox instance where the MAC is different from the one that you're using now. Just a thought.

AlexMikhalev commented 13 years ago

ThePixelDeveloper - tried your solution, doesn't work for me on lucid32.

rozza commented 13 years ago

Setting gui on and then manually logging in and restarting networking fixed it for me.

niko commented 13 years ago

Had the same issue. I could work around by booting in gui mode, logging in and manually doing

ThePixelDeveloper commented 13 years ago

Any progress on this? It's definitely Vagrant causing trouble here. In my experiments, no other machine I've built with VirtualBox (with the same configuration) shows this problem.

To be more clear, something happens when Vagrant builds the box and not when Vagrant launches the box. So booting the box without the help of Vagrant still displays the problem. If someone can point me towards the code where Vagrant does its building I can take a look.

jonleighton commented 13 years ago

What version of VirtualBox are you all using?

I haven't experienced the problem recently, and I think VirtualBox may have been upgraded on my system at some point after I filed this bug (I'm on Fedora so I have package management...)

My current VirtualBox version is 4.1.2 r73507. Anyone on the same or later and still experiencing this?

rozza commented 13 years ago

It's happening to me on VirtualBox version 4.1.2 r73507.

niko commented 13 years ago

I had the issue with the lucid32 box (http://www.vagrantbox.es/1/). Using the ubuntu 11.04 box (http://www.vagrantbox.es/26/) doesn't show the issue.

ku1ik commented 13 years ago

Same issue here. (Ubuntu 11.04, VirtualBox 4.1.2, vagrant 0.8.6).

I wanted to try ubuntu 11.04 box (http://www.vagrantbox.es/26/) but after downloading I got:

[vagrant] Extracting box...
[vagrant] Verifying box...
[vagrant] Cleaning up downloaded box...
The box file you're attempting to add is invalid. This can be
commonly attributed to typos in the path given to the box add
command. Another common case of this is invalid packaging of the
box itself.

AlexMikhalev commented 13 years ago

I can reproduce the same issue with both Mac OS X Snow Leopard and Ubuntu 10.04 LTS as VirtualBox hosts. It happens with various boxes, whether I build the box using VeeWee or download a ready-made one.

flashingpumpkin commented 13 years ago

Same issue here. After starting in gui mode, logging in and doing sudo /etc/init.d/networking restart it'll work from command line again.

This issue is very annoying as it happens on every new box after installing the first one.

frozenskys commented 13 years ago

I can confirm this is happening on my OS X Lion box as well; the problem occurs with both Lucid64 and Natty64 boxes. I have tried VirtualBox versions from 4.1.0 to 4.1.2 and the problem occurs on virtually every vagrant up command. Vagrant is now unusable due to this issue :(

ThePixelDeveloper commented 13 years ago

Can we confirm it only happens with Vagrant and NOT with a VirtualBox machine with the same specifications (disk, network, etc.)?

mikhailov commented 13 years ago

There is a temporary solution until the VirtualBox DHCP / dhclient problem is fixed:

1) run virtual machine with :gui

2) sudo vi /etc/rc.local

#!/bin/sh -e
sh /etc/init.d/networking restart
exit 0

3) sudo halt

ThePixelDeveloper commented 13 years ago

Will try this mikhailov, thanks.

mikhailov commented 13 years ago

This is probably better:

sudo vi /etc/network/interfaces
pre-up sleep 10

ThePixelDeveloper commented 13 years ago

@mikhailov That doesn't work.

This line is actually included in the VeeWee build scripts: https://github.com/jedi4ever/veewee/blob/master/templates/ubuntu-10.04.3-server-amd64/postinstall.sh#L88

I've used a bigger value and it didn't seem to make a difference.

frozenskys commented 13 years ago

Wanting to build my own base box while I was having issues with 'vagrant up' I updated to the latest VirtualBox, installed VeeWee and built a new Ubuntu 11.04 box. Since then I haven't had this problem (even with the old boxes).

I did do a gem update after the install of VeeWee - and I did notice that net-ssh was updated as part of this, I'm not sure if it could be related?

mikhailov commented 13 years ago

@ThePixelDeveloper yes, that doesn't seem to work. So I have to log in with :gui the first time and update /etc/rc.local every time I run a new instance, until it's fixed.

Gonzih commented 13 years ago

Any progress? Same issue.

Arch Linux 32
Guest Additions Version: 4.1.0
VirtualBox Version: 4.1.2_OSE
Vagrant version 0.8.7
Ruby 1.9.2
lucid32 box

leth commented 13 years ago

I think I've found the problem but I have no idea how to build a new box, so can't test it.

The DHCP client will wait 60 seconds for replies to its request. If there was no response and there are no old leases to fall back to it will then wait five minutes before retrying.

Hopefully adding shorter timeouts like timeout 2 and retry 2 to /etc/dhcp3/dhclient.conf will fix the problems.
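
As a rough illustration of this suggestion, the two settings could be appended like this. It is a sketch only: the helper name shorten_dhcp_timeouts is hypothetical, the values 2 and 2 are the ones proposed in this comment rather than tested recommendations, and writing to the default path requires root in the guest.

```shell
# Hypothetical helper: append shorter DHCP client timeouts to a dhclient
# configuration file. The default path is the one named in this comment;
# pass another path as the first argument to try it on a copy.
shorten_dhcp_timeouts() {
    conf="${1:-/etc/dhcp3/dhclient.conf}"
    # `timeout` and `retry` are dhclient.conf statements, in seconds.
    printf 'timeout 2;\nretry 2;\n' >> "$conf"
}
```

The helper is not idempotent; running it twice appends the lines twice.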

Gonzih commented 13 years ago

@leth nope, still the same issue for me with those options in dhclient.conf. Use vagrant package to create a new box.

dawngerpony commented 13 years ago

The sudo dhclient approach works well for me as a temporary fix. I'd love to get this fixed permanently, though, because this will be the setup process for potentially hundreds of developers at my company.

More information exists at Stack Overflow.

ku1ik commented 13 years ago

Just checked on Fedora 15, VirtualBox 4.1.4 (latest), vagrant 0.8.7 - the issue still exists.

ThePixelDeveloper commented 13 years ago

I guess it's time to pull out git bisect and start the arduous journey.


ku1ik commented 13 years ago

+1 for git bisect

leth commented 13 years ago

I think I've pinned it down to /etc/udev/rules.d/70-persistent-net.rules. We obviously need to keep it empty, but making it into a directory seems to break things.

I tried making it a non-writable file, but that still broke things.

To fix:

 sudo rmdir /etc/udev/rules.d/70-persistent-net.rules
 sudo touch /etc/udev/rules.d/70-persistent-net.rules

EDIT:

After 3 successful tries, I tried it a fourth, and it's still broken. >_<

leth commented 13 years ago

It looks like it might be a virtualbox issue: https://www.virtualbox.org/ticket/4038

karel1980 commented 13 years ago

I was having the same problem:

karel@rolmops:~/vagrant/c57$ vagrant up
[default] Importing base box 'centos-57'...
[default] Preparing host only network...
[default] Matching MAC address for NAT networking...
[default] Clearing any previously set forwarded ports...
[default] Forwarding ports...
[default] -- ssh: 22 => 2222 (adapter 1)
[default] Creating shared folders metadata...
[default] Running any VM customizations...
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
[default] Failed to connect to VM!
Failed to connect to VM via SSH. Please verify the VM successfully booted
by looking at the VirtualBox GUI.

My host is ubuntu 11.04 + virtualbox 4.1.4 (vagrant gem 0.8.7). The guest is centos 5.7 + virtualbox 4.1.4. The Vagrantfile has

config.vm.network "33.33.33.10"

If I add config.ssh.max_tries = 150, everything works.

But a lot of time gets lost waiting for a DHCP lease which can't be obtained on that interface (it needs to time out). I could add some configuration to the box which avoids sending DHCP requests, e.g. adding 'dummy' eth1-9 config files disabling those interfaces on the first boot.
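
One way the 'dummy' config file idea might look on a CentOS guest is sketched below. This is an assumption-heavy illustration: the helper name write_dummy_ifcfg is hypothetical, the file contents (ONBOOT=no, BOOTPROTO=none) are a guess at what "dummy" means here, and whether this interferes with Vagrant's own host-only network setup is untested.

```shell
# Hypothetical helper: write placeholder ifcfg files for eth1..eth9 so a
# CentOS guest does not try DHCP on them at boot.
# The target directory defaults to the CentOS network-scripts location.
write_dummy_ifcfg() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    n=1
    while [ "$n" -le 9 ]; do
        # Skip interfaces that already have a configuration file.
        if [ ! -e "$dir/ifcfg-eth$n" ]; then
            printf 'DEVICE=eth%s\nONBOOT=no\nBOOTPROTO=none\n' "$n" > "$dir/ifcfg-eth$n"
        fi
        n=$((n + 1))
    done
}
```

Passing a scratch directory as the first argument lets the generated files be inspected before anything is written to the real location.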

millisami commented 13 years ago

It's the same for me. This bug might take a long time to resolve. Rebuilding every time eats up my time, and my slow connection has to download box after box. It is tiring to rebuild every time.

But as a temporary fix, launching in gui mode, running sudo dhclient and then vagrant ssh will work.

enr commented 13 years ago

I got the same problem, but apparently only with boxes built using Veewee. I'm using Ubuntu as host with VirtualBox 4.1.4; for Vagrant and Veewee I've tried a lot of combinations. I don't know the internals of VirtualBox or Vagrant, so I don't know if it's important, but I see some errors in ~/.VirtualBox/VBoxSVC.log:

ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find a registered machine named 'oct15'}, preserve=false
ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find an open hard disk with location '/home/enrico/VirtualBox VMs/oct15/box-disk1.vmdk'}, preserve=false
ERROR [COM]: aRC=VBOX_E_OBJECT_NOT_FOUND (0x80bb0001) aIID={c28be65f-1a8f-43b4-81f1-eb60cb516e66} aComponent={VirtualBox} aText={Could not find a registered machine named 'oct15_1318713570'}, preserve=false

in ~/.VirtualBox/VirtualBox.xml:

<MachineEntry uuid="{3cfe8af8-96da-41d3-ac3e-7266d3bb8f49}" src="/home/enrico/VirtualBox VMs/oct15_1318713570/oct15_1318713570.vbox"/>

doing a 'ps aux | grep virtual' I see the actual command:

/usr/lib/virtualbox/VBoxHeadless --comment oct15_1318713570 --startvm 3cfe8af8-96da-41d3-ac3e-7266d3bb8f49 --vrde config

Is VirtualBox looking for a box registered with a different UUID, or is the aIID in the log unrelated to the UUID in the configuration file and on the command line?

mikebannister commented 13 years ago

This one is killing me... having the same problem here but none of the workarounds (dhclient, reboot, restart networking) help me. Is there a combo of older vbox/vagrant/base box that can get me back to work? Peace, Mike

mikebannister commented 13 years ago

Ah OK, things seem fixed here after much flailing about. Fix appears to be to use vagrant HEAD. Maybe I had a different problem with the same symptoms? Hope this is helpful and not just a bunch of noise. -Mike

leth commented 13 years ago

I can't see any commits since the last release which would fix it. Probably just random luck I suspect.

hashicorp / vagrant

Sometimes hangs on "Waiting for VM to boot. This can take a few minutes." #455