hashicorp / vagrant

Vagrant is a tool for building and distributing development environments.
https://www.vagrantup.com

Invalid state while waiting for it to boot on Ubuntu 22.04 #13222

Open smutel opened 1 year ago

smutel commented 1 year ago

Debug output

https://gist.github.com/smutel/6d93ef01e8f7fdf17aad026891c1a695

Expected behavior

The VMs start correctly and no errors are reported by Vagrant.

Actual behavior

The VM starts when running vagrant up with the gui option enabled, but its status still shows as Stopped in the VirtualBox interface.

Reproduction information

Vagrant version

Vagrant 2.3.7

Host operating system

Ubuntu 22.04 (jammy)

Guest operating system

debian/bullseye64

Virtualbox version

VirtualBox 7.0.8

Steps to reproduce

  1. vagrant up

Vagrantfile

IMAGE_NAME = "debian/bullseye64"
MASTERS = 1
NODES = 2

Vagrant.configure("2") do |config|
    config.ssh.insert_key = false

    config.vm.provider "virtualbox" do |v|
        v.memory = 2048
        v.cpus = 1
    end

    (1..MASTERS).each do |i|
        config.vm.define "master-#{i}" do |master|
            master.vm.box = IMAGE_NAME
            master.vm.network "private_network", ip: "192.168.56.#{i + 100}"
            master.vm.hostname = "k8s-master-#{i}"

            # naming the virtualmachine
            master.vm.provider :virtualbox do |vb|
                vb.name = "k8s-master-#{i}"
            end

            master.vm.provision "file", source: "~/.ssh/id_rsa.pub", destination: "/tmp/id_rsa.pub"

            # change ansible to ansible_local if you are running from windows,
            # so that vagrant will install ansible inside VM and run ansible playbooks
            # eg: master.vm.provision "ansible_local" do |ansible|
            master.vm.provision "ansible_local" do |ansible|
                ansible.compatibility_mode = "2.0"
                ansible.playbook = "node-config.yml"
            end
        end
    end

    (1..NODES).each do |i|
        config.vm.define "node-#{i}" do |node|
            node.vm.box = IMAGE_NAME
            node.vm.network "private_network", ip: "192.168.56.#{i + 110}"
            node.vm.hostname = "k8s-node-#{i}"

            # naming the virtualmachine
            node.vm.provider :virtualbox do |vb|
                vb.name = "k8s-node-#{i}"
            end

            node.vm.provision "file", source: "~/.ssh/id_rsa.pub", destination: "/tmp/id_rsa.pub"

            # change ansible to ansible_local if you are running from windows,
            # so that vagrant will install ansible inside VM and run ansible playbooks
            # eg: node.vm.provision "ansible_local" do |ansible|
            node.vm.provision "ansible_local" do |ansible|
                ansible.compatibility_mode = "2.0"
                ansible.playbook = "node-config.yml"
            end
        end
    end
end

smutel commented 1 year ago

In .vagrant/machines/master-1/virtualbox there are two files containing a UUID: id and index_uuid. The UUID of the VM started by Vagrant seems to be in the id file, and the UUID of the VM created in VirtualBox seems to be in the index_uuid file. So when Vagrant uses the UUID in index_uuid, it cannot get the status of the VM.

Does anybody have more information to help me find a workaround or troubleshoot what's wrong?
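
One way to check the UUID comparison described above is to look up the stored UUID in the output of `VBoxManage list vms`. The following is only a minimal sketch: the helper names `parse_list_vms` and `registered?` are made up for illustration, and it assumes each output line has the form `"name" {uuid}`. As far as I know, index_uuid belongs to Vagrant's own global machine index rather than to VirtualBox, so only the id file is compared here; treat that as an assumption.

```ruby
# Hypothetical helpers, not part of Vagrant or VirtualBox.

# Parse the output of `VBoxManage list vms` into a { name => uuid } hash.
# Each line is assumed to look like: "k8s-master-1" {uuid}
def parse_list_vms(output)
  output.each_line.with_object({}) do |line, vms|
    match = line.match(/^"(.+)" \{([0-9a-fA-F-]+)\}\s*$/)
    vms[match[1]] = match[2] if match
  end
end

# Return true if the UUID stored by Vagrant is known to VirtualBox.
def registered?(vagrant_id, list_vms_output)
  parse_list_vms(list_vms_output).value?(vagrant_id.strip)
end
```

To run it against a real setup, pass `File.read(".vagrant/machines/master-1/virtualbox/id")` and the captured output of `VBoxManage list vms` to `registered?`.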

smutel commented 1 year ago

Does not work with: https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7_linux_amd64.zip

Works correctly with: https://releases.hashicorp.com/vagrant/2.3.7/vagrant_2.3.7-1_amd64.deb

dragetd commented 1 year ago

I have had this issue for several months already, across 6.1, 6.2, and 6.3.

It happens with any machine I try to start, basically making it impossible to use Vagrant with VirtualBox 7.0 <.<

dragetd commented 1 year ago

There seems to be an issue if you reconfigure the location where your VMs are stored. It causes VirtualBox to report the VM state as stopped, even though it is running.

You can check this while the VM is running with: VBoxManage showvminfo xxxx-xxxxx-xxx-xxxx --machinereadable | grep -i state
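
The same check can be scripted instead of using grep. This is a rough sketch under the assumption that the `--machinereadable` output consists of `key="value"` lines such as `VMState="running"`; the function names `parse_machinereadable` and `vm_state` are hypothetical.

```ruby
# Hypothetical helpers for reading `VBoxManage showvminfo <uuid> --machinereadable`.

# Turn key="value" lines into a hash, stripping the surrounding quotes.
def parse_machinereadable(output)
  output.each_line.with_object({}) do |line, info|
    key, value = line.chomp.split("=", 2)
    next if value.nil? # skip blank lines or lines without "="
    info[key.delete('"')] = value.delete_prefix('"').delete_suffix('"')
  end
end

# Return the reported VM state, or nil if the output has no VMState line.
def vm_state(output)
  parse_machinereadable(output)["VMState"]
end
```

Feeding it the captured command output for the UUID in question returns the state string that VirtualBox reports, which can then be compared with what the VM is visibly doing.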

This could be one of the reasons, I am still investigating.

dragetd commented 1 year ago

Okay, even with a fresh 7.0.10 install and my VMs under the default ~/.VirtualBox/Machines location, it still breaks.

Somehow, Vagrant manages to create VMs in some kind of limbo state. They are created and even show up in 'vboxmanage list vms', but are reported as 'VMState: Powered Off' despite running.

This was introduced with 7.0.

While a running VM being reported as 'VMState: Powered Off' clearly looks like a VirtualBox error, it is still a puzzle to me how Vagrant creates VMs that are broken like this.

VBox has a concept of registering and unregistering VMs. It can create / clone a VM but not register it. I am not sure how this code calls the binary: https://github.com/hashicorp/vagrant/blob/main/plugins/providers/virtualbox/action/import.rb#L22

Now, if we make sure to get a --register into there somehow, it might help and make the VM actually appear properly for the rest of the tools. See https://www.virtualbox.org/manual/ch08.html#idm14413

Can we get this into vagrant somehow?

Edit

No, this is not the issue. The VM registers normally when the VBoxManage commands are called manually the way Vagrant calls them, but when Vagrant runs them something seems to behave differently. Maybe an env var or something else?
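
To test the env var guess, the environment of a normal shell could be compared with the environment VBoxManage sees when Vagrant launches it. The sketch below only does the comparison. How the second dump is captured is left open, and `parse_env` and `env_diff` are hypothetical names. It assumes dumps in the format printed by `env`, one NAME=value per line, so values containing newlines would not be handled.

```ruby
# Hypothetical helpers for comparing two environment dumps.

# Parse the output of `env` (one NAME=value per line) into a hash.
def parse_env(dump)
  dump.each_line.with_object({}) do |line, env|
    name, value = line.chomp.split("=", 2)
    env[name] = value unless name.nil? || value.nil?
  end
end

# Return the variables that differ, as { name => [value_in_a, value_in_b] }.
# A variable missing on one side shows up as nil on that side.
def env_diff(env_a, env_b)
  (env_a.keys | env_b.keys).sort.each_with_object({}) do |name, diff|
    a = env_a[name]
    b = env_b[name]
    diff[name] = [a, b] unless a == b
  end
end
```

Variables that exist only on the Vagrant side, or that have changed library and search paths, would be the first candidates to unset when retrying the VBoxManage call by hand.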

kwilczynski commented 1 year ago

@dragetd, thank you for looking into this!

For me, using a proper binary package (a Debian package, in my case) solved the problem.

When Vagrant was installed from either a Zip archive or via Homebrew (which I suppose is also done via unpacking the same archive) before, it wouldn't work. Then, I tried what @smutel suggested without having any expectations, and it turned out it worked.

That said, I haven't looked into why that is.

dragetd commented 11 months ago

@kwilczynski I now see why this fixed it for you. Switching to an (outdated) distribution package solved it for me as well.

It is a Vagrant AppImage issue! Somehow the AppImage runs VBoxManage in a way that breaks VirtualBox, so the VM is not properly registered. Running the command manually works. AppImage does not use any namespace/container techniques, so I am really not sure who to blame here: how VirtualBox can be run so that it breaks itself, or how Vagrant manages to do so. xD

I created a VirtualBox issue with some more info: https://www.virtualbox.org/ticket/21889

PerennaSec commented 11 months ago

I've been wrestling with this issue for the better part of the year. I always assumed I had a misconfiguration or a missing kernel module somewhere. Using the package provided in the Gentoo repo solved this error for me, even though it's a downgraded version compared to what's offered on the Hashicorp site. I've now run into an unrelated error that may present another wall for me (https://github.com/hashicorp/vagrant/issues/12807), however the error referenced above was resolved by forgoing Hashicorp's AppImage.