oomichi / try-kubernetes


vagrant: failed to deploy a vm from `vagrant up` #118

Closed: oomichi closed this issue 2 years ago

oomichi commented 2 years ago

During vagrant up the following messages are output and a VM is never created. From /var/log/libvirt/qemu/kubesprayk8s-1.log:

2022-04-05 17:08:27.791+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.15 (Christian Ehrhardt <christian.ehrhardt@canonical.com> Thu, 18 Nov 2021 10:23:11 +0100), qemu version: 4.2.1 (Debian 1:4.2-3ubuntu6.21), kernel: 5.4.0-107-generic, hostname: dev01
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-2-kubesprayk8s-1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-kubesprayk8s-1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-kubesprayk8s-1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-kubesprayk8s-1/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=kubesprayk8s-1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-kubesprayk8s-1/master-key.aes \
-machine pc-i440fx-focal,accel=kvm,usb=off,dump-guest-core=off \
-cpu Skylake-Client-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,hle=off,rtm=off \
-m 2048 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 574bbd4a-48c4-4bcb-8b95-06c44d88dd94 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/generic-VAGRANTSLASH-ubuntu1804_vagrant_box_image_3.6.12_box.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/kubesprayk8s-1.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=ua-box-volume-0,bootindex=1 \
-netdev tap,fd=33,id=hostua-net-0,vhost=on,vhostfd=34 \
-device virtio-net-pci,netdev=hostua-net-0,id=ua-net-0,mac=52:54:00:8f:c4:a9,bus=pci.0,addr=0x5 \
-netdev tap,fd=35,id=hostua-net-1,vhost=on,vhostfd=36 \
-device virtio-net-pci,netdev=hostua-net-1,id=ua-net-1,mac=52:54:00:2c:db:80,bus=pci.0,addr=0x6 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 127.0.0.1:0 \
-k en-us \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2022-04-05T17:08:27.864222Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
2022-04-05T17:08:27.864273Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
2022-04-05T17:08:59.798062Z qemu-system-x86_64: terminating on signal 15 from pid 4705 (/usr/sbin/libvirtd)
2022-04-05 17:08:59.998+0000: shutting down, reason=destroyed
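The decisive line is the SIGTERM sent by libvirtd itself, not a qemu crash. A quick way to pull the sender PID out of such a log line (a sketch: the sample line is inlined here; on the host you would grep the qemu domain log instead):

```shell
# Extract the PID that sent SIGTERM to the guest, from the log line above
# (sample line inlined; on the host, grep /var/log/libvirt/qemu/*.log).
line='2022-04-05T17:08:59.798062Z qemu-system-x86_64: terminating on signal 15 from pid 4705 (/usr/sbin/libvirtd)'
printf '%s\n' "$line" | sed -n 's/.*from pid \([0-9][0-9]*\).*/\1/p'
# -> 4705
```

If the process is still running, something like `ps -p 4705 -o comm=` would confirm the sender is libvirtd, i.e. libvirt deliberately tore the guest down.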
oomichi commented 2 years ago

/var/log/libvirt/qemu/kubespraykub-1.log

2022-04-05T21:50:48.211837Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
2022-04-05T21:50:48.211885Z qemu-system-x86_64: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
2022-04-05T21:51:20.153716Z qemu-system-x86_64: terminating on signal 15 from pid 4705 (/usr/sbin/libvirtd)
2022-04-05 21:51:20.354+0000: shutting down, reason=destroyed

libvirtd kills the qemu process. Even after increasing the VM memory from 2 GB to 8 GB in the Vagrantfile, the issue persists.

To investigate the issue, the following lines are added to /etc/libvirt/libvirtd.conf

log_filters="1:qemu 1:libvirt 4:object 4:json 1:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

followed by sudo systemctl restart libvirtd

oomichi commented 2 years ago
2022-04-05 23:11:50.331+0000: 4329: debug : virThreadJobSet:93 : Thread 4329 (virNetServerHandleJob) is now running job remoteDispatchDomainDestroyFlags
2022-04-05 23:11:50.331+0000: 4329: debug : virDomainDestroyFlags:526 : dom=0x7f04b4007df0, (VM: name=kubespraykub-1, uuid=e380a93d-2144-433c-b1c3-86d3458b5138), flags=0x0
2022-04-05 23:11:50.331+0000: 4329: debug : qemuProcessKill:7197 : vm=0x7f04c021d140 name=kubespraykub-1 pid=4523 flags=0x1
2022-04-05 23:11:50.332+0000: 4324: info : qemuMonitorJSONIOProcessLine:234 : QEMU_MONITOR_RECV_EVENT: mon=0x7f04bc044210 event={"timestamp": {"seconds": 1649200310, "microseconds": 332060}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-signal"}}
2022-04-05 23:11:50.332+0000: 4324: debug : qemuMonitorEmitEvent:1198 : mon=0x7f04bc044210 event=SHUTDOWN
2022-04-05 23:11:50.332+0000: 4324: debug : qemuProcessHandleEvent:549 : vm=0x7f04c021d140
2022-04-05 23:11:50.732+0000: 4329: debug : qemuDomainObjBeginJobInternal:9416 : Starting job: job=destroy agentJob=none asyncJob=none (vm=0x7f04c021d140 name=kubespraykub-1, current job=none agentJob=none async=none)
2022-04-05 23:11:50.732+0000: 4329: debug : qemuDomainObjBeginJobInternal:9470 : Started job: destroy (async=none vm=0x7f04c021d140 name=kubespraykub-1)
2022-04-05 23:11:50.733+0000: 4329: debug : virFileMakePathHelper:2993 : path=/run/libvirt/qemu mode=0777
2022-04-05 23:11:50.733+0000: 4329: debug : virFileClose:110 : Closed fd 25
2022-04-05 23:11:50.733+0000: 4329: debug : qemuProcessStop:7279 : Shutting down vm=0x7f04c021d140 name=kubespraykub-1 id=1 pid=4523, reason=destroyed, asyncJob=none, flags=0x0
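The virThreadJobSet line shows the destroy arriving through libvirt's RPC dispatcher, i.e. some libvirt client (most likely vagrant-libvirt itself) called virDomainDestroyFlags rather than qemu dying on its own. A sketch for pulling the dispatched job name out of such a line (sample debug line inlined; on the host you would grep /var/log/libvirt/libvirtd.log):

```shell
# Identify which RPC job the libvirtd worker thread was dispatched to run
# (sample debug line inlined; on the host, grep /var/log/libvirt/libvirtd.log).
line='2022-04-05 23:11:50.331+0000: 4329: debug : virThreadJobSet:93 : Thread 4329 (virNetServerHandleJob) is now running job remoteDispatchDomainDestroyFlags'
printf '%s\n' "$line" | sed -n 's/.*running job \([A-Za-z]*\).*/\1/p'
# -> remoteDispatchDomainDestroyFlags
```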
oomichi commented 2 years ago

From syslog:

Apr  5 23:11:50 dev01 systemd-networkd[607]: vnet0: Link DOWN
Apr  5 23:11:50 dev01 kernel: [ 4427.733221] virbr1: port 2(vnet0) entered disabled state
Apr  5 23:11:50 dev01 kernel: [ 4427.735908] device vnet0 left promiscuous mode
Apr  5 23:11:50 dev01 kernel: [ 4427.735919] virbr1: port 2(vnet0) entered disabled state
Apr  5 23:11:50 dev01 systemd-networkd[607]: vnet0: Lost carrier
Apr  5 23:11:50 dev01 systemd-networkd[607]: rtnl: received neighbor for link '28' we don't know about, ignoring.
Apr  5 23:11:50 dev01 systemd-networkd[607]: rtnl: received neighbor for link '28' we don't know about, ignoring.
Apr  5 23:11:50 dev01 systemd-networkd[607]: virbr1: Lost carrier
Apr  5 23:11:50 dev01 systemd-networkd[607]: vnet1: Link DOWN
Apr  5 23:11:50 dev01 systemd-networkd[607]: vnet1: Lost carrier
Apr  5 23:11:50 dev01 kernel: [ 4427.813238] virbr2: port 2(vnet1) entered disabled state
Apr  5 23:11:50 dev01 kernel: [ 4427.816084] device vnet1 left promiscuous mode
Apr  5 23:11:50 dev01 kernel: [ 4427.816096] virbr2: port 2(vnet1) entered disabled state
Apr  5 23:11:50 dev01 systemd-networkd[607]: rtnl: received neighbor for link '29' we don't know about, ignoring.
Apr  5 23:11:50 dev01 systemd-networkd[607]: rtnl: received neighbor for link '29' we don't know about, ignoring.
Apr  5 23:11:50 dev01 systemd[1]: machine-qemu\x2d1\x2dkubespraykub\x2d1.scope: Succeeded.
Apr  5 23:11:50 dev01 dnsmasq[4468]: exiting on receipt of SIGTERM

Does that point to a failure of the virtual network configuration?
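Note that the link-down lines are also consistent with the guest simply being torn down, so they may be effect rather than cause. A quick way to count the bridge ports going down in the excerpt (a sketch on inlined sample lines; on the host you would grep /var/log/syslog directly):

```shell
# Count bridge-port-down events in the syslog excerpt (sample lines inlined;
# on the host: grep -c 'entered disabled state' /var/log/syslog).
grep -c 'entered disabled state' <<'EOF'
Apr  5 23:11:50 dev01 kernel: [ 4427.733221] virbr1: port 2(vnet0) entered disabled state
Apr  5 23:11:50 dev01 kernel: [ 4427.735908] device vnet0 left promiscuous mode
Apr  5 23:11:50 dev01 kernel: [ 4427.813238] virbr2: port 2(vnet1) entered disabled state
EOF
# -> 2 (one per bridge, virbr1 and virbr2)
```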

oomichi commented 2 years ago

Debugging with gnome-boxes:

$ gnome-boxes --checks

(gnome-boxes:14201): Boxes-WARNING **: 01:29:11.330: util-app.vala:347: Failed to execute child process “restorecon” (No such file or directory)
• The CPU is capable of virtualization: yes
• The KVM module is loaded: yes
• Libvirt KVM guest available: yes
• Boxes storage pool available: no
    Could not get “gnome-boxes” storage pool information from libvirt. Make sure “virsh -c qemu:///session pool-dumpxml gnome-boxes” is working.
• The SELinux context is default: no

Report bugs to <http://gitlab.gnome.org/gnome/gnome-boxes/issues>.
Boxes home page: <https://wiki.gnome.org/Apps/Boxes>.

Is this due to “Boxes storage pool available: no”?

oomichi commented 2 years ago

libvirt itself seems to work fine on the machine, as verified by trying virt-install:

$ wget http://mirrors.advancedhosters.com/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-Minimal-2009.iso
$ virt-install --name=centos --location=./CentOS-7-x86_64-Minimal-2009.iso --disk path=/var/lib/libvirt/images/centos.qcow2,size=5,format=qcow2 --vcpus=1 --ram=1024 --graphics none --extra-args="console=tty0 console=ttyS0,115200n8"

From another terminal:

$ sudo virsh list --all
 Id   Name     State
------------------------
 2    centos   running

After running shutdown now in the VM:

$ sudo virsh list --all
 Id   Name     State
-------------------------
 -    centos   shut off
oomichi commented 2 years ago

There are some files which seem related to the systemd-networkd error messages:

# ls /var/lib/libvirt/dnsmasq/ 
default.addnhosts  default.conf  default.hostsfile  kubespray1.addnhosts  kubespray1.conf  kubespray1.hostsfile  virbr0.macs  virbr0.status  virbr3.macs

Let's remove all the files under that path, then retry vagrant up:

$ sudo rm /var/lib/libvirt/dnsmasq/*
$ vagrant up
...
TASK [bootstrap-os : Fetch /etc/os-release] ************************************
fatal: [k8s-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.121.181 port 22: No route to host", "unreachable": true}
...

The same error still happens. The VM was temporarily created but was deleted soon after:

$ sudo virsh list --all
 Id   Name             State
---------------------------------
 3    kubespraykub-1   running
 -    centos           shut off
$
$ sudo virsh list --all
 Id   Name     State
-------------------------
 -    centos   shut off
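In hindsight, rather than deleting the dnsmasq state outright, it could have been moved aside so it was restorable once it turned out not to be the cause. A sketch on a scratch directory (on the real host, the path would be /var/lib/libvirt/dnsmasq and the moves would need sudo):

```shell
# Move state files aside instead of deleting them (demonstrated on a
# scratch directory; the real path would be /var/lib/libvirt/dnsmasq).
state=$(mktemp -d)
backup=$(mktemp -d)
touch "$state/default.conf" "$state/virbr0.macs"   # stand-ins for the real files
mv "$state"/* "$backup"/
ls "$backup"
```

Restoring is then just the reverse mv, followed by a libvirtd restart.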
oomichi commented 2 years ago
$ cat 00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  version: 2
  ethernets:
    eno1:
      dhcp4: no
      addresses: [192.168.1.92/24]
      gateway4: 192.168.1.254
      nameservers:
        addresses: [192.168.1.254]
        search: []
  wifis:
    wlp58s0:
      dhcp4: no
      addresses: [192.168.1.91/24]
      gateway4: 192.168.1.254
      nameservers:
        addresses: [192.168.1.254]
        search: []
      access-points:
        "xxxxxxxxxxxxxxxxxxxxxxxxxx":
          password: "xxxxxxxxxxxxx"
oomichi commented 2 years ago

After re-installing Ubuntu, the issue has been solved.

oomichi commented 2 years ago

By changing the Vagrantfile as follows, the k8s cluster is deployed successfully.

@@ -256,7 +256,7 @@ Vagrant.configure("2") do |config|
           ansible.host_key_checking = false
           ansible.raw_arguments = ["--forks=#{$num_instances}", "--flush-cache", "-e ansible_become_pass=vagrant"]
           ansible.host_vars = host_vars
-          ansible.tags = ['facts']
+          #ansible.tags = ['facts']
           ansible.groups = {
             "etcd" => ["#{$instance_name_prefix}-[1:#{$etcd_instances}]"],
             "kube_control_plane" => ["#{$instance_name_prefix}-[1:#{$kube_master_instances}]"],