tomkukral closed this issue 2 years ago
I think you need to enable emulation in the KubeVirt CR config: https://github.com/kubevirt/kubevirt/blob/main/docs/software-emulation.md. Run:
kubectl -n kubevirt edit kubevirt kubevirt
and try to add
spec:
configuration:
developerConfiguration:
useEmulation: true
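For clarity, here is a complete sketch of that CR fragment (assuming a default install, where the KubeVirt CR is named kubevirt in the kubevirt namespace; merge it with your existing spec rather than replacing it wholesale):

```yaml
# Minimal KubeVirt CR enabling software emulation (sketch).
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      useEmulation: true
```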
Hitting the same on a bare metal 1.24.2 kubeadm cluster (aside from the KVM initialization error):
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: **
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
It seems it might even need them even if it's not actively going to be using them?
It's been a while since I actively made use of this, but on the same physical systems (just with older versions of the kernel, cluster and kubevirt) and with the same deployment configuration it worked without issues before.
I think the QEMU update in 0.53.0 is relevant, but I haven't had the time to try older KubeVirt versions.
EDIT: Yup, it comes online just fine with 0.52.0.
Yes, emulation is enabled on my cluster
k -n kubevirt get kubevirt kubevirt -o yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
annotations:
app.kubernetes.io/name: kubevirt
app.kubernetes.io/version: 20220725-2857
kubevirt.io/latest-observed-api-version: v1
kubevirt.io/storage-observed-api-version: v1alpha3
creationTimestamp: "2022-07-25T09:55:49Z"
finalizers:
- foregroundDeleteKubeVirt
generation: 7
name: kubevirt
namespace: kubevirt
resourceVersion: "4406"
selfLink: /apis/kubevirt.io/v1/namespaces/kubevirt/kubevirts/kubevirt
uid: 5dcfa293-3f57-43d4-98af-5eb1aa8b666d
spec:
certificateRotateStrategy: {}
configuration:
developerConfiguration:
useEmulation: true
Yeah, that now looks more like an issue with qemu 'modularization'. I guess some libraries (e.g. for TCG) are now packaged in a separate RPM which is not pulled in as a dependency of the virt-launcher container, therefore the dynamic loading of TCG fails. Just a thought...
Ping @andreabolognani, @rmohr, WDYT?
... on the other hand, emulation mode should be tested in CI, therefore not sure...
@vasiliy-ul thanks a lot for the heads up!
I don't think the issue is related to QEMU modularization, as TCG support is part of the qemu-kvm-core package, which we include in the image.
From inside a virt-launcher pod (HEAD points to f77d50591ddd0f74c0c876e38fdf14ca3fe54be8 here):
sh-4.4# ls -l /usr/lib64/qemu-kvm/accel-tcg-x86_64.so
-rwxr-xr-x. 1 root root 24832 Jan 1 1970 /usr/lib64/qemu-kvm/accel-tcg-x86_64.so
sh-4.4# /usr/libexec/qemu-kvm -M q35 -accel tcg
VNC server running on ::1:5900
So TCG support is present and appears to be working.
@tomkukral are you running an upstream build of KubeVirt or a downstream one? If the latter, there might be some downstream packaging decision affecting the behavior.
I'm running an upstream build and using the image quay.io/kubevirt/virt-launcher:v0.54.0. Do you want me to run some testing on my site?
Everything is working in the case of physical HW; it is broken only on AWS.
Okay, that should rule out downstream-specific issues.
More information would be excellent, thanks!
You could start by verifying that a minimal VM (such as the one defined in examples/vmi-nocloud.yml) triggers the issue.
Then you could collect the debug logs produced by adding
metadata:
labels:
debugLogs: "true"
to the VMI definition both on physical hardware and AWS. Information about the specific type of AWS instance could be useful as well.
@Omar007 mentioned that v0.52.0 works on AWS, so if either one of you could provide the logs for both a successful run on v0.52.0 and a failed run on v0.54.0 that would be great.
I'll try to ping a few QEMU developers to see whether the error message rings any bell for them.
I'm sorry for the late response... I'll bootstrap KubeVirt this week and send the debug logs.
I have the same error. I tested KubeVirt on old nodes and everything worked; I switched to the new nodes and there are errors. Same Kubernetes cluster, just a different node.
error: 2022-08-24T23:12:59.210342512+02:00 failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: **
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
2022-08-24T23:12:59.210342512+02:00
Hi, is this something you can somehow reproduce with qemu running outside of KubeVirt? I could try to reproduce it, but it would be easier for me to do without KubeVirt in the mix.
Generally, if ops is NULL at that point, it means the accelerator has not registered its interface with QEMU.
Something is going wrong with the initialization order of things, or the tcg .so module is not registering/working correctly. There might be a difference between having the .so module and having the code built into the qemu binary.
What is the exact version of QEMU, and can you pinpoint roughly when it started failing?
Btw, I see that you have /usr/lib64/qemu-kvm/accel-tcg-x86_64.so. Have you tried configuring qemu with TCG built-in instead of as a module?
Ciao,
Claudio
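The module-vs-built-in question can be answered from inside the container by looking for the accel module on disk; a minimal sketch, assuming the /usr/lib64/qemu-kvm layout seen in the virt-launcher image (other builds place modules elsewhere, e.g. under /usr/lib/qemu):

```shell
# Detect whether this qemu build ships TCG as a loadable module.
# The path below is the one observed in the virt-launcher image; it is an
# assumption for other distributions.
if ls /usr/lib64/qemu-kvm/accel-tcg-*.so >/dev/null 2>&1; then
  tcg_mode="module"
else
  tcg_mode="built-in-or-absent"
fi
echo "TCG: $tcg_mode"
```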
@philmd
I am not even sure it is possible to explicitly build TCG as built-in anymore if --enable-modules is true. This was finalized IIRC in qemu-6.1 (Gerd, Paolo) as RH needed it quickly, but in my view the work on TCG modularization was not concluded yet, i.e. there is still the unclear distinction between tcg_available() and tcg_enabled(), which as far as I know was never brought to a conclusion.
I have created this vmi:
---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
labels:
special: vmi-nocloud
debugLogs: "true"
name: vmi-nocloud
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
- disk:
bus: virtio
name: emptydisk
resources:
requests:
memory: 128Mi
terminationGracePeriodSeconds: 0
volumes:
- containerDisk:
image: registry:5000/kubevirt/cirros-container-disk-demo:devel
name: containerdisk
- cloudInitNoCloud:
userData: |
#!/bin/sh
echo 'printed from cloud-init userdata'
name: cloudinitdisk
- emptyDisk:
capacity: 2Gi
name: emptydisk
but the pod is not starting:
ip-192-168-32-141 tom-2022-08-31-140632 ~/debug k -n ves-system describe po virt-launcher-vmi-nocloud-dgff4
Name: virt-launcher-vmi-nocloud-dgff4
Namespace: ves-system
Priority: 0
Node: <none>
Labels: debugLogs=true
kubevirt.io=virt-launcher
kubevirt.io/created-by=a90df346-0466-4412-a06e-d2514cdb373d
special=vmi-nocloud
vm.kubevirt.io/name=vmi-nocloud
Annotations: kubectl.kubernetes.io/default-container: compute
kubevirt.io/domain: vmi-nocloud
kubevirt.io/migrationTransportUnix: true
post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "vmi-nocloud", "--namespace", "ves-system"]
post.hook.backup.velero.io/container: compute
pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "vmi-nocloud", "--namespace", "ves-system"]
pre.hook.backup.velero.io/container: compute
ves.io/pod-id: bdf6c786-b950-4b76-ac80-6b3675c17eea
Status: Pending
IP:
IPs: <none>
Controlled By: VirtualMachineInstance/vmi-nocloud
Init Containers:
container-disk-binary:
Image: gcr.io/volterraio/kubevirt-launcher@sha256:c9f1b45c79b4c1fd0e2041d9997a838c85bf9418c06116531e00733aa6c06230
Port: <none>
Host Port: <none>
Command:
/usr/bin/cp
/usr/bin/container-disk
/init/usr/bin/container-disk
Limits:
cpu: 100m
memory: 40M
Requests:
cpu: 10m
memory: 1M
Environment: <none>
Mounts:
/init/usr/bin from virt-bin-share-dir (rw)
volumecontainerdisk-init:
Image: registry:5000/kubevirt/cirros-container-disk-demo:devel
Port: <none>
Host Port: <none>
Command:
/usr/bin/container-disk
Args:
--no-op
Limits:
cpu: 100m
memory: 40M
Requests:
cpu: 10m
ephemeral-storage: 50M
memory: 1M
Environment: <none>
Mounts:
/usr/bin from virt-bin-share-dir (rw)
/var/run/kubevirt-ephemeral-disks/container-disk-data/a90df346-0466-4412-a06e-d2514cdb373d from container-disks (rw)
Containers:
compute:
Image: gcr.io/volterraio/kubevirt-launcher@sha256:c9f1b45c79b4c1fd0e2041d9997a838c85bf9418c06116531e00733aa6c06230
Port: <none>
Host Port: <none>
Command:
/usr/bin/virt-launcher-monitor
--qemu-timeout
258s
--name
vmi-nocloud
--uid
a90df346-0466-4412-a06e-d2514cdb373d
--namespace
ves-system
--kubevirt-share-dir
/var/run/kubevirt
--ephemeral-disk-dir
/var/run/kubevirt-ephemeral-disks
--container-disk-dir
/var/run/kubevirt/container-disks
--grace-period-seconds
15
--hook-sidecars
0
--ovmf-path
/usr/share/OVMF
--allow-emulation
Limits:
devices.kubevirt.io/tun: 1
Requests:
cpu: 100m
devices.kubevirt.io/tun: 1
ephemeral-storage: 50M
memory: 348416Ki
Environment:
LIBVIRT_DEBUG_LOGS: 1
POD_NAME: virt-launcher-vmi-nocloud-dgff4 (v1:metadata.name)
Mounts:
/var/run/kubevirt from public (rw)
/var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
/var/run/kubevirt-private from private (rw)
/var/run/kubevirt/container-disks from container-disks (rw)
/var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
/var/run/kubevirt/sockets from sockets (rw)
/var/run/libvirt from libvirt-runtime (rw)
/volterra/secrets/identity from certs-volume (rw)
volumecontainerdisk:
Image: registry:5000/kubevirt/cirros-container-disk-demo:devel
Port: <none>
Host Port: <none>
Command:
/usr/bin/container-disk
Args:
--copy-path
/var/run/kubevirt-ephemeral-disks/container-disk-data/a90df346-0466-4412-a06e-d2514cdb373d/disk_0
Limits:
cpu: 100m
memory: 40M
Requests:
cpu: 10m
ephemeral-storage: 50M
memory: 1M
Environment: <none>
Mounts:
/usr/bin from virt-bin-share-dir (rw)
/var/run/kubevirt-ephemeral-disks/container-disk-data/a90df346-0466-4412-a06e-d2514cdb373d from container-disks (rw)
/volterra/secrets/identity from certs-volume (rw)
wingman:
Image: gcr.io/volterraio/wingman@sha256:e587af1d1f4394a456361fda3ab0be16671b81d2900fb2db402ae3d12784e164
Port: <none>
Host Port: <none>
Command:
wingmand
--config
/volterra/config/wingman.yml
Limits:
cpu: 50m
memory: 100Mi
Requests:
cpu: 5m
memory: 70Mi
Environment:
SECURITY_DOC: CAESzA...REDACTED
POD_IP: (v1:status.podIP)
POD_NAME: virt-launcher-vmi-nocloud-dgff4 (v1:metadata.name)
Mounts:
/volterra/config/wingman.yml from wingman-config (rw,path="wingman.yml")
/volterra/secrets/identity from certs-volume (rw)
Readiness Gates:
Type Status
kubevirt.io/virtual-machine-unpaused True
Conditions:
Type Status
PodScheduled False
kubevirt.io/virtual-machine-unpaused True
Volumes:
private:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
public:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
sockets:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
virt-bin-share-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
libvirt-runtime:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
ephemeral-disks:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
container-disks:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
hotplug-disks:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
certs-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 5M
wingman-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: wingman-config
Optional: false
QoS Class: Burstable
Node-Selectors: kubevirt.io/schedulable=true
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 49s (x7 over 5m55s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.
The node doesn't have any KubeVirt-related labels (the pod requires kubevirt.io/schedulable: "true") because virt-launcher is failing and does not set the node label.
ip-192-168-32-141 tom-2022-08-31-140632 ~/debug k -n kubevirt logs virt-handler-rqfkw -c virt-launcher
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
qemu-kvm: falling back to tcg
**
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
Right, now the challenge is how to debug qemu inside kubevirt. Is there any recommended way to do it in kubevirt? Otherwise your best bet would be to reproduce without kubevirt involved.
I'm able to reproduce it by running node-labeller.sh directly in a docker container:
docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller gcr.io/volterraio/kubevirt-launcher@sha256:c9f1b45c79b4c1fd0e2041d9997a838c85bf9418c06116531e00733aa6c06230 -c node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
qemu-kvm: falling back to tcg
**
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
BTW, the image is the same as quay.io/kubevirt/virt-launcher:v0.54.0; I just needed to re-push it to our repository due to limited network access.
I can provide access to this lab; just ping me on Kubernetes Slack (same username as on GitHub).
It may be AWS-specific, because the same deployment works fine on GCP.
Comparing kubevirt on AWS and GCP:
* the kvm module is loaded; I tried to rmmod it on AWS but it did not help
# AWS
ip-192-168-32-141 tom-2022-08-31-140632 ~ docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller gcr.io/volterraio/kubevirt-launcher@sha256:c9f1b45c79b4c1fd0e2041d9997a838c85bf9418c06116531e00733aa6c06230 -c node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
qemu-kvm: falling back to tcg
**
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
kubevirt-staging-test01 kubevirt-test01 ~ docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller gcr.io/volterraio/kubevirt-launcher@sha256:c9f1b45c79b4c1fd0e2041d9997a838c85bf9418c06116531e00733aa6c06230 -c node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
qemu-kvm: falling back to tcg
**
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
So it isn't failing on GCP (but the `tcg` error is still there), and the KubeVirt labels are present on the GCP node.
* systemd detects kvm virtualization on AWS but not on GCP
ip-192-168-32-141 tom-2022-08-31-140632 ~ dmesg | grep kvm
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 353e01001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of 4644534413 cycles
[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] kvm-stealtime: cpu 0, msr 42402c080
[ 0.001000] kvm-clock: cpu 1, msr 353e01041, secondary cpu clock
[ 0.005046] kvm-stealtime: cpu 1, msr 4240ac080
[ 0.001000] kvm-clock: cpu 2, msr 353e01081, secondary cpu clock
[ 0.006960] kvm-stealtime: cpu 2, msr 42412c080
[ 0.001000] kvm-clock: cpu 3, msr 353e010c1, secondary cpu clock
[ 0.008313] kvm-stealtime: cpu 3, msr 4241ac080
[ 0.113042] clocksource: Switched to clocksource kvm-clock
[ 1.020646] systemd[1]: Detected virtualization kvm.
[ 5374.086836] kvm: no hardware support
[ 5462.888528] kvm: no hardware support
[ 5472.877271] kvm: no hardware support
ip-192-168-32-141 tom-2022-08-31-140632 ~ uname -a
Linux ip-192-168-32-141.us-east-2.compute.internal 4.18.0-240.10.1.ves1.el7.x86_64 #1 SMP Tue Mar 30 15:02:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
kubevirt-staging-test01 kubevirt-test01 ~ dmesg | grep kvm
kubevirt-staging-test01 kubevirt-test01 ~ uname -a
Linux kubevirt-staging-test01 4.18.0-147.5.1.ves6.el7.x86_64 #1 SMP Mon Aug 31 09:14:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
* the kernel version is a bit different; I'll try to sync the versions
@tomkukral, could you try to run the same thing with docker but using another image: registry.opensuse.org/kubevirt/virt-launcher:0.55.0? If the issue is reproducible, we can probably debug it further from there.
It works with this image. Let me try to upgrade KubeVirt to 0.55 and test again.
I have also discovered the GCP test lab was using a much older kernel, so I'm trying to downgrade AWS to the same version.
Interesting; seems to be a problem only with the CentOS-based KubeVirt images? I wonder why upstream KubeVirt does not use upstream qemu... @fabiand (ciao Fabian)
Comparing the openSUSE build and upstream at the same version:
ip-192-168-32-141 tom-2022-08-31-140632 ~ docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller registry.opensuse.org/kubevirt/virt-launcher:0.55.0 -c /usr/bin/node-labeller.sh
ip-192-168-32-141 tom-2022-08-31-140632 ~ docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller quay.io/kubevirt/virt-launcher:v0.55.0 -c /usr/bin/node-labeller.sh
Unable to find image 'quay.io/kubevirt/virt-launcher:v0.55.0' locally
v0.55.0: Pulling from kubevirt/virt-launcher
ebec1dc3291e: Pull complete
49701e25b80f: Pull complete
Digest: sha256:43f223a6bf9c40cc86408d9acb49dd3bd95c87f09a120dab90f367547c31c792
Status: Downloaded newer image for quay.io/kubevirt/virt-launcher:v0.55.0
Authorization not available. Check if polkit service is running or see debug message for more information.
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: Could not access KVM kernel module: No such file or directory
qemu-kvm: failed to initialize kvm: No such file or directory
qemu-kvm: falling back to tcg
**
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
Yeah, looks like something is different in the way qemu is built. And the issue seems to happen only when the kvm probe fails and it falls back to tcg. At least /usr/libexec/qemu-kvm -M q35 -accel tcg appears to work.
Yes, this works:
ip-192-168-32-141 tom-2022-08-31-140632 ~ docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller quay.io/kubevirt/virt-launcher:v0.54.0
sh-4.4# /usr/libexec/qemu-kvm -M q35 -accel tcg
VNC server running on 127.0.0.1:5900
Sharing some status updates: it seems that the behavior depends on the host system. I can reproduce the error when running the docker command on my laptop with a recent Tumbleweed, but it's not reproducible on CentOS 8 Stream (from make cluster-up) or on some other older distros.
I can reproduce the error running the upstream docker image 0.54, but there is no error with the openSUSE container (both running on the same operating system). I was using the same VM for both. Maybe it's a combination of the host OS and the container OS.
If it helps, I am able to reproduce this on a physical CentOS 7 host with the KubeVirt upstream virt-launcher v0.55.0 image. I also saw that somebody else filed another issue (https://github.com/kubevirt/kubevirt/issues/8362) for Ubuntu 20.04.
Could there be an issue due to an API version mismatch between the qemu-kvm binary (v6.2) and the libvirt/virsh QEMU driver API (8.0.0)?
sh-4.4# /usr/libexec/qemu-kvm --version
QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-5.module_el8.6.0+1087+b42c8331)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
sh-4.4# virsh version
Authorization not available. Check if polkit service is running or see debug message for more information.
Compiled against library: libvirt 8.0.0
Using library: libvirt 8.0.0
Using API: QEMU 8.0.0
error: failed to get the hypervisor version
error: internal error: Cannot find suitable emulator for x86_64
@tomkukral, @poojaghumre, could you check on your side the permissions of the directory (using docker run ... with the upstream image)?
# ls -la /usr/lib64/qemu-kvm
After that, run:
# chmod +rx /usr/lib64/qemu-kvm
# /usr/bin/node-labeller.sh
@vasiliy-ul I confirmed that your suggested fix works just fine. I modified the virt-handler (v0.55.0) daemonset config as below and that worked:
initContainers:
- args:
- chmod +rx /usr/lib64/qemu-kvm; node-labeller.sh;
command:
- /bin/sh
- -c
image: quay.io/kubevirt/virt-launcher:v0.55.0
Permissions inside virt-launcher container when using v0.55.0 image as is:
sh-4.4# ls -la /usr/lib64/qemu-kvm
total 296
drw-------. 2 root root 4096 Sep 1 02:21 .
sh-4.4# /usr/bin/node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
error: failed to get emulator capabilities
error: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: **
ERROR:../accel/accel-softmmu.c:82:accel_init_ops_interfaces: assertion failed: (ops != NULL)
sh-4.4# chmod +rx /usr/lib64/qemu-kvm
sh-4.4# ls -la /usr/lib64/qemu-kvm
total 296
drwxr-xr-x. 2 root root 4096 Sep 1 02:21 .
sh-4.4# ./node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
sh-4.4# echo $?
0
When querying the host capabilities (inside the node-labeller.sh script), libvirtd starts qemu as the qemu user 107:107. The wrong permissions prevent a non-root user from accessing the tcg accel plugin. There is definitely some room for improvement in the qemu error reporting, IMHO.
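The mechanism can be sketched without qemu at all; here a throwaway temp directory stands in (hypothetically) for /usr/lib64/qemu-kvm, showing the broken mode seen in the thread and the effect of the chmod +rx workaround:

```shell
# Illustrative sketch; moddir is a stand-in path, not the real image.
umask 022
moddir=$(mktemp -d)/qemu-kvm
mkdir "$moddir"
touch "$moddir/accel-tcg-x86_64.so"

chmod 0600 "$moddir"    # the broken state from the thread: drw-------
stat -c '%a' "$moddir"  # prints 600; a non-root user like qemu (107) cannot enter it

chmod +rx "$moddir"     # the workaround applied in the thread
stat -c '%a' "$moddir"  # prints 755; the accel modules are reachable again
```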
Now the question is why the permissions get screwed up. The directory /usr/lib64/qemu-kvm comes from the base image. I do not have a reasonable explanation for why it works fine on some hosts while failing on others. It does not seem to depend on the host OS: I checked two different machines with the same OS installed, and one works fine while the other shows this error because of the permissions.
For the sake of further investigation, @poojaghumre, @tomkukral, could you also share the output of
docker info | grep Storage
I can confirm chmod is helping. Thanks a lot for your help!
docker run --rm -ti --entrypoint /bin/sh --privileged -v $(mktemp -d):/var/lib/kubevirt-node-labeller quay.io/kubevirt/virt-launcher:v0.54.0
sh-4.4# ls -la /usr/lib64/qemu-kvm
total 296
drw-------. 2 root root 4096 Sep 1 08:33 .
dr-xr-xr-x. 29 root root 16384 Jan 1 1970 ..
-rwxr-xr-x. 1 root root 11792 Jan 1 1970 accel-qtest-x86_64.so
-rwxr-xr-x. 1 root root 24832 Jan 1 1970 accel-tcg-x86_64.so
-rwxr-xr-x. 1 root root 7568 Jan 1 1970 hw-display-virtio-gpu-gl.so
-rwxr-xr-x. 1 root root 7576 Jan 1 1970 hw-display-virtio-gpu-pci-gl.so
-rwxr-xr-x. 1 root root 12688 Jan 1 1970 hw-display-virtio-gpu-pci.so
-rwxr-xr-x. 1 root root 53792 Jan 1 1970 hw-display-virtio-gpu.so
-rwxr-xr-x. 1 root root 7568 Jan 1 1970 hw-display-virtio-vga-gl.so
-rwxr-xr-x. 1 root root 17368 Jan 1 1970 hw-display-virtio-vga.so
-rwxr-xr-x. 1 root root 47688 Jan 1 1970 hw-usb-host.so
-rwxr-xr-x. 1 root root 67584 Jan 1 1970 hw-usb-redirect.so
sh-4.4# chmod +rx /usr/lib64/qemu-kvm
sh-4.4# /usr/bin/node-labeller.sh
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Authorization not available. Check if polkit service is running or see debug message for more information.
Docker is using devicemapper on this site:
ip-192-168-32-141 tom-2022-08-31-140632 ~ docker info
Client:
Debug Mode: false
Server:
Containers: 134
Running: 109
Paused: 0
Stopped: 25
Images: 100
Server Version: 19.03.12
Storage Driver: devicemapper
Pool Name: docker-253:0-6302274-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data file: /dev/loop0
Metadata file: /dev/loop1
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 19.14GB
Data Space Total: 107.4GB
Data Space Available: 16.8GB
Metadata Space Used: 34.79MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.113GB
Thin Pool Minimum Free Space: 10.74GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.170-RHEL7 (2020-03-24)
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-240.10.1.ves1.el7.x86_64
Operating System: CentOS Linux 7.2009.29 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.29GiB
Name: ip-192-168-32-141.us-east-2.compute.internal
ID: LRJZ:ZWSF:WJ2Q:5UMP:7OIL:F72F:VHZE:GNZO:PUGB:ATN6:IDUF:6R7Y
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
WARNING: bridge-nf-call-ip6tables is disabled
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
It was working previously on GCP, which is using overlay2:
kubevirt-staging-test01 kubevirt-test01 ~ docker info
Client:
Debug Mode: false
Server:
Containers: 133
Running: 107
Paused: 0
Stopped: 26
Images: 94
Server Version: 19.03.12
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-147.5.1.ves6.el7.x86_64
Operating System: CentOS Linux 7.2006.3 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.66GiB
Name: kubevirt-staging-test01
ID: LOQW:UZR4:BLKJ:RREK:D7SV:45TX:L3QM:TGM2:54CF:I2DY:UTTS:PMZB
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
WARNING: bridge-nf-call-ip6tables is disabled
Thanks for sharing the info. The assumption is that the behavior varies depending on the storage driver used. It seems that the container gets correct permissions with overlay2. I see wrong perms with btrfs, and now devicemapper is also confirmed to misbehave.
I have already fixed it in my downstream build, and I'm ready to provide any information to debug this further.
@tomkukral, what is your Kubernetes distro? Especially what container runtime is it based on (I assume docker is just for local testing)?
Using the docker container runtime, with a custom Kubernetes installation.
Here is the output of the docker info | grep Storage command on my CentOS 7 server:
[root@kubevirt-c7 ~]# docker info | grep Storage
Storage Driver: devicemapper
Can the permission fix be added to the node-labeller.sh script itself to unblock this issue, given that the permissions are correct in the base image?
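For what it's worth, such a stopgap could look roughly like this; a hypothetical sketch (fix_qemu_module_perms is not part of the real script), only meant to illustrate the guard:

```shell
# Hypothetical guard for node-labeller.sh: restore read/execute on the qemu
# module directory before libvirtd probes qemu as the non-root qemu user.
fix_qemu_module_perms() {
  dir=$1
  # Only touch the directory if it exists and is missing the execute bit
  # (the broken state observed in the thread is drw-------).
  if [ -d "$dir" ] && [ ! -x "$dir" ]; then
    chmod a+rx "$dir"
  fi
}

# e.g. fix_qemu_module_perms /usr/lib64/qemu-kvm
```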
I am afraid that will be just a workaround if we explicitly adjust the permissions. Other directories might be affected as well. Besides, the virt-launcher base image is used to run other containers (and those do not call node-labeller.sh).
Ping @rmohr, @xpivarc, maybe you have some thoughts on that? In short, the issue is the following:
In the virt-launcher image, there is a directory /usr/lib64/qemu-kvm which contains .so files (i.e. qemu module drivers). For unknown reasons, this directory gets wrong permissions, so it is only accessible by the root user: drw-------. 2 root root. This breaks the capabilities querying in the node-labeller.sh script. According to the observations, the issue happens only when docker is used as the runtime and it is set up to use either the btrfs or devicemapper storage driver (with overlay2 it works just fine). I would suspect a bug in the code which unpacks the tarball.
The directory /usr/lib64/qemu-kvm appears to be 'not owned' by any package from the rpmtree. I checked what bazeldnf does and was not able to find issues with that, though: it simply creates the full path with the default permissions when handling the files in that directory. The expectation is that by default it should have 0755, like most of the remaining directories.
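For illustration only (this is not KubeVirt or bazeldnf code), here is how a directory's mode round-trips through a tar archive in a minimal Python sketch: an extractor that applies the stored mode verbatim reproduces the broken 0700, whereas an extractor that creates the path implicitly would fall back to a default such as 0755. The path and modes mirror the ones observed in this report.

```python
import io
import os
import stat
import tarfile
import tempfile

# Archive a directory entry whose stored mode is 0700 (the broken
# permissions seen in this report).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("usr/lib64/qemu-kvm")
    info.type = tarfile.DIRTYPE
    info.mode = 0o700
    tar.addfile(info)

# An extractor that honours the stored mode reproduces 0700 exactly;
# one that creates the directory implicitly would use the default mode.
buf.seek(0)
dest = tempfile.mkdtemp()
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("usr/lib64/qemu-kvm")
    path = os.path.join(dest, member.name)
    os.makedirs(path)
    os.chmod(path, member.mode)  # apply the mode recorded in the tarball

print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # prints: 0o700
```

Which of the two behaviors the unpacking code follows, and for which members, is exactly the kind of thing that could differ between storage-driver backends.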
Interesting problem and great findings. Can you share the mount output on both the working and non-working setups? Also, please check the underlying layer if possible.
Yeah, I also checked the unpacked filesystem on the host, and it already has wrong perms:
$ docker run --rm -ti --entrypoint /bin/bash quay.io/kubevirt/virt-launcher:v0.55.0
bash-4.4# mount
/dev/nvme0n1p2 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=35753,subvol=/@/var/lib/docker/btrfs/subvolumes/883a8dcbd06749e0b7a3aed62b3b961d2c589078957a7138e6f2442e3de9c2b5)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup type cgroup2 (ro,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,inode64)
/dev/nvme0n1p2 on /etc/resolv.conf type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/@/var)
/dev/nvme0n1p2 on /etc/hostname type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/@/var)
/dev/nvme0n1p2 on /etc/hosts type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/@/var)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/asound type tmpfs (ro,relatime,inode64)
tmpfs on /proc/acpi type tmpfs (ro,relatime,inode64)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/keys type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/latency_stats type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/scsi type tmpfs (ro,relatime,inode64)
tmpfs on /sys/firmware type tmpfs (ro,relatime,inode64)
# ls -la /var/lib/docker/btrfs/subvolumes/883a8dcbd06749e0b7a3aed62b3b961d2c589078957a7138e6f2442e3de9c2b5/usr/lib64/qemu-kvm/
total 272
drw------- 1 root root 466 Sep 5 08:12 .
dr-xr-xr-x 1 root root 14218 Jan 1 1970 ..
-rwxr-xr-x 1 root root 11792 Jan 1 1970 accel-qtest-x86_64.so
-rwxr-xr-x 1 root root 24832 Jan 1 1970 accel-tcg-x86_64.so
-rwxr-xr-x 1 root root 7568 Jan 1 1970 hw-display-virtio-gpu-gl.so
-rwxr-xr-x 1 root root 7576 Jan 1 1970 hw-display-virtio-gpu-pci-gl.so
-rwxr-xr-x 1 root root 12688 Jan 1 1970 hw-display-virtio-gpu-pci.so
-rwxr-xr-x 1 root root 53792 Jan 1 1970 hw-display-virtio-gpu.so
-rwxr-xr-x 1 root root 7568 Jan 1 1970 hw-display-virtio-vga-gl.so
-rwxr-xr-x 1 root root 17368 Jan 1 1970 hw-display-virtio-vga.so
-rwxr-xr-x 1 root root 47688 Jan 1 1970 hw-usb-host.so
-rwxr-xr-x 1 root root 67584 Jan 1 1970 hw-usb-redirect.so
Same on the working setup with docker+overlay2:
# docker run --rm -ti --entrypoint /bin/bash quay.io/kubevirt/virt-launcher:v0.55.0
bash-4.4# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/NGSTBJZW5TKGLHETYCLMTZ47K7:/var/lib/docker/overlay2/l/7CPUWR24G3LAFSLIGXHLPHOK4T:/var/lib/docker/overlay2/l/UW2KWGUELOV2IF2YXD4AS6KFS4,upperdir=/var/lib/docker/overlay2/62915a1dc9bfcabc69ee2e5e04ff32c27cc24f75a8c497998b2b0b32a0379cac/diff,workdir=/var/lib/docker/overlay2/62915a1dc9bfcabc69ee2e5e04ff32c27cc24f75a8c497998b2b0b32a0379cac/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup type cgroup2 (ro,nosuid,nodev,noexec,relatime)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,inode64)
/dev/sda3 on /etc/resolv.conf type ext4 (rw,noatime)
/dev/sda3 on /etc/hostname type ext4 (rw,noatime)
/dev/sda3 on /etc/hosts type ext4 (rw,noatime)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime,inode64)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/keys type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/latency_stats type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
tmpfs on /proc/scsi type tmpfs (ro,relatime,inode64)
tmpfs on /sys/firmware type tmpfs (ro,relatime,inode64)
# ls -la /var/lib/docker/overlay2/l/UW2KWGUELOV2IF2YXD4AS6KFS4/usr/lib64/qemu-kvm/
total 300
drwxr-xr-x 2 root root 4096 Sep 5 00:12 .
dr-xr-xr-x 29 root root 20480 Dec 31 1969 ..
-rwxr-xr-x 1 root root 11792 Dec 31 1969 accel-qtest-x86_64.so
-rwxr-xr-x 1 root root 24832 Dec 31 1969 accel-tcg-x86_64.so
-rwxr-xr-x 1 root root 7568 Dec 31 1969 hw-display-virtio-gpu-gl.so
-rwxr-xr-x 1 root root 7576 Dec 31 1969 hw-display-virtio-gpu-pci-gl.so
-rwxr-xr-x 1 root root 12688 Dec 31 1969 hw-display-virtio-gpu-pci.so
-rwxr-xr-x 1 root root 53792 Dec 31 1969 hw-display-virtio-gpu.so
-rwxr-xr-x 1 root root 7568 Dec 31 1969 hw-display-virtio-vga-gl.so
-rwxr-xr-x 1 root root 17368 Dec 31 1969 hw-display-virtio-vga.so
-rwxr-xr-x 1 root root 47688 Dec 31 1969 hw-usb-host.so
-rwxr-xr-x 1 root root 67584 Dec 31 1969 hw-usb-redirect.so
Also, one more thing to note: podman with the btrfs driver sets the permissions correctly.
I think this will be a problem specific to Docker and how it handles the layers. One last thing I would check is the layers of our images (it should be 1:1 with what you see with overlay2). I think the next step would be to check the code in Docker and file a bug there (we could also check one more runtime, e.g. crio).
Meanwhile, what do you think about applying a workaround in KubeVirt? Pre-creating the directory with proper permissions seems to solve the issue:
diff --git a/cmd/virt-launcher/BUILD.bazel b/cmd/virt-launcher/BUILD.bazel
index d9cc5f252..4190794fb 100644
--- a/cmd/virt-launcher/BUILD.bazel
+++ b/cmd/virt-launcher/BUILD.bazel
@@ -154,6 +154,15 @@ pkg_tar(
package_dir = "/etc",
)
+pkg_tar(
+ name = "qemu-kvm-modules-dir-tar",
+ empty_dirs = [
+ "usr/lib64/qemu-kvm",
+ ],
+ mode = "0755",
+ owner = "0.0",
+)
+
container_image(
name = "version-container",
directory = "/",
@@ -169,6 +178,7 @@ container_image(
":libvirt-config",
":passwd-tar",
":nsswitch-tar",
+ ":qemu-kvm-modules-dir-tar",
"//rpm:launcherbase_x86_64",
],
}),
@vasiliy-ul thank you for fixing it
Well, it's more of a workaround than a fix, but hopefully it handles this specific problem for now. Also, I raised a docker issue for that. Let's see if it gets some feedback there.
What happened: I was trying to start KubeVirt on my AWS instance.
What you expected to happen: I was expecting virt-handler to start on my AWS instance.
How to reproduce it (as minimally and precisely as possible): Try to start KubeVirt on AWS.
Additional context: The virt-launcher container in the virt-handler pod is failing on detecting emulator capabilities. The tcg libraries are probably missing.
Environment:
KubeVirt version (use virtctl version): v0.54.0
Kubernetes version (use kubectl version): v1.21.7
VM or VMI specifications: ami-05e5abbfdd4424640
Cloud provider or hardware configuration: AWS EC2 instance
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a): Linux ip-192-168-1-33.eu-central-1.compute.internal 4.18.0-147.5.1.ves4.el7.x86_64 #1 SMP Mon Mar 16 08:47:16 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Install tools: N/A
Others: N/A
I'll be grateful for any suggestion.