Hi @kvaps, thanks for opening this issue
I've been doing some research, and this seems to be a fairly common issue on virtualization platforms when the guest OS has higher CPU requirements (typically, a CPU compatible with x86-64-v2).
Quoting a case with the same error in a RHEL9 guest:
The Virtualization Platform is likely hiding CPU features to enable live migration compatibility between hypervisor machines with different CPU models; it needs to be raised to a newer minimum set of features that is compatible with x86-64-v2.
Virtualization platforms often allow configuring a lowest-common-denominator CPU model, which is a subset of (or equal to) the actual CPUs of all physical hosts. This hides features from the VMs so that they can live migrate between hosts with different CPU models without suddenly losing CPU features mid-migration. This setting is likely set too low, below the minimum requirements of RHEL 9.
I've seen several other people having this issue with kvm64. The problem seems to be solved by changing the CPU model to match the host:
https://forum.proxmox.com/threads/kernel-panic-when-creating-vms-centos-9-stream-iso.104656/
https://github.com/ansible/awx/issues/11879
https://forums.centos.org/viewtopic.php?t=78733
Knowing this, I would not consider this to be a bug in CDI; it can probably be solved by tweaking that specific configuration in Proxmox.
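For reference, a minimal sketch of that change on Proxmox, assuming a VM ID of 100 (the ID is only an example; the same setting is also available in the web UI under the VM's Hardware settings):
# Switch the VM's CPU type from the default kvm64 to the host CPU model
qm set 100 --cpu host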
Hope this helps! Let's see if some other people with more expertise in this topic can help too.
This is not only a Proxmox problem; I have the same problem with OpenStack VMs.
CDI does not work in virtual machines if the CPU model is not set to host-cpu.
The problem is common among several virtualization platforms, including OpenStack. Supposedly, this is done to improve live migration between hosts with different CPU models.
In the case of OpenStack Virtualization, the general CPU options are set in the [libvirt] group in /etc/nova/nova.conf. It seems that you can find a workaround by tweaking cpu_mode, cpu_model and cpu_model_extra_flags (documentation), but I've yet to find the exact values required. Setting cpu_mode to host-model seems to be the way to go in most cases.
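As a rough sketch (not verified against this issue, and exact values depend on your deployment), the relevant section of /etc/nova/nova.conf could look like:
[libvirt]
# Expose a model close to the host CPU to guests
cpu_mode = host-model
# Alternatively, pin a custom baseline model that is new enough for x86-64-v2, e.g.:
# cpu_mode = custom
# cpu_model = Nehalem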
Also, from qemu's documentation: https://www.qemu.org/docs/master/system/i386/cpu.html#other-non-recommended-x86-cpus
It seems this is a common problem with RHEL 9: https://access.redhat.com/solutions/6833751
Right... it seems to be a problem in every OS with mandatory x86-64-v2 support. Since all the workarounds suggest changing the CPU to host, it simply looks like most virtualization platforms won't support kvm64 for OSes with these CPU requirements. That said, I don't think we can do much there.
Why do other KubeVirt workloads work then? AFAIK they are also based on CentOS Stream 9. What is different? A static build? glibc?
Interesting, I'll do some research about that. Can you specify the workloads and their versions?
Sure, here is my testing workflow:
create the vm:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-nocloud
spec:
  terminationGracePeriodSeconds: 30
  domain:
    cpu:
      model: kvm64
    resources:
      requests:
        memory: 1024M
    devices:
      disks:
      - name: containerdisk
        disk:
          bus: virtio
      - disk:
          bus: virtio
        name: cloudinitdisk
  volumes:
  - name: containerdisk
    containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
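The manifest above can then be applied as usual (the file name testvmi-nocloud.yaml is just an assumption):
kubectl apply -f testvmi-nocloud.yaml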
Enable the kvm64 processor:
kubectl label node --all cpu-model.node.kubevirt.io/kvm64=true
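To double-check that the label landed (just a sanity check, not part of the original steps):
kubectl get nodes -l cpu-model.node.kubevirt.io/kvm64=true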
Log in to the VM:
virtctl console testvmi-nocloud
Install Docker.
Try to run cdi-operator and virt-operator with the latest versions:
# docker run -ti --rm quay.io/kubevirt/cdi-operator:v1.56.0
Fatal glibc error: CPU does not support x86-64-v2
# docker run -ti --rm quay.io/kubevirt/virt-operator:v0.59.0
{"component":"virt-operator","level":"info","msg":"cannot find virt-operator's image\n","pos":"application.go:148","timestamp":"2023-04-10T11:55:12.530095Z"}
However, virt-handler does not work either:
# docker run -ti --rm quay.io/kubevirt/virt-handler:v0.59.0
Fatal glibc error: CPU does not support x86-64-v2
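As an aside, the guest's dynamic loader can report which x86-64 ISA levels glibc considers supported (glibc 2.33 or newer; the loader path below is the standard one for x86-64):
# Lists glibc-hwcaps subdirectories such as x86-64-v2/v3 and whether each is "supported"
/lib64/ld-linux-x86-64.so.2 --help | grep -A4 'Subdirectories of glibc-hwcaps'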
@kvaps
Yeah, it looks like KubeVirt is setting static = "on", like here:
https://github.com/kubevirt/kubevirt/blob/main/cmd/virt-operator/BUILD.bazel#L19
We can maybe do that as well for some containers.
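If useful, a quick way to check whether a binary inside one of these images is statically linked (the path /usr/bin/virt-operator is an assumption for illustration):
# A statically linked binary makes ldd print "not a dynamic executable";
# such a binary does not load the container's glibc, so it never hits the x86-64-v2 startup check
ldd /usr/bin/virt-operator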
I think this traces all the way down to QEMU (where it got fixed and backported):
https://github.com/containers/podman/issues/15456#issuecomment-1398083220
https://gitlab.com/qemu-project/qemu/-/commit/d135f781405f7c78153aa65e0327b05a4aa72e50
Of course, ignore this if the QEMU in your environment already contains this fix.
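To check whether a given hypervisor's QEMU already includes that change, the installed version and the CPU models it knows about can be listed (standard QEMU CLI; which release first carries the commit should be confirmed from its tags):
# Print the installed QEMU version
qemu-system-x86_64 --version
# List the CPU models this QEMU build supports
qemu-system-x86_64 -cpu help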
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
I think this traces all the way down to QEMU (where it got fixed and backported): containers/podman#15456 (comment) https://gitlab.com/qemu-project/qemu/-/commit/d135f781405f7c78153aa65e0327b05a4aa72e50
Of course, ignore this if the QEMU in your environment already contains this fix.
@kvaps did this solve the issue? Did you check if your QEMU contains this fix?
I think it's safe to close the issue since this has been addressed at the QEMU layer (https://github.com/kubevirt/containerized-data-importer/issues/2652#issuecomment-1502426833). Feel free to reopen if necessary, thanks!
What happened: CDI is not working on the "Common KVM processor" (kvm64).
What you expected to happen: CDI should be able to run in any environment.
How to reproduce it (as minimally and precisely as possible): Run the CDI containers in a VM with the kvm64 processor. This is the default CPU model in many hypervisors, e.g. Proxmox.
Environment:
CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.55.2
Kubernetes version (use kubectl version): v1.23.16
OS: Ubuntu 22.04.1 LTS
Kernel (use uname -a): 5.15.0-67-generic