rugubara opened 4 years ago
Linux guests work without issues (with the workaround from #112).
This also happens with kernel 5.7.8. I'm using that kernel on both the host and the guest, and when starting X, the virtual machine simply segfaults and the QEMU process exits.
I am having the same issue on Arch Linux (all updates applied) when using -display spice-app,gl=on. It works fine with -display gtk,gl=on, so it looks more like an issue with spice or virt-viewer.
I tried this approach with my guest. QEMU dumps core immediately with -display gtk,gl=on for me.
Can any of you also check whether you are using an NVIDIA card together with the Intel one with PRIME render offload (see #162)? I was also getting this segfault with that enabled, but after making sure that the NVIDIA driver was not loaded at all, I was able to run a Linux guest on the latest kernel.
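One way to check whether the NVIDIA kernel driver is loaded at all is to look for nvidia* entries in /proc/modules, which is the file lsmod reads. This is a minimal sketch; the helper name nvidia_loaded is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if any kernel module whose
# name starts with "nvidia" is listed in the modules file.
# The file defaults to /proc/modules, the same source lsmod uses.
nvidia_loaded() {
    local modules_file="${1:-/proc/modules}"
    grep -q '^nvidia' "$modules_file"
}

if nvidia_loaded; then
    echo "NVIDIA kernel modules are loaded"
else
    echo "No NVIDIA kernel modules are loaded"
fi
```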
Yes, I can confirm. The issue seems to be caused by using the NVIDIA card together with the Intel one with PRIME render offload. However, I was getting garbled graphics even when setting the MESA_LOADER_DRIVER_OVERRIDE=i965 environment variable. After a reboot, the graphics were fine.
So I wonder why the NVIDIA driver causes this issue. I am actually running a Wayland session on the Intel graphics and using the NVIDIA card only for non-graphics work (ML, hardware encoding, etc.).
I think the issue is that Xorg on the host is also running on the NVIDIA card (for PRIME render offload), and the Intel and NVIDIA cards seem to clash over some memory region. I would look at Mesa for the culprit. Also, @cristatus, if I run a VM, stop it, suspend the host, and then try to run the VM again, it doesn't run on kernel 5.7. I can only run the VM once per host boot. On the -lts kernel it runs fine.
I can confirm that my X server is configured to use the NVIDIA card for PRIME render offload. After blacklisting the NVIDIA kernel modules, I was able to run my GVT VM successfully. I could also see the TianoCore UEFI output (pre-boot animation), which I could not get while the NVIDIA drivers were loaded. Is this the right place to discuss this issue, or should we raise it somewhere else as well?
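Blacklisting kernel modules is usually done with a file under /etc/modprobe.d/. Below is a minimal sketch; the file name and the helper write_nvidia_blacklist are hypothetical, and the module list is taken from the lsmod output in this thread plus nvidia_uvm (the CUDA module), so it may differ for other driver versions. Per modprobe.d(5), blacklist only stops automatic loading via a module's aliases; a module can still be loaded explicitly or as a dependency, and if your initramfs includes the NVIDIA modules it has to be regenerated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a modprobe.d file that blacklists the NVIDIA
# modules. Run as root with no argument to write the (hypothetically named)
# default file, then reboot.
write_nvidia_blacklist() {
    local conf="${1:-/etc/modprobe.d/blacklist-nvidia.conf}"
    # printf reuses its format for each argument: one "blacklist" line per module.
    printf 'blacklist %s\n' nvidia nvidia_drm nvidia_modeset nvidia_uvm > "$conf"
}
```

Writing the file from a function keeps the target path overridable, so the output can be inspected in a scratch file before touching /etc.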
@rugubara, same here. After disabling PRIME render offload, I was able to see the BIOS output. I'm not sure where else we should report this. As I mentioned above, I think there is a bad interaction between GVT-g and Mesa when PRIME render offload is used. There are no issues on the host itself.
Hi @grazzolini. In my case, suspend doesn't work at all while a vGPU exists (the system freezes). I have to remove the vGPU before suspending. After the system wakes up, I can re-create the vGPU and run the VM successfully. This happens on kernel 5.7 only.
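The remove-before-suspend, re-create-after-resume workaround can be scripted against the kernel's mediated-device (mdev) sysfs interface. A minimal sketch follows; the UUID, the iGPU PCI address 0000:00:02.0 (matching the render node used elsewhere in this thread) and the GVT-g type name are example values, so list the mdev_supported_types directory of your iGPU to see which types it offers. SYSFS is overridable only so the functions can be dry-run against a scratch directory; on a real system both functions need root.

```shell
#!/usr/bin/env bash
# Example values (assumptions): adjust to your own vGPU and iGPU.
SYSFS="${SYSFS:-/sys}"
VGPU_UUID="${VGPU_UUID:-a297db4a-f4c2-11e6-90f6-d3b88d6c9525}"
IGPU_PCI="${IGPU_PCI:-0000:00:02.0}"
VGPU_TYPE="${VGPU_TYPE:-i915-GVTg_V5_4}"

vgpu_remove() {
    # Writing 1 to the mdev device's "remove" attribute destroys the vGPU.
    echo 1 > "$SYSFS/bus/mdev/devices/$VGPU_UUID/remove"
}

vgpu_create() {
    # Writing a UUID to the type's "create" attribute instantiates a vGPU.
    echo "$VGPU_UUID" > "$SYSFS/bus/pci/devices/$IGPU_PCI/mdev_supported_types/$VGPU_TYPE/create"
}
```

For example: power off the VM, run vgpu_remove, suspend the host, and after wake-up run vgpu_create before starting the VM again.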
@cristatus, I get the same behavior if the vGPU exists and the VM is running. But if I run the VM, stop it (a full power-off, not a VM suspend), suspend the host, resume, and try to run the VM again, it freezes after the BIOS. I have never gotten suspend working while the VM is running; I always power off the VM before suspending the host.
Besides -display gtk,gl=on, I managed to run with -display egl-headless,rendernode=/dev/dri/by-path/pci-0000:00:02.0-render and use remote-viewer to see the desktop. This enabled all SPICE features.
So I think the issue is somewhere in spice-app.
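A sketch of the display options for the egl-headless plus remote-viewer setup described above, written as a bash array to splice into an existing QEMU command line. The SPICE listen address and port are example values (assumptions, not from the original comment), and the vGPU/vfio-pci options are left out because they depend on your setup.

```shell
#!/usr/bin/env bash
# Render node of the Intel iGPU, as used in this thread.
RENDERNODE="/dev/dri/by-path/pci-0000:00:02.0-render"
# Example values (assumptions): SPICE listens on localhost:5900 without a password.
SPICE_ADDR="127.0.0.1"
SPICE_PORT=5900

# QEMU renders the guest display with EGL on the render node (no local window)
# and exports it over a plain SPICE server, so spice-app is not involved.
display_args=(
    -display "egl-headless,rendernode=$RENDERNODE"
    -spice "port=$SPICE_PORT,addr=$SPICE_ADDR,disable-ticketing=on"
)

# Usage (other options elided):
#   qemu-system-x86_64 ... "${display_args[@]}"
#   remote-viewer "spice://$SPICE_ADDR:$SPICE_PORT"
printf '%s\n' "${display_args[@]}"
```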
I currently have the following SPICE packages installed:
[I] app-emulation/spice
Installed versions: 0.14.3(10:42:54 14.07.2020)(gstreamer -libressl -lz4 -sasl -smartcard -static-libs)
[I] app-emulation/spice-protocol
Installed versions: 0.14.1(10:41:52 14.07.2020)
[I] net-misc/spice-gtk
Installed versions: 0.38(10:43:25 14.07.2020)(gtk3 introspection policykit pulseaudio usbredir vala webdav -libressl -lz4 -mjpeg -sasl -smartcard)
anton@PF16W6Y2 ~ $ uname -a
Linux PF16W6Y2 5.7.16-gentoo #1 SMP PREEMPT Fri Aug 21 11:07:54 MSK 2020 x86_64 Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz GenuineIntel GNU/Linux
anton@PF16W6Y2 ~ $ lsmod|grep nvidia
nvidia_drm 49152 5
nvidia_modeset 1138688 5 nvidia_drm
nvidia 19288064 150 nvidia_modeset
and I have a working VM. Nothing crashes when I move the mouse into the window. I don't get a mouse pointer in the window, though, but that's a separate bug. I still can't see the UEFI boot process while the NVIDIA kernel driver is loaded.
I am on Arch Linux with the latest updates:
What works:
-display gtk,gl=on
-display egl-headless,rendernode=/dev/dri/by-path/pci-0000:00:02.0-render
What doesn't work:
-display spice-app,gl=on
With spice-app, QEMU crashes with the following error:
[ 510.529606] [drm:nv_drm_gem_fence_attach_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence attach: 0x00000004
[ 510.600952] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[ 526.357039] llvmpipe-1[3840]: segfault at 0 ip 00007f2d269eb9e0 sp 00007f2d186e42c0 error 4
[ 526.357040] llvmpipe-0[3839]: segfault at 100 ip 00007f2d269eb9e0 sp 00007f2d18ee52c0 error 4
[ 526.357042] llvmpipe-2[3841]: segfault at 200 ip 00007f2d269eb9e0 sp 00007f2d17ee32c0 error 4
[ 526.357044] llvmpipe-3[3842]: segfault at 300 ip 00007f2d269eb9e0 sp 00007f2d176e22c0 error 4
[ 526.357046] llvmpipe-4[3843]: segfault at 400 ip 00007f2d269eb9e0 sp 00007f2d16ee12c0 error 4
[ 526.357048] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[ 526.357050] llvmpipe-11[3850]: segfault at 500 ip 00007f2d269eb9e0 sp 00007f2cf6ffc2c0 error 4
[ 526.357052] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[ 526.357053] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[ 526.357055] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[ 526.357056] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[ 526.357057] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
What's stranger is that when using egl-headless, remote-viewer (which uses SPICE) works fine.
Have you tried recompiling QEMU against the latest SPICE stack? Most segfaults in the app come from a mismatched library.
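One quick way to look for such a mismatch is to check which libspice-server a QEMU binary actually resolves at run time. A minimal sketch, assuming a dynamically linked qemu-system-x86_64 in PATH; the helper name qemu_spice_lib is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the libspice-server library that the
# dynamic linker resolves for a QEMU binary (prints nothing if there is none).
qemu_spice_lib() {
    local qemu_bin="${1:-$(command -v qemu-system-x86_64)}"
    # ldd lines look like: "libspice-server.so.1 => /usr/lib/libspice-server.so.1 (0x...)"
    ldd "$qemu_bin" | awk '/libspice-server/ { print $3 }'
}
```

The printed path can then be compared with the installed spice package (on Arch, pacman -Qo shows which package owns a file).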
On Arch Linux, QEMU and SPICE are almost always at their latest releases.
My current QEMU, spice and virt-viewer versions are:
virt-viewer 8.0-2
I will try compiling from git sources.
Yes, Arch is a rolling-release distro. By the way, if you keep all the QEMU parameters but replace GVT-g with -vga qxl, does QEMU still crash?
No. The issue happens only with GVT-g, and only with virt-viewer (-display spice-app,gl=on). It works fine with -display gtk,gl=on.
The strange thing is that if I use egl-headless with SPICE (-display egl-headless,rendernode=/dev/dri/by-path/pci-0000:00:02.0-render), virt-viewer works fine (although performance is poor).
BTW, I just tried compiling QEMU from git master and I am seeing the same issue.
Tested with an Intel(R) Core(TM) i7-8850H on kernels 5.6.19 through 5.7.8 and QEMU 5.0.0. It is reproducible on all of my GVT Windows VMs.
The log contains:
Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-5[517415]: segfault at 100 ip 00007fed6c0049e0 sp 00007fed61141480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-9[517419]: segfault at 200 ip 00007fed6c0049e0 sp 00007fed52ffc480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-10[517420]: segfault at 300 ip 00007fed6c0049e0 sp 00007fed527fb480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 systemd[1]: Started Process Core Dump (PID 525631/UID 0).
Jul 13 23:21:58 PF16W6Y2 systemd-coredump[525632]: Resource limits disable core dumping for process 517347 (qemu-system-x86).
Jul 13 23:21:58 PF16W6Y2 systemd-coredump[525632]: Process 517347 (qemu-system-x86) of user 77 dumped core.
Jul 13 23:21:58 PF16W6Y2 libvirtd[3239]: internal error: End of file from qemu monitor