intel / gvt-linux

Win10 guest crashes as soon as the mouse enters the window after the display is initialized #161

Open rugubara opened 4 years ago

rugubara commented 4 years ago

Tested with an Intel(R) Core(TM) i7-8850H on kernels 5.6.19 through 5.7.8 and QEMU 5.0.0. Repeatable on all GVT Windows VMs I have.

The log contains:

Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-5[517415]: segfault at 100 ip 00007fed6c0049e0 sp 00007fed61141480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-9[517419]: segfault at 200 ip 00007fed6c0049e0 sp 00007fed52ffc480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: llvmpipe-10[517420]: segfault at 300 ip 00007fed6c0049e0 sp 00007fed527fb480 error 4
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 kernel: Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 42 2d 90 0c 32 c5 c9 ef f6 c4 4>
Jul 13 23:21:58 PF16W6Y2 systemd[1]: Started Process Core Dump (PID 525631/UID 0).
Jul 13 23:21:58 PF16W6Y2 systemd-coredump[525632]: Resource limits disable core dumping for process 517347 (qemu-system-x86).
Jul 13 23:21:58 PF16W6Y2 systemd-coredump[525632]: Process 517347 (qemu-system-x86) of user 77 dumped core.
Jul 13 23:21:58 PF16W6Y2 libvirtd[3239]: internal error: End of file from qemu monitor
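For readers comparing their own logs against this one, here is a minimal Python sketch that pulls the segfault records out of kernel log text. The function name `parse_segfaults` and the dictionary keys are illustrative choices, not part of any tool mentioned in this thread. On x86, page-fault error code 4 means a user-mode read of an unmapped page, and the fault addresses here (0x100, 0x200, 0x300) are all near NULL at the same instruction pointer. The `llvmpipe-N` thread names belong to Mesa's software rasterizer, which may suggest (this is an inference, not something the log proves) that QEMU's GL context ended up on software rendering.

```python
import re

# Matches kernel lines such as:
#   llvmpipe-5[517415]: segfault at 100 ip 00007fed6c0049e0 sp 00007fed61141480 error 4
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<tid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)


def parse_segfaults(log_text):
    """Return one dict per segfault record found in log_text."""
    faults = []
    for m in SEGFAULT_RE.finditer(log_text):
        faults.append({
            "thread": m.group("comm"),
            "tid": int(m.group("tid")),
            "fault_addr": int(m.group("addr"), 16),
            "ip": int(m.group("ip"), 16),
            "error": int(m.group("err")),
        })
    return faults


# The three segfault records from the report above (timestamps omitted).
segfault_sample = """\
kernel: llvmpipe-5[517415]: segfault at 100 ip 00007fed6c0049e0 sp 00007fed61141480 error 4
kernel: llvmpipe-9[517419]: segfault at 200 ip 00007fed6c0049e0 sp 00007fed52ffc480 error 4
kernel: llvmpipe-10[517420]: segfault at 300 ip 00007fed6c0049e0 sp 00007fed527fb480 error 4
"""

faults = parse_segfaults(segfault_sample)
# True when every thread crashed at the same instruction.
same_ip = len({f["ip"] for f in faults}) == 1
```

The same function should also accept `dmesg`-style lines with a `[  526.357039]` prefix, since the pattern only anchors on the `name[tid]: segfault at` part.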

rugubara commented 4 years ago

Linux guests work without issues (with the workaround from #112).

grazzolini commented 4 years ago

This also happens with kernel 5.7.8. I'm using that kernel both on host and guest, and, when starting X, the virtual machine will simply segfault and the qemu process exits.

cristatus commented 4 years ago

I am having the same issue on Arch Linux (all updates applied) when using -display spice-app,gl=on. It works fine with -display gtk,gl=on. So it looks more like an issue with spice or virt-viewer.

rugubara commented 4 years ago

> I am having the same issue on Arch Linux (all updates applied) when using -display spice-app,gl=on. It works fine with -display gtk,gl=on. So it looks more like an issue with spice or virt-viewer.

I tried this approach with my guest. QEMU dumps core immediately with -display gtk,gl=on for me.

grazzolini commented 4 years ago

Can any of you also check whether you are using an NVIDIA card together with the Intel one for PRIME render offload (see #162)? I was also getting this segfault with that enabled, but after making sure that the NVIDIA card is not loaded at all, I was able to run a Linux guest on the latest kernel.

cristatus commented 4 years ago

> Can any of you also check whether you are using an NVIDIA card together with the Intel one for PRIME render offload (see #162)? I was also getting this segfault with that enabled, but after making sure that the NVIDIA card is not loaded at all, I was able to run a Linux guest on the latest kernel.

Yes, I can confirm. The issue seems to be caused by using the NVIDIA card together with the Intel one for PRIME render offload. However, I am getting garbled graphics even when setting the MESA_LOADER_DRIVER_OVERRIDE=i965 environment variable.

After a reboot, the graphics were fine.
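As a rough illustration of the override mentioned above, the following Python sketch builds a process environment with `MESA_LOADER_DRIVER_OVERRIDE` set. The helper name `mesa_override_env` is hypothetical, and how QEMU is actually launched (directly, via libvirt, or through a wrapper) is an assumption left to the reader.

```python
import os


def mesa_override_env(driver="i965", base=None):
    """Return a copy of the environment that forces Mesa's driver loader.

    `base` defaults to the current process environment; pass a dict to
    start from a different one. The original mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["MESA_LOADER_DRIVER_OVERRIDE"] = driver
    return env


# Example with an explicit, minimal base environment.
override_env = mesa_override_env(base={"PATH": "/usr/bin"})
```

The resulting dict could be passed as `env=` to `subprocess.run(...)` when starting QEMU by hand. Note that in this thread the override alone did not remove the garbled output; a reboot did.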

cristatus commented 4 years ago

So I wonder why the NVIDIA driver is causing this issue. I am actually running a Wayland session on the Intel graphics and using the NVIDIA card only for non-graphics work (ML, hardware encoding, etc.).

grazzolini commented 4 years ago

I think the issue is that Xorg on the host is also running on the NVIDIA card (for PRIME render offload), and it seems the Intel and NVIDIA cards clash over some memory region. I would look at Mesa for the culprit. Also, @cristatus, if I run a VM, stop it, suspend the host, and then try to run the VM again afterwards, it doesn't run on kernel 5.7. I can run the VM only once per host boot. On the -lts kernel it runs fine.

rugubara commented 4 years ago

I confirm that my X server is configured to use the NVIDIA card for PRIME render offload. After blacklisting the nvidia kernel modules, I was able to run my GVT VM successfully. I was also able to get the UEFI output from TianoCore (the pre-boot animation), which I could not get while the nvidia drivers were loaded. Is this the correct place to discuss this issue? Should we raise it somewhere else as well?
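To check quickly whether any nvidia kernel modules are still loaded after blacklisting, a minimal Python sketch along these lines could be used. It parses `lsmod`-style text; the function name `nvidia_modules` is an illustrative choice, and the sample input reuses the three nvidia entries that appear in `lsmod` output elsewhere in this thread.

```python
def nvidia_modules(lsmod_text):
    """Return the names of listed modules whose name starts with 'nvidia'."""
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # The module name is always the first whitespace-separated field.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


lsmod_sample = """\
Module                  Size  Used by
nvidia_drm             49152  5
nvidia_modeset       1138688  5 nvidia_drm
nvidia              19288064  150 nvidia_modeset
"""

loaded = nvidia_modules(lsmod_sample)
```

An empty result would indicate that the blacklist took effect. The same function also works on the contents of `/proc/modules`, where the module name is likewise the first field.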

grazzolini commented 4 years ago

@rugubara, same here. After disabling prime render offload I was able to see the bios output. I'm not sure where we should report this, other than here. But, as I mentioned above, I think there is a bad interaction with using gvt-g and mesa with prime render offload. There are no issues on the host itself.

cristatus commented 4 years ago

> Also, @cristatus, if I run a VM, stop it, suspend the host, and then try to run the VM again afterwards, it doesn't run on kernel 5.7. I can run the VM only once per host boot. On the -lts kernel it runs fine.

Hi @grazzolini. In my case, suspend doesn't work at all if a vGPU has been created (the system freezes). I have to remove the vGPU before suspending. After the system wakes up, I can re-create the vGPU and run the VM successfully. This happens on kernel 5.7 only.

grazzolini commented 4 years ago

@cristatus, I get the same behavior if the vGPU is created and the VM is running. But if I run the VM, stop it (a complete power-off, not a VM suspend), suspend the host, resume from suspend, and try to run the VM again, it freezes after the BIOS. I never got suspend working while the VM is running. I always power off the VM, then suspend the host.
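The remove-before-suspend and re-create-after-resume workaround described above can be sketched against the kernel's mediated-device (mdev) sysfs interface. This is a sketch under stated assumptions: the function names are hypothetical, the UUID and the mdev type name (for example `i915-GVTg_V5_4`) depend on the host and are not given in this thread, and writing to these files requires root. The parent PCI address `0000:00:02.0` matches the render node path quoted elsewhere in this thread.

```python
from pathlib import Path


def mdev_remove_path(uuid, sysfs_root="/sys"):
    """Path of the sysfs file that removes an existing mdev device."""
    return Path(sysfs_root) / "bus" / "mdev" / "devices" / uuid / "remove"


def mdev_create_path(parent_pci, mdev_type, sysfs_root="/sys"):
    """Path of the sysfs file that creates an mdev device of a given type."""
    return (Path(sysfs_root) / "bus" / "pci" / "devices" / parent_pci
            / "mdev_supported_types" / mdev_type / "create")


def remove_vgpu(uuid, sysfs_root="/sys"):
    """Remove the vGPU (intended to run before suspending the host)."""
    mdev_remove_path(uuid, sysfs_root).write_text("1")


def create_vgpu(uuid, parent_pci, mdev_type, sysfs_root="/sys"):
    """Re-create the vGPU (intended to run after the host resumes)."""
    mdev_create_path(parent_pci, mdev_type, sysfs_root).write_text(uuid)
```

The `sysfs_root` parameter exists only so the helpers can be exercised against a scratch directory. On a real host, these two calls could be wired into whatever pre-suspend and post-resume hooks the system provides; that integration is not shown here.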

cristatus commented 4 years ago

Besides -display gtk,gl=on, I also managed to run with -display egl-headless,rendernode=/dev/dri/by-path/pci-0000:00:02.0-render, using remote-viewer to see the desktop. This enabled all spice features.

So I think the issue is somewhere in spice-app.
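To make the three display configurations compared in this thread easier to switch between when testing, here is a small Python sketch that builds the corresponding `-display` arguments. The helper `display_args` and the `VARIANTS` table are illustrative names; only the option strings themselves come from the comments above, and the rest of the QEMU command line is deliberately left out.

```python
def display_args(backend, gl=None, rendernode=None):
    """Build the ['-display', '...'] argument pair for one backend."""
    opts = [backend]
    if gl is not None:
        opts.append("gl=" + ("on" if gl else "off"))
    if rendernode is not None:
        opts.append("rendernode=" + rendernode)
    return ["-display", ",".join(opts)]


# Render node path as quoted in the thread (Intel iGPU at 0000:00:02.0).
RENDER_NODE = "/dev/dri/by-path/pci-0000:00:02.0-render"

VARIANTS = {
    "gtk": display_args("gtk", gl=True),
    "spice-app": display_args("spice-app", gl=True),
    "egl-headless": display_args("egl-headless", rendernode=RENDER_NODE),
}
```

Each entry can be appended to an otherwise identical argument list, so that the display backend is the only thing that changes between runs.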

rugubara commented 4 years ago

I have the following spice apps/libs installed ATM:

[I] app-emulation/spice
     Installed versions:  0.14.3(10:42:54 14.07.2020)(gstreamer -libressl -lz4 -sasl -smartcard -static-libs)
[I] app-emulation/spice-protocol
     Installed versions:  0.14.1(10:41:52 14.07.2020)
[I] net-misc/spice-gtk
     Installed versions:  0.38(10:43:25 14.07.2020)(gtk3 introspection policykit pulseaudio usbredir vala webdav -libressl -lz4 -mjpeg -sasl -smartcard)
anton@PF16W6Y2 ~ $ uname -a
Linux PF16W6Y2 5.7.16-gentoo #1 SMP PREEMPT Fri Aug 21 11:07:54 MSK 2020 x86_64 Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz GenuineIntel GNU/Linux
anton@PF16W6Y2 ~ $ lsmod|grep nvidia
nvidia_drm             49152  5
nvidia_modeset       1138688  5 nvidia_drm
nvidia              19288064  150 nvidia_modeset

and I have a working VM. Nothing crashes when I move the mouse into the window. I don't have a mouse pointer in the window, though, but that's another bug. And I still can't see the UEFI boot process while the nvidia kernel driver is loaded.

cristatus commented 4 years ago

I am on Arch Linux with latest updates:

What works:

What doesn't work:

With spice-app, QEMU crashes with the following error:

[  510.529606] [drm:nv_drm_gem_fence_attach_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to lookup gem object for fence attach: 0x00000004
[  510.600952] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[  526.357039] llvmpipe-1[3840]: segfault at 0 ip 00007f2d269eb9e0 sp 00007f2d186e42c0 error 4
[  526.357040] llvmpipe-0[3839]: segfault at 100 ip 00007f2d269eb9e0 sp 00007f2d18ee52c0 error 4
[  526.357042] llvmpipe-2[3841]: segfault at 200 ip 00007f2d269eb9e0 sp 00007f2d17ee32c0 error 4
[  526.357044] llvmpipe-3[3842]: segfault at 300 ip 00007f2d269eb9e0 sp 00007f2d176e22c0 error 4
[  526.357046] llvmpipe-4[3843]: segfault at 400 ip 00007f2d269eb9e0 sp 00007f2d16ee12c0 error 4
[  526.357048] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[  526.357050] llvmpipe-11[3850]: segfault at 500 ip 00007f2d269eb9e0 sp 00007f2cf6ffc2c0 error 4
[  526.357052] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[  526.357053] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[  526.357055] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[  526.357056] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5
[  526.357057] Code: fd fe c2 c5 ed 72 e0 08 c4 e3 6d 4a d7 00 c4 e2 6d 39 df c4 c2 65 40 df c5 e5 fe f1 c5 55 fe c3 c4 41 31 ef c9 c4 41 2d 76 d2 <c4> 42 2d 90 0c 32 c5 c9 ef f6 c4 41 2d 76 d2 c4 82 2d 90 34 02 c5

What's stranger is that with egl-headless, remote-viewer (which uses spice) works fine.

coxuintel commented 4 years ago

Have you tried recompiling QEMU against the latest spice stack? Most application segfaults result from a mismatched library.

cristatus commented 4 years ago

On Arch Linux, QEMU and spice are almost always the latest releases.

My current QEMU, spice and virt-viewer versions are:

coxuintel commented 4 years ago

Yes, Arch is a rolling-release distro. By the way, if you keep all QEMU parameters but replace gvt-g with -vga qxl, does QEMU still crash?

cristatus commented 3 years ago

> Yes, Arch is a rolling-release distro. By the way, if you keep all QEMU parameters but replace gvt-g with -vga qxl, does QEMU still crash?

No. The issue happens only with gvt-g, and then only with virt-viewer (-display spice-app,gl=on). It works fine with -display gtk,gl=on.

The strange thing is that if I use egl-headless with spice (-display egl-headless,rendernode=/dev/dri/by-path/pci-0000:00:02.0-render), virt-viewer works fine (but performance is poor).

BTW, I just tried compiling QEMU from git master and I am seeing the same issue.