Open velde666 opened 5 years ago
Ever get this solved? I'm having this problem. Can't keep a VM up for more than a couple days at best.
Hi @melyux
I am still struggling with this, even though I would say my Win VM is now more than 90% stable.
This is what I have done:
I also tried newer kernels directly from kernel.org (5.5.13 and 5.6.3), but 5.5.13 f**ked up my wifi and 5.6.3 crashed the Windows VM as before, even though the above-mentioned parameters were unchanged. I am wondering whether the changes in gvt-linux are being integrated into the standard kernel.
Additionally I updated CentOS 7 to 8 in-place (waaaaahhhh) and Win 10 to 1909. But I don't think this stabilized the Win VM in any way.
Best regards
Similar issue - Proxmox VE 6.2 host with 5.4.41-1 kernel. Coffee Lake ER Xeon processor with Intel UHD 630 Graphics. The Windows GPU driver is the newest available. My problems seem to recur faster the more VMs I allocate. I am using 128MB vGPUs for each guest with a total aperture of 512MB, and no more than 3 guest VMs in use at a time.
Boot flags include:
kvm.ignore_msrs=1
i915.enable_execlists=0
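When comparing setups across hosts, it can help to confirm that the flags listed above actually reached the running kernel. The following is a minimal sketch, assuming a Linux host where `/proc/cmdline` is readable; the helper names `missing_boot_flags` and `read_cmdline` are made up for illustration and are not part of any GVT tooling.

```python
def missing_boot_flags(cmdline, wanted):
    """Return the flags from `wanted` that do not appear in a kernel command line."""
    present = set(cmdline.split())
    return [flag for flag in wanted if flag not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the parameters the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()


# The two flags quoted in this comment; adjust to your own setup.
WANTED = ["kvm.ignore_msrs=1", "i915.enable_execlists=0"]
```

On the host, `missing_boot_flags(read_cmdline(), WANTED)` returns an empty list when both flags are present. The comparison is an exact string match, so a flag set to a different value is reported as missing.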
I get the above page fault errors plus the following:
May 20 21:55:01 virt-slc-11 kernel: [29241.213006] Call Trace:
May 20 21:55:01 virt-slc-11 kernel: [29241.213007] __schedule+0x2e6/0x6f0
May 20 21:55:01 virt-slc-11 kernel: [29241.213009] schedule+0x33/0xa0
May 20 21:55:01 virt-slc-11 kernel: [29241.213010] schedule_preempt_disabled+0xe/0x10
May 20 21:55:01 virt-slc-11 kernel: [29241.213011] __mutex_lock.isra.10+0x2c9/0x4c0
May 20 21:55:01 virt-slc-11 kernel: [29241.213026] ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
May 20 21:55:01 virt-slc-11 kernel: [29241.213028] __mutex_lock_slowpath+0x13/0x20
May 20 21:55:01 virt-slc-11 kernel: [29241.213029] mutex_lock+0x2c/0x30
May 20 21:55:01 virt-slc-11 kernel: [29241.213049] intel_vgpu_emulate_mmio_write+0x68/0x220 [i915]
May 20 21:55:01 virt-slc-11 kernel: [29241.213050] intel_vgpu_rw+0xb3/0x1f0 [kvmgt]
May 20 21:55:01 virt-slc-11 kernel: [29241.213052] intel_vgpu_write+0x16e/0x200 [kvmgt]
May 20 21:55:01 virt-slc-11 kernel: [29241.213053] vfio_mdev_write+0x22/0x30 [vfio_mdev]
May 20 21:55:01 virt-slc-11 kernel: [29241.213054] vfio_device_fops_write+0x26/0x30 [vfio]
May 20 21:55:01 virt-slc-11 kernel: [29241.213055] __vfs_write+0x1b/0x40
May 20 21:55:01 virt-slc-11 kernel: [29241.213056] vfs_write+0xab/0x1b0
May 20 21:55:01 virt-slc-11 kernel: [29241.213057] ksys_pwrite64+0x66/0xa0
May 20 21:55:01 virt-slc-11 kernel: [29241.213058] __x64_sys_pwrite64+0x1e/0x20
May 20 21:55:01 virt-slc-11 kernel: [29241.213059] do_syscall_64+0x57/0x190
May 20 21:55:01 virt-slc-11 kernel: [29241.213060] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 20 21:55:01 virt-slc-11 kernel: [29241.213061] RIP: 0033:0x7f7cb6674edf
May 20 21:55:01 virt-slc-11 kernel: [29241.213062] Code: Bad RIP value.
May 20 21:55:01 virt-slc-11 kernel: [29241.213062] RSP: 002b:00007f7aa7ff97a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
May 20 21:55:01 virt-slc-11 kernel: [29241.213063] RAX: ffffffffffffffda RBX: 0000000000000023 RCX: 00007f7cb6674edf
May 20 21:55:01 virt-slc-11 kernel: [29241.213064] RDX: 0000000000000004 RSI: 00007f7aa7ff97f8 RDI: 0000000000000023
May 20 21:55:01 virt-slc-11 kernel: [29241.213064] RBP: 00007f7aa7ff97f8 R08: 0000000000000000 R09: 00000000ffffffff
May 20 21:55:01 virt-slc-11 kernel: [29241.213065] R10: 000000000000a278 R11: 0000000000000293 R12: 0000000000000004
May 20 21:55:01 virt-slc-11 kernel: [29241.213065] R13: 000000000000a278 R14: 00007f7aa42817c0 R15: 00007f7aa42816f0
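Read from bottom to top, the trace above shows a `pwrite64` syscall going through vfio and kvmgt into `intel_vgpu_emulate_mmio_write`, which then waits in `mutex_lock`. To compare traces between reports, the frames can be pulled out of the log with a small script. This is only a sketch: the regular expression is an assumption based on the frame format seen in this excerpt, and `parse_frames` is a hypothetical helper name.

```python
import re

# Matches frames such as "mutex_lock+0x2c/0x30" or
# "intel_vgpu_rw+0xb3/0x1f0 [kvmgt]". A leading "? " marks a frame the
# kernel unwinder could not verify.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(lines):
    """Return (function, module or None, reliable) for every stack frame line."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("func"),
                    match.group("module"),
                    match.group("unreliable") is None,
                )
            )
    return frames
```

Lines that are not stack frames (the `Call Trace:` header, the register dump) are skipped, so the whole excerpt can be fed in unchanged.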
I found additional debug information but I have a feeling it may be a PPGTT issue with newer processors. I'm covering this in #153
I am having the same issue on PVE 6.3-3:
[ 2741.241926] gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xfffff80520c07ecf type 9
[ 2741.242392] gvt: vgpu 1: fail: spt 00000000d7d4221d guest entry 0xfffff80520c07ecf type 9
[ 2741.242855] gvt: vgpu 1: fail: shadow page 00000000d7d4221d guest entry 0xfffff80520c07ecf type 9.
[ 2741.243324] gvt: guest page write error, gpa 1513e9c78
I am also seeing a trace similar to the one reedog117 mentioned.
Same situation here:
gvt: guest page write error, gpa 11795afb8
kernel: gvt: guest page write error, gpa 11795afb8
kernel: gvt: guest page write error, gpa 11795aff8
kernel: guest page write error, gpa 11795aff8
kernel: gvt: guest page write error, gpa 11795aff8
kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x6d006100720067 type 8
kernel: gvt: vgpu 1: fail: shadow page 000000004db426e4 guest entry 0x6d006100720067 type 8
kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xcc0c81 type 9
kernel: gvt: vgpu 1: fail: spt 00000000f6915228 guest entry 0xcc0c81 type 9
kernel: gvt: vgpu 1: fail: shadow page 00000000f6915228 guest entry 0xcc0c81 type 9.
kernel: gvt: vgpu 1: fail to flush post shadow
kernel: gvt: vgpu 1: fail to dispatch workload, skip
kernel: gvt: vgpu(1) Invalid FORCE_NONPRIV write 2341 at offset 24d8
kernel: gvt: vgpu(1) Invalid FORCE_NONPRIV write 2351 at offset 24dc
gvt: vgpu(1) Invalid FORCE_NONPRIV write 10000d82 at offset 24e0
kernel: gvt: vgpu(1) Invalid FORCE_NONPRIV write 10064844 at offset 24e4
kernel: gvt: vgpu(1) Invalid FORCE_NONPRIV write 4000b118 at offset 24f0
The VM just freezes a few minutes after boot on Win 10.
This is still an issue even with the most recent 5.10 kernel. I can see it on all Intel CPUs with an IGP, from Gen6 to Gen10.
I'm still seeing this on the 5.12.13 kernel. The guest continues to run, but the display is frozen:
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: spt 00000000680cc782 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 00000000680cc782 guest entry 0xffffffffffffffff type 9.
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: spt 00000000680cc782 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 00000000680cc782 guest entry 0xffffffffffffffff type 9.
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: spt 00000000680cc782 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 00000000680cc782 guest entry 0xffffffffffffffff type 9.
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: spt 00000000680cc782 guest entry 0xffffffffffffffff type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 00000000680cc782 guest entry 0xffffffffffffffff type 9.
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804000
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804010
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804020
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804030
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804040
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804050
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804060
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804070
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804080
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804090
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040a0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040b0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040c0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040d0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040e0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b8040f0
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804100
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: guest page write error, gpa 2b804108
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xf0f0f0f0f0f0f0f type 9
Jun 25 22:57:03 PF16W6Y2 kernel: gvt: vgpu 1: fail: spt 00000000680cc782 guest entry 0xf0f0f0f0f0f0f0f type 9
You can try driver 30.0.100.9684 (https://downloadcenter.intel.com/download/30579/Intel-Graphics-Windows-DCH-Drivers) and test again. On our side it is stable with kernel 5.11 (5.12 has a regression, #188, and the bug fix patch has not been upstreamed yet).
Hi. Please test my working solution: https://github.com/intel/gvt-linux/issues/188#issuecomment-955584215
Hi there,
I have been using kvmgt on an Intel NUC (Intel HD Graphics 620) running CentOS 7 for several months with several kernels (CentOS 7 standard 3.x, self-compiled 5.x and now, from this project, 5.4.0-rc7-01779-g74c926f-dirty), and the Windows 10 VM always keeps crashing more or less often. Sometimes there is more than a week between crashes, sometimes just hours or minutes.
Win10 is v1903
Win GPU driver is 26.20.100.7000 (everything newer does not work)
CentOS is v7.7.1908
kvm-qemu-ev is 2.12.0-33.1
firmware files for i915 are fresh from yesterday (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915)
When the VM crashes I get gazillions of messages like these:
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 0000000049e48f88 guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: spt 00000000c918a2ce guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 00000000c918a2ce guest entry 0x6735b2906735b29 type 9.
Nov 18 15:54:51 floor13 kernel: gvt: guest page write error, gpa 1a2295000
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 0000000049e48f88 guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: spt 00000000c918a2ce guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 00000000c918a2ce guest entry 0x6735b2906735b29 type 9.
Nov 18 15:54:51 floor13 kernel: gvt: guest page write error, gpa 1a2295008
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 0000000049e48f88 guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: spt 00000000c918a2ce guest entry 0x6735b2906735b29 type 9
Nov 18 15:54:51 floor13 kernel: gvt: vgpu 1: fail: shadow page 00000000c918a2ce guest entry 0x6735b2906735b29 type 9.
Nov 18 15:54:51 floor13 kernel: gvt: guest page write error, gpa 1a2295010
ending with
Nov 18 15:54:52 floor13 kernel: gvt: vgpu 1: fail to flush post shadow
Nov 18 15:54:52 floor13 kernel: gvt: vgpu 1: fail to dispatch workload, skip
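Since the log fills up with thousands of near-identical lines, a condensed summary is easier to compare between crashes than the raw output. The sketch below counts the write errors, reports the range of guest physical addresses involved, and groups the shadow page failures by vGPU, guest entry and type. The patterns are assumptions derived from the messages quoted in this thread, and `summarize` is a hypothetical helper, not something shipped with GVT.

```python
import re
from collections import Counter

# Patterns inferred from the messages quoted in this thread.
GPA_RE = re.compile(r"guest page write error, gpa ([0-9a-f]+)")
FAIL_RE = re.compile(
    r"gvt: vgpu (\d+): fail: (?:shadow page|spt) [0-9a-f]+ "
    r"guest entry (0x[0-9a-f]+) type (\d+)"
)


def summarize(lines):
    """Condense gvt failure messages into counts and an address range."""
    gpas = []
    entries = Counter()
    for line in lines:
        # finditer copes with several messages pasted onto one line.
        for match in GPA_RE.finditer(line):
            gpas.append(int(match.group(1), 16))
        for match in FAIL_RE.finditer(line):
            key = (int(match.group(1)), match.group(2), int(match.group(3)))
            entries[key] += 1
    return {
        "write_errors": len(gpas),
        "gpa_min": min(gpas) if gpas else None,
        "gpa_max": max(gpas) if gpas else None,
        "failed_entries": entries,
    }
```

Feeding it the lines of a saved log, for example `summarize(open("messages.txt"))`, gives a summary that is short enough to paste into a comment.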
After that I see kernel traces starting with
Nov 18 15:57:23 floor13 kernel: INFO: task gvt_service_thr:289 blocked for more than 122 seconds.
There is no dump generated in /sys/class/drm/card0/error:
[root@floor13 ~]# cat /sys/class/drm/card0/error
No error state collected
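Because the crashes can be days apart, the check above is easy to miss at the right moment. A small script run after each freeze could save the error state whenever the node holds more than the empty marker. This is a minimal sketch under stated assumptions: the sysfs path is the one used above, the marker string is the one shown in the output above, and the destination file name `i915_error_state.txt` as well as both function names are hypothetical.

```python
from pathlib import Path

# Text the node returned in this report when nothing was captured.
NO_ERROR_MARKER = "No error state collected"


def has_error_state(text):
    """True when the node content is something other than the empty marker."""
    stripped = text.strip()
    return bool(stripped) and stripped != NO_ERROR_MARKER


def save_error_state(src="/sys/class/drm/card0/error",
                     dest="i915_error_state.txt"):
    """Copy the error state to `dest` if one was collected; return True if saved."""
    text = Path(src).read_text(errors="replace")
    if has_error_state(text):
        Path(dest).write_text(text)
        return True
    return False
```

In this report the node stayed empty, so `save_error_state()` would have returned `False`; the script only helps on setups where the driver does record a state.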
I will attach all messages to this issue and appreciate any help on this :)
Best regards
messages.txt