intel / gvt-linux

gvt: guest page write error, gpa #120

Open rubenvb opened 5 years ago

rubenvb commented 5 years ago

I experimented with this feature on my existing Windows VM. Apart from resolution issues (the resolution did not change with the window size), I had a working display using the DMABUF method detailed on the Arch Linux wiki.

I could not install the latest GPU driver, so I attempted to use the Intel Driver assistant. Somewhere during installation, my whole system (incl. host) froze and the display became unresponsive. I output the journalctl log for this boot: log.txt

It seems like a kernel issue, so I gave up experimenting. The display resolution and mouse offset issues also ruined the experience for me, so I'm back to testing my Windows GL apps with Mesa's opengl32.dll for now. If I can give any more information or try some different options, I'll be glad to give more feedback. For the record, my libvirt configuration for this VM: Windows-gvt.xml.txt

tinywrkb commented 5 years ago

Which microarchitecture? I also noticed this regression on Broadwell. I guessed it was a kernel regression, but I failed to confirm that or to pinpoint the offending package and release. I think it may have been introduced in kernel 5.1.0, so you might want to try the latest available 5.0.x release.

I don't have the time to debug this further and I personally gave up on GVT-g. I'll get a discrete card and RDP into a remote VM when I need 3D acceleration.

If 5.0.x works for you and you decide to bisect 5.0.0..5.1.0, be aware of the data corruption bug in 5.1.0.

prodrigestivill commented 4 years ago

Same problem with Coffee Lake UHD Graphics 630. It crashes when trying to change resolution, and sometimes even without any resolution change at all.

Kernel: 5.3.13-arch1-1

[ 8780.483889] gvt: guest page write error, gpa 48f7bf90
[ 8780.483892] gvt: guest page write error, gpa 48f7bfa0
[ 8780.483895] gvt: guest page write error, gpa 48f7bfb0
[ 8780.483900] gvt: guest page write error, gpa 48f7bfc0
[ 8780.483903] gvt: guest page write error, gpa 48f7bfd0
[ 8780.483906] gvt: guest page write error, gpa 48f7bfe0
[ 8780.483909] gvt: guest page write error, gpa 48f7bff0
[ 8783.987131] gvt: vgpu 1: fail: shadow page 00000000d9ca873f guest entry 0x101a3be1e570f05 type 9
[ 8783.987136] gvt: vgpu 1: fail: spt 0000000056673cae guest entry 0x101a3be1e570f05 type 9
[ 8783.987137] gvt: vgpu 1: fail: shadow page 0000000056673cae guest entry 0x101a3be1e570f05 type 9.
[ 8783.987138] gvt: vgpu 1: fail to flush post shadow
[ 8783.987139] gvt: vgpu 1: fail to dispatch workload, skip

rugubara commented 4 years ago

+1 on Coffee-lake Mobile on Gentoo 5.5.7-gentoo #1 SMP PREEMPT Sat Feb 29 17:34:20 MSK 2020 x86_64 Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz GenuineIntel GNU/Linux

melyux commented 4 years ago

And another +1 on Coffee Lake using kernels 5.2.0 and 5.6.4 on Debian. Except it's not sudden. It builds up over a couple of hours, with thousands of gvt: guest page write error, gpa 1ba99cfe0 messages in the syslog, a few hundred at a time with a couple of minutes' break in between, and then bam,

gvt: guest page write error, gpa 1ba99cfe0
gvt: guest page write error, gpa 1ba99cff0
gvt: vgpu 1: fail: shadow page 00000000ad8a12d6 guest entry 0xff8b8b8bff858585 type 9
gvt: vgpu 1: fail: spt 00000000c64bab8c guest entry 0xff8b8b8bff858585 type 9
gvt: vgpu 1: fail: shadow page 00000000c64bab8c guest entry 0xff8b8b8bff858585 type 9.
gvt: vgpu 1: fail to flush post shadow
gvt: vgpu 1: fail to dispatch workload, skip

and it's over. No resolution changes happening, just a load of video decoding.
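To see how the errors build up over time, the messages can be tallied from the kernel log. This is a minimal sketch, not an official tool: the function name summarize_gvt_errors is hypothetical, and the patterns are simply the message strings quoted in this thread.

```shell
#!/bin/bash
# Tally GVT-g error lines from kernel log text supplied on stdin.
# The three patterns match message strings quoted in this thread.
summarize_gvt_errors() {
    awk '
        /gvt: guest page write error/   { write_err++ }
        /gvt: vgpu [0-9]+: fail: spt/   { spt_fail++ }
        /fail to dispatch workload/     { dispatch_fail++ }
        END {
            # Counters that never matched print as 0.
            printf "guest_page_write_error=%d\n", write_err
            printf "spt_fail=%d\n", spt_fail
            printf "dispatch_fail=%d\n", dispatch_fail
        }
    '
}

# Example usage (reading the kernel log may require root):
#   dmesg | summarize_gvt_errors
#   journalctl -k -b | summarize_gvt_errors
```

Running it periodically and comparing the counts would show whether the write errors arrive in bursts, as described above, before the workload dispatch failure.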

reedog117 commented 4 years ago

Same issue on Proxmox VE 6.1 - Kernel 5.3.18-3. Coffee Lake ER Xeon processor.

reedog117 commented 4 years ago

Retried with kernel 5.4.27-1-pve #1 SMP PVE 5.4.27-1 - Coffee Lake ER Xeon processor. Similar issues still persist. This seems to build up over days although I can't find a pattern.

 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47000
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47008
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47010
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47018
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47020
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47028
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: shadow page 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9.
 gvt: guest page write error, gpa 1d0d47030
 gvt: vgpu 2: fail: shadow page 000000004594f23f guest entry 0x8f9b7e998f9b7e99 type 9
 gvt: vgpu 2: fail: spt 00000000e6317088 guest entry 0x8f9b7e998f9b7e99 type 9

reedog117 commented 4 years ago

Still happening with 5.4.41-1-pve #1 SMP PVE 5.4.41-1 - Is there a kernel version which may have better support?

I'm also noticing that this issue manifests itself even faster when multiple guests are running at the same time, so I'm suspecting a slow memory leak of some sort.

Also, would i915.enable_execlists=0 help with this?

zkmusa commented 4 years ago

I have the same issue on Ubuntu 20.04, kernel 5.4.0-33-generic. Processor is an i3-8100. Running a Windows 10 VM doing lots of video decoding (Blue Iris server running). It's the same issue as others where the issue slowly builds up over a few hours, and then crashes the host.

Jun 7 14:35:43 lotus kernel: [83783.140368] gvt: guest page write error, gpa 138d58f58
Jun 7 14:35:43 lotus kernel: [83783.140378] gvt: vgpu 1: fail: shadow page 00000000bd68f0a1 guest entry 0x1da000001d9 type 9
Jun 7 14:35:43 lotus kernel: [83783.140379] gvt: vgpu 1: fail: spt 000000003ad8987f guest entry 0x1da000001d9 type 9
Jun 7 14:35:43 lotus kernel: [83783.140381] gvt: vgpu 1: fail: shadow page 000000003ad8987f guest entry 0x1da000001d9 type 9.
Jun 7 14:35:43 lotus kernel: [83783.140382] gvt: guest page write error, gpa 138d58f78
Jun 7 14:35:43 lotus kernel: [83783.140397] gvt: guest page write error, gpa 138d58f9c
Jun 7 14:35:43 lotus kernel: [83783.140412] gvt: guest page write error, gpa 138d58fa8
Jun 7 14:35:43 lotus kernel: [83783.140431] gvt: guest page write error, gpa 138d58fc0
Jun 7 14:35:43 lotus kernel: [83783.140440] gvt: vgpu 1: fail: shadow page 00000000bd68f0a1 guest entry 0x1da000001d9 type 9
Jun 7 14:35:43 lotus kernel: [83783.140442] gvt: vgpu 1: fail: spt 000000003ad8987f guest entry 0x1da000001d9 type 9
Jun 7 14:35:43 lotus kernel: [83783.140443] gvt: vgpu 1: fail: shadow page 000000003ad8987f guest entry 0x1da000001d9 type 9.
Jun 7 14:35:43 lotus kernel: [83783.140445] gvt: guest page write error, gpa 138d58fe0
Jun 7 14:35:43 lotus kernel: [83783.140461] gvt: guest page write error, gpa 138d58254
Jun 7 14:35:43 lotus kernel: [83783.188598] general protection fault: 0000 [#1] SMP PTI
Jun 7 14:35:43 lotus kernel: [83783.188603] CPU: 3 PID: 22145 Comm: dmcrypt_write/2 Not tainted 5.4.0-33-generic #37-Ubuntu
Jun 7 14:35:43 lotus kernel: [83783.188605] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H370M-ITX/ac, BIOS P1.00 02/21/2018
Jun 7 14:35:43 lotus kernel: [83783.188609] RIP: 0010:kmalloc+0x9e/0x280
Jun 7 14:35:43 lotus kernel: [83783.188611] Code: 87 01 00 00 4d 8b 01 65 49 8b 50 08 65 4c 03 05 80 88 f6 63 4d 8b 20 4d 85 e4 0f 84 9c 01 00 00 41 8b 41 20 49 8b 39 4c 01 e0 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 e0 48 0f c9 48 31 cb
Jun 7 14:35:43 lotus kernel: [83783.188614] RSP: 0018:ffffb9a8c3993990 EFLAGS: 00010082
Jun 7 14:35:43 lotus kernel: [83783.188616] RAX: 96a7f948b081a55e RBX: 0000000000000000 RCX: ffffb9a8c0253010
Jun 7 14:35:43 lotus kernel: [83783.188618] RDX: 0000000001a2d735 RSI: 0000000000000a20 RDI: 000000000002f100
Jun 7 14:35:43 lotus kernel: [83783.188619] RBP: ffffb9a8c39939c0 R08: ffff987ede3af100 R09: ffff987edc403180
Jun 7 14:35:43 lotus kernel: [83783.188620] R10: 000000000007f200 R11: 0000000000000421 R12: 96a7f948b081a55e
Jun 7 14:35:43 lotus kernel: [83783.188622] R13: 0000000000000a20 R14: 00000000000000b8 R15: ffff987edc403180
Jun 7 14:35:43 lotus kernel: [83783.188623] FS: 0000000000000000(0000) GS:ffff987ede380000(0000) knlGS:0000000000000000
Jun 7 14:35:43 lotus kernel: [83783.188625] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 7 14:35:43 lotus kernel: [83783.188627] CR2: 000055ea156f79cc CR3: 00000004498ea002 CR4: 00000000003626e0
Jun 7 14:35:43 lotus kernel: [83783.188628] DR0: 0000000001542be3 DR1: 0000000000000000 DR2: 0000000000000000
Jun 7 14:35:43 lotus kernel: [83783.188629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 7 14:35:43 lotus kernel: [83783.188631] Call Trace:
Jun 7 14:35:43 lotus kernel: [83783.188634] ? usb_alloc_urb+0x29/0x60
Jun 7 14:35:43 lotus kernel: [83783.188636] usb_alloc_urb+0x29/0x60
Jun 7 14:35:43 lotus kernel: [83783.188640] uas_submit_urbs+0x273/0x4f0 [uas]
Jun 7 14:35:43 lotus kernel: [83783.188642] uas_queuecommand+0x15e/0x2e0 [uas]
Jun 7 14:35:43 lotus kernel: [83783.188645] scsi_queue_rq+0x68d/0xa00
Jun 7 14:35:43 lotus kernel: [83783.188647] blk_mq_dispatch_rq_list+0x93/0x550
Jun 7 14:35:43 lotus kernel: [83783.188649] ? deadline_remove_request+0x4e/0xb0
Jun 7 14:35:43 lotus kernel: [83783.188651] ? dd_dispatch_request+0x21/0x1f0
Jun 7 14:35:43 lotus kernel: [83783.188653] blk_mq_do_dispatch_sched+0x67/0x100
Jun 7 14:35:43 lotus kernel: [83783.188655] blk_mq_sched_dispatch_requests+0x12d/0x180
Jun 7 14:35:43 lotus kernel: [83783.188658] blk_mq_run_hw_queue+0x5a/0x110
Jun 7 14:35:43 lotus kernel: [83783.188660] blk_mq_delay_run_hw_queue+0x15b/0x160
Jun 7 14:35:43 lotus kernel: [83783.188662] blk_mq_run_hw_queue+0x92/0x120
Jun 7 14:35:43 lotus kernel: [83783.188664] blk_mq_sched_insert_requests+0x74/0x100
Jun 7 14:35:43 lotus kernel: [83783.188665] blk_mq_flush_plug_list+0x1e8/0x290
Jun 7 14:35:43 lotus kernel: [83783.188667] ? blk_mq_get_tag+0x28/0x80
Jun 7 14:35:43 lotus kernel: [83783.188669] blk_flush_plug_list+0xe3/0x110
Jun 7 14:35:43 lotus kernel: [83783.188670] blk_mq_make_request+0x24f/0x5b0
Jun 7 14:35:43 lotus kernel: [83783.188672] generic_make_request+0xcf/0x320
Jun 7 14:35:43 lotus kernel: [83783.188676] dmcrypt_write+0x141/0x170 [dm_crypt]
Jun 7 14:35:43 lotus kernel: [83783.188678] kthread+0x104/0x140
Jun 7 14:35:43 lotus kernel: [83783.188680] ? crypt_iv_lmk_ctr+0xd0/0xd0 [dm_crypt]
Jun 7 14:35:43 lotus kernel: [83783.188682] ? kthread_park+0x90/0x90
Jun 7 14:35:43 lotus kernel: [83783.188685] ret_from_fork+0x35/0x40
Jun 7 14:35:43 lotus kernel: [83783.188686] Modules linked in: vhost_net vhost macvtap macvlan tap xt_CHECKSUM ip6table_mangle ip6table_nat iptable_mangle sha256_ssse3 dm_crypt ipt_REJECT nf_reject_ipv4 xt_multiport rfcomm veth xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter bridge xfrm_user xfrm_algo ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter aufs overlay cmac algif_hash algif_skcipher af_alg bnep iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter 8021q garp mrp stp llc snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_realtek snd_soc_core snd_hda_codec_generic ledtrig_audio snd_compress ac97_bus snd_pcm_dmaengine intel_rapl_msr snd_hda_intel mei_hdcp snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi

reedog117 commented 4 years ago

Attempting to consolidate these crashes in #153 since they all seem similar. They all end up with a guest page write error with spt and type 9 mentioned. In addition people are reporting this bug in different rpm-based distros as well.

rubenvb commented 3 years ago

I just played around with this again on the same i7-8565U on kernel version 5.10 and it did not hard crash like this. I could install the latest Intel driver without issue and it seemed to work fine. I ran into the "suspend freezes host" issue detailed here, but didn't try the systemd service workaround.

For me, this specific issue is fixed. If no one reports otherwise, I think this can be closed.

(if only I could get the guest display to resize automatically through virt-manager's UI, I would be set completely and have a fully functional GPU in my Windows guest.)

aspieln3r commented 3 years ago

Same issue on Linux s510u 5.10.45-1-lts #1 SMP Fri, 18 Jun 2021 09:39:32 +0000 x86_64 GNU/Linux, Intel i5-8250. Omitting a long list of error messages.

.
.
.
Jun 29 12:06:10 s510u systemd-journald[233]: Missed 561 kernel messages
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u kernel: gvt: guest page write error, gpa 1ca00bc30
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 000000005eae5388 guest entry 0xffffffffffffffff type 9.
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: spt 000000005eae5388 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u kernel: gvt: guest page write error, gpa 1ca00bc28
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 000000005eae5388 guest entry 0xffffffffffffffff type 9.
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: spt 000000005eae5388 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u systemd-journald[233]: Missed 570 kernel messages
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: spt 000000005eae5388 guest entry 0xffffffffffffffff type 9
Jun 29 12:06:10 s510u systemd-journald[233]: Missed 984 kernel messages
Jun 29 12:06:10 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9

TerrenceXu commented 3 years ago

@aspieln3r what are your reproduction steps?

aspieln3r commented 3 years ago

It comes randomly when I keep running Photoshop for 3-4 hours. I cannot reproduce this reliably. One more thing I noticed is that after qemu crashes and I kill it, the memory used by qemu (~8 GB) is not freed. If I use htop to see what is consuming this memory, it does not show any process name. Total memory consumption shows that this much memory is in use, but no process accounts for it. A reboot is necessary to free the memory. I will attach the full log once I get back to my machine.

TerrenceXu commented 3 years ago

@aspieln3r we still cannot reproduce it. Can you share which Windows gfx driver you used in the VM?

aspieln3r commented 3 years ago

vfio_region_write(7a537d3e-d35c-4306-8482-bdd75b64e761:region0+0x2230, 0x10072119,4) failed: Bad address

60

Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x190f, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 00000000ac44b429 guest entry 0x190f007 type 2
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1d6d84007 type 7
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 00000000c7d968e8 guest entry 0x1d6d84007 type 7
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1844007 type 8
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 000000009b4f4b11 guest entry 0x1844007 type 8
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1946007 type 9
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 000000007c54e5f5 guest entry 0x1946007 type 9
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1947000 type 12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest root pointer
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: failed to shadow ppgtt mm
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to create mm
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d84, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d85, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d86, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d87, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d89, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8a, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8b, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8c, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8d, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8e, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d8f, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d90, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d91, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d92, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d93, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d94, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d95, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d96, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d97, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d98, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d99, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d9a, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d9b, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8d9c, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8c9d, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8c9e, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8c9f, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8ca0, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8ca1, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8ca2, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x1d8ca3, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest ggtt entry
Jul 28 15:46:40 s510u kernel: vfio_pin_page_external: Task qemu-system-x86 (569647) RLIMIT_MEMLOCK (1048576) exceeded
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: vfio_pin_pages failed for gfn 0x190f, ret -12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 00000000ac44b429 guest entry 0x190f007 type 2
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1d6d84007 type 7
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 00000000c7d968e8 guest entry 0x1d6d84007 type 7
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1844007 type 8
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 000000009b4f4b11 guest entry 0x1844007 type 8
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1946007 type 9
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 000000007c54e5f5 guest entry 0x1946007 type 9
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x1947000 type 12
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to populate guest root pointer
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: failed to shadow ppgtt mm
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to create mm
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: failed to submit desc 0
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail submit workload on ring rcs0
Jul 28 15:46:40 s510u kernel: gvt: vgpu 1: fail to emulate MMIO write 00002230 len 4

Currently the VM is not booting. I'm using qemu. Gfx driver used: 27.20.100.9030 [installers for 25.20.100.6326 no longer exist on the Intel drivers page]. Guest: Windows 10 Pro 20H2. I can boot the system if I use the -vga std option in qemu.
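The vfio_pin_page_external lines in the log above report RLIMIT_MEMLOCK (1048576) exceeded for qemu-system-x86, that is, a locked-memory limit of 1 MiB for that task. A minimal sketch for reading the soft limit of a process from its /proc limits file follows; the function name memlock_soft_limit is hypothetical, and whether raising the limit resolves the failure is an assumption this thread does not confirm.

```shell
#!/bin/bash
# Print the soft "Max locked memory" limit recorded in a limits file.
# $1: path to a limits file, e.g. /proc/<pid>/limits
# The value is in bytes, or the word "unlimited".
memlock_soft_limit() {
    # Columns are: Max, locked, memory, <soft>, <hard>, <units>
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Example usage (assumes exactly one running qemu process):
#   memlock_soft_limit "/proc/$(pidof qemu-system-x86_64)/limits"
# For the current shell, "ulimit -l" reports the same limit in KiB.
```

A value of 1048576 here would match the number printed in the kernel messages.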

aspieln3r commented 3 years ago

I booted using the -vga std option, uninstalled the 27.20.100.9030 driver, and rebooted with -vga none and the Intel graphics device enabled. Booting was successful, but during installation of driver 25.20.100.6326, which was recommended in #60 [I got it from Softpedia and it seems to come with some adware], I got the same error on the terminal where I ran the qemu script:

vfio_region_write(7a537d3e-d35c-4306-8482-bdd75b64e761:region0+0x2230, 0x1000f119,4) failed: Bad address

The window is black and doesn't seem to take any input. The system is not booting again with GPU passthrough.

script used:

#!/bin/bash
# -usb -device usb-host,hostbus=1,vendorid=0x413c,productid=0x301a
GVT_GUID=7a537d3e-d35c-4306-8482-bdd75b64e761
[ ! -d "/sys/bus/mdev/devices/$GVT_GUID/" ] && echo "$GVT_GUID" > /sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create && echo "created new vGPU"
echo "..."
[ ! -d "/sys/bus/mdev/devices/$GVT_GUID/" ] && exit 1
echo "starting VM..."
qemu-system-x86_64 -netdev user,restrict=yes,id=net0,smb=/vm_ground/windoz/ -device virtio-net-pci,netdev=net0 \
     -enable-kvm -name win10,debug-threads=on \
     -rtc base=localtime \
     -m 6G -cpu host,hv-relaxed,hv-spinlocks=0x1fff,hv-vapic,hv_time -smp sockets=1,cores=4,threads=2 \
     -machine type=q35,accel=kvm \
     -drive file=/vm_ground/kvm\ storage/Win10_20H2_v2_EnglishInternational_x64.iso,media=cdrom \
     -drive file=/vm_ground/kvm\ storage/virtio-win-0.1.185.iso,media=cdrom \
     -drive file=/dev/sda4,format=raw,cache=none,if=virtio -boot d \
     -drive file=/vm_ground/kvm\ storage/fake.qcow2,if=virtio \
     -M graphics=off -vga none \
     -nodefaults \
     -serial stdio \
     -display gtk,gl=on \
     -usb -device usb-tablet \
     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/7a537d3e-d35c-4306-8482-bdd75b64e761,display=on,x-igd-opregion=on,ramfb=on,driver=vfio-pci-nohotplug

I think the issue I currently face is #60. I should take this there. Sorry, I can't help with the current issue without solving this first.

aspieln3r commented 3 years ago

echo 1 > /sys/bus/mdev/devices/$GVT_GUID/remove solved the issue. The system is bootable again now that I have recreated the vGPU. I'll do more testing and will report if I hit something again.
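The remove-and-recreate step can be sketched as a small helper. It assumes the sysfs paths and the vGPU type (i915-GVTg_V5_4) from the script posted earlier in this thread; the function name recreate_vgpu and its optional directory arguments are hypothetical and exist only so the paths can be overridden.

```shell
#!/bin/bash
# Remove an existing mdev vGPU (if present) and create it again.
# $1: vGPU UUID
# $2: mdev devices directory (default taken from the script in this thread)
# $3: mdev type directory   (default taken from the script in this thread)
recreate_vgpu() {
    local guid=$1
    local devices_dir=${2:-/sys/bus/mdev/devices}
    local type_dir=${3:-/sys/devices/pci0000:00/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4}

    # Writing 1 to "remove" asks the kernel to destroy the device.
    if [ -d "$devices_dir/$guid" ]; then
        echo 1 > "$devices_dir/$guid/remove" || return 1
    fi

    # Writing the UUID to "create" instantiates a fresh vGPU of this type.
    echo "$guid" > "$type_dir/create"
}

# Example usage (requires root, with the VM shut down):
#   recreate_vgpu 7a537d3e-d35c-4306-8482-bdd75b64e761
```

The guest should be stopped before running this, since the device cannot be cleanly removed while qemu still holds it.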

aspieln3r commented 3 years ago

OK, I've hit this issue again. My Windows system was working fine after this, and since the last driver install had failed, I decided to reinstall 25.20.100.6326, which succeeded. As soon as I hit reboot, the display went blank, and now I cannot kill qemu either by closing the window or with kill -9. echo 1 > /sys/bus/mdev/devices/$GVT_GUID/remove simply hangs. The memory used by the qemu process seems to be freed although the window is not closed; htop shows it as a zombie process. @TerrenceXu log: https://pastebin.com/EGvWuyhK

Edit: my Linux host refuses to shut down too. The shutdown process first hangs at the message about sending SIGKILL to qemu-system-x86, and then fails to unmount the filesystems. Unfortunately, I couldn't find these logs with journalctl after forcefully rebooting, as journald was killed before all this. The VM now boots normally after a full host restart.

TerrenceXu commented 3 years ago

@aspieln3r, 6326 is too old and does not include some of the latest virtualization changes. Can you change the driver to 30.0.100.9684 (https://downloadcenter.intel.com/download/30579/Intel-Graphics-Windows-DCH-Drivers) and try again? On our side it is stable with kernel 5.11 (5.12 has a regression, https://github.com/intel/gvt-linux/issues/188, and the bug fix patch has not been upstreamed yet).

alpozcan commented 3 years ago

I had a Windows 10 guest hard lock up, even ignoring the libvirt force shutdown command. In dmesg:

[ 7582.907024] gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x2000543ee82021e7 type 9
[ 7582.907025] gvt: vgpu 1: fail: spt 00000000faf6d0e2 guest entry 0x2000543ee82021e7 type 9
[ 7582.907026] gvt: vgpu 1: fail: shadow page 00000000faf6d0e2 guest entry 0x2000543ee82021e7 type 9.
[ 7582.907027] gvt: guest page write error, gpa 1c2152b68

Followed by the below:

...
[ 7582.912193] gvt: guest page write error, gpa 1c2152ff0
[ 7734.412979] INFO: task CPU 6/KVM:11091 blocked for more than 120 seconds.
[ 7734.412985]       Tainted: G           O      5.11.0-31-generic #33-Ubuntu
[ 7734.412986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7734.412987] task:CPU 6/KVM       state:D stack:    0 pid:11091 ppid:     1 flags:0x00000000
[ 7734.412990] Call Trace:
[ 7734.412993]  __schedule+0x23d/0x670
[ 7734.412998]  ? x86_emulate_instruction+0x2e9/0x7f0 [kvm]
[ 7734.413038]  schedule+0x4f/0xc0
[ 7734.413040]  schedule_preempt_disabled+0xe/0x10
[ 7734.413041]  __mutex_lock.constprop.0+0x309/0x4d0
[ 7734.413043]  ? load_fixmap_gdt+0x23/0x30
[ 7734.413046]  __mutex_lock_slowpath+0x13/0x20
[ 7734.413047]  mutex_lock+0x34/0x40
[ 7734.413049]  intel_vgpu_emulate_mmio_read+0x51/0x3f0 [i915]
[ 7734.413121]  intel_vgpu_rw+0x1e4/0x220 [kvmgt]
[ 7734.413124]  intel_vgpu_read+0x14a/0x1f0 [kvmgt]
[ 7734.413126]  vfio_mdev_read+0x22/0x30 [vfio_mdev]
[ 7734.413128]  vfio_device_fops_read+0x26/0x30
[ 7734.413131]  vfs_read+0xb5/0x1c0
[ 7734.413134]  __x64_sys_pread64+0x93/0xc0
[ 7734.413136]  do_syscall_64+0x38/0x90
[ 7734.413137]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7734.413140] RIP: 0033:0x7f65b8066d4f
[ 7734.413142] RSP: 002b:00007f6396ffc190 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 7734.413144] RAX: ffffffffffffffda RBX: 00005602c9523d78 RCX: 00007f65b8066d4f
[ 7734.413146] RDX: 0000000000000004 RSI: 00007f6396ffc1d8 RDI: 0000000000000026
[ 7734.413147] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffffffff
[ 7734.413148] R10: 0000000000044404 R11: 0000000000000293 R12: 0000000000044404
[ 7734.413149] R13: 00005602c9523c90 R14: 0000000000000004 R15: 0000000000044404
[ 7855.253897] INFO: task CPU 6/KVM:11091 blocked for more than 241 seconds.
[ 7855.253904]       Tainted: G           O      5.11.0-31-generic #33-Ubuntu
[ 7855.253906] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7855.253907] task:CPU 6/KVM       state:D stack:    0 pid:11091 ppid:     1 flags:0x00000000
[ 7855.253911] Call Trace:
[ 7855.253914]  __schedule+0x23d/0x670
[ 7855.253920]  ? x86_emulate_instruction+0x2e9/0x7f0 [kvm]
[ 7855.253966]  schedule+0x4f/0xc0
[ 7855.253968]  schedule_preempt_disabled+0xe/0x10
[ 7855.253970]  __mutex_lock.constprop.0+0x309/0x4d0
[ 7855.253972]  ? load_fixmap_gdt+0x23/0x30
[ 7855.253975]  __mutex_lock_slowpath+0x13/0x20
[ 7855.253977]  mutex_lock+0x34/0x40
[ 7855.253979]  intel_vgpu_emulate_mmio_read+0x51/0x3f0 [i915]
[ 7855.254063]  intel_vgpu_rw+0x1e4/0x220 [kvmgt]
[ 7855.254066]  intel_vgpu_read+0x14a/0x1f0 [kvmgt]
[ 7855.254069]  vfio_mdev_read+0x22/0x30 [vfio_mdev]
[ 7855.254071]  vfio_device_fops_read+0x26/0x30
[ 7855.254074]  vfs_read+0xb5/0x1c0
[ 7855.254079]  __x64_sys_pread64+0x93/0xc0
[ 7855.254080]  do_syscall_64+0x38/0x90
[ 7855.254082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7855.254085] RIP: 0033:0x7f65b8066d4f
[ 7855.254087] RSP: 002b:00007f6396ffc190 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 7855.254089] RAX: ffffffffffffffda RBX: 00005602c9523d78 RCX: 00007f65b8066d4f
[ 7855.254090] RDX: 0000000000000004 RSI: 00007f6396ffc1d8 RDI: 0000000000000026
[ 7855.254092] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffffffff
[ 7855.254093] R10: 0000000000044404 R11: 0000000000000293 R12: 0000000000044404
[ 7855.254094] R13: 00005602c9523c90 R14: 0000000000000004 R15: 0000000000044404

The above and the first line of the below snippet are repeated many times in the kernel log.

System info: Linux n8 5.11.0-31-generic #33-Ubuntu SMP Wed Aug 11 13:19:04 UTC 2021 x86_64, Kubuntu 21.04. Version B4 of Looking Glass was running at the time. Intel 8259U CPU with Iris Plus 655 iGPU.
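
Since these messages repeat so often, a small script can condense a saved dmesg or journalctl dump into something easier to compare between reports. The following is a rough sketch, not an official tool: the function name is made up, the regular expressions only match the two message formats quoted in this thread, and grouping addresses by 4 KiB page (a shift by 12 bits) is an assumption about the guest page size.

```python
import re
from collections import Counter

GPA_RE = re.compile(r"gvt: guest page write error, gpa ([0-9a-f]+)")
HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def summarize(lines):
    """Return (errors per guest page, longest reported block time per task)."""
    pages = Counter()  # guest page frame number -> number of write errors
    hung = {}          # (task name, pid) -> longest "blocked for" value seen
    for line in lines:
        m = GPA_RE.search(line)
        if m:
            # Assumption: 4 KiB pages, so drop the low 12 bits of the address.
            pages[int(m.group(1), 16) >> 12] += 1
            continue
        m = HUNG_RE.search(line)
        if m:
            key = (m.group(1), int(m.group(2)))
            hung[key] = max(hung.get(key, 0), int(m.group(3)))
    return pages, hung
```

It accepts any iterable of lines, for example an open log file, and works on both the plain dmesg format and the journal format with a timestamp prefix.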

takerukoushirou commented 3 years ago

On kernel 5.11.22-3-pve #1 SMP PVE 5.11.22-7 (Proxmox 7), my Windows 10 VM (all latest updates installed) with Intel drivers 30.0.100.9684 seemed stable for a while, but ultimately crashed again with the same old problem:

Sep  2 01:25:04 pve kernel: [634952.819964] gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Sep  2 01:25:04 pve kernel: [634952.819972] gvt: vgpu 1: fail: spt 00000000b57810b8 guest entry 0xffffffffffffffff type 9
Sep  2 01:25:04 pve kernel: [634952.819975] gvt: vgpu 1: fail: shadow page 00000000b57810b8 guest entry 0xffffffffffffffff type 9.
Sep  2 01:25:04 pve kernel: [634952.819977] gvt: guest page write error, gpa 10fccd000
[... several hundreds more with different gpa value]
Sep  2 01:25:04 pve kernel: [634952.864430] gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x84285b5657b9feb type 9
Sep  2 01:25:04 pve kernel: [634952.864435] gvt: vgpu 1: fail: spt 00000000b57810b8 guest entry 0x84285b5657b9feb type 9
Sep  2 01:25:04 pve kernel: [634952.864437] gvt: vgpu 1: fail: shadow page 00000000b57810b8 guest entry 0x84285b5657b9feb type 9.
Sep  2 01:25:04 pve kernel: [634952.864439] gvt: guest page write error, gpa 10fccd010
[... several hundreds more with different guest entry and gpa value]
Sep  2 01:25:04 pve kernel: [634952.956728] gvt: guest page write error, gpa 10fccd000
Sep  2 01:25:04 pve kernel: [634953.035548] gvt: guest page write error, gpa 10fccd000
Sep  2 01:25:04 pve kernel: [634953.035555] gvt: guest page write error, gpa 10fccd010
[... many more]
Sep  2 01:26:00 pve kernel: [635009.219803] gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0x656c6c695f746c75 type 9
Sep  2 01:26:00 pve kernel: [635009.219811] gvt: vgpu 1: fail: spt 00000000b57810b8 guest entry 0x656c6c695f746c75 type 9
Sep  2 01:26:00 pve kernel: [635009.219814] gvt: vgpu 1: fail: shadow page 00000000b57810b8 guest entry 0x656c6c695f746c75 type 9.
Sep  2 01:26:00 pve kernel: [635009.219816] gvt: vgpu 1: fail to flush post shadow
Sep  2 01:26:00 pve kernel: [635009.219817] gvt: vgpu 1: fail to dispatch workload, skip
Sep  2 01:26:12 pve pvestatd[2179]: VM 4600 qmp command failed - VM 4600 qmp command 'query-proxmox-support' failed - unable to connect to VM 4600 qmp socket - timeout after 31 retries
Sep  2 01:26:12 pve pvedaemon[3845812]: VM 4600 qmp command failed - VM 4600 qmp command 'guest-ping' failed - got timeout
[... repeats multiple times, then]
Sep  2 01:29:17 pve kernel: [635206.163805] INFO: task kvm:18952 blocked for more than 120 seconds.
Sep  2 01:29:17 pve kernel: [635206.163811]       Tainted: P           O      5.11.22-3-pve #1
Sep  2 01:29:17 pve kernel: [635206.163813] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  2 01:29:17 pve kernel: [635206.163815] task:kvm             state:D stack:    0 pid:18952 ppid:     1 flags:0x00000000
Sep  2 01:29:17 pve kernel: [635206.163819] Call Trace:
Sep  2 01:29:17 pve kernel: [635206.163822]  __schedule+0x2ca/0x880
Sep  2 01:29:17 pve kernel: [635206.163827]  schedule+0x4f/0xc0
Sep  2 01:29:17 pve kernel: [635206.163829]  schedule_preempt_disabled+0xe/0x10
Sep  2 01:29:17 pve kernel: [635206.163831]  __mutex_lock.constprop.0+0x309/0x4d0
Sep  2 01:29:17 pve kernel: [635206.163833]  __mutex_lock_slowpath+0x13/0x20
Sep  2 01:29:17 pve kernel: [635206.163836]  mutex_lock+0x34/0x40
Sep  2 01:29:17 pve kernel: [635206.163838]  intel_vgpu_emulate_mmio_write+0x4e/0x2e0 [i915]
Sep  2 01:29:17 pve kernel: [635206.163901]  intel_vgpu_rw+0xca/0x220 [kvmgt]
Sep  2 01:29:17 pve kernel: [635206.163904]  intel_vgpu_write+0x169/0x1f0 [kvmgt]
Sep  2 01:29:17 pve kernel: [635206.163907]  vfio_mdev_write+0x22/0x30 [vfio_mdev]
Sep  2 01:29:17 pve kernel: [635206.163910]  vfio_device_fops_write+0x26/0x30 [vfio]
Sep  2 01:29:17 pve kernel: [635206.163913]  vfs_write+0xc6/0x270
Sep  2 01:29:17 pve kernel: [635206.163917]  __x64_sys_pwrite64+0x93/0xc0
Sep  2 01:29:17 pve kernel: [635206.163919]  do_syscall_64+0x38/0x90
Sep  2 01:29:17 pve kernel: [635206.163922]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep  2 01:29:17 pve kernel: [635206.163925] RIP: 0033:0x7f549c06e9c7
Sep  2 01:29:17 pve kernel: [635206.163928] RSP: 002b:00007f548b3fa0f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Sep  2 01:29:17 pve kernel: [635206.163930] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f549c06e9c7
Sep  2 01:29:17 pve kernel: [635206.163932] RDX: 0000000000000004 RSI: 00007f548b3fa138 RDI: 0000000000000027
Sep  2 01:29:17 pve kernel: [635206.163934] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffffffff
Sep  2 01:29:17 pve kernel: [635206.163936] R10: 0000000000078804 R11: 0000000000000293 R12: 0000000000078804
Sep  2 01:29:17 pve kernel: [635206.163937] R13: 0000000000000001 R14: 0000561290ed2a80 R15: 0000561290ed2990
[.. and repeats several times]

The SPT and shadow page values stay constant across the error messages.
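
As a side observation, one of the guest entries above, 0x656c6c695f746c75, consists entirely of printable ASCII bytes. The sketch below shows one way to check this; the helper name is made up, and reading the value as little-endian is an assumption based on the host being x86. What this implies is only a guess and is not confirmed anywhere in this thread: it might mean that the memory being interpreted as a page table entry held unrelated data at that moment.

```python
def entry_as_ascii(entry: int):
    """Return the 8 bytes of a 64-bit entry as text if all are printable ASCII.

    Bytes are taken in little-endian order (assumption: x86 memory layout).
    Returns None when any byte falls outside the printable range.
    """
    raw = entry.to_bytes(8, "little")
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


# Guest entry values copied from the log above.
for value in (0xFFFFFFFFFFFFFFFF, 0x84285B5657B9FEB, 0x656C6C695F746C75):
    print(hex(value), entry_as_ascii(value))
```

Of the three values, only the last one decodes to text.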

I also tried the newest Intel drivers; the issue reappeared immediately.

CPU: Intel(R) Core(TM) i7-10710U
QEMU hostpci0: 00:02.0,mdev=i915-GVTg_V5_2,pcie=1,x-vga=1
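
After a crash it may be worth checking whether the host still offers an instance of the configured vGPU type before restarting the VM. Below is a minimal sketch, assuming the standard mdev sysfs attributes; the helper names are made up, and the parent directory for a GPU at 00:02.0 is assumed to be /sys/bus/pci/devices/0000:00:02.0 (PCI domain 0000).

```python
from pathlib import Path


def list_vgpu_types(parent_dir: Path):
    """Names of the mdev types the physical GPU advertises."""
    return sorted(p.name for p in (parent_dir / "mdev_supported_types").iterdir())


def available_instances(parent_dir: Path, vgpu_type: str) -> int:
    """How many more vGPUs of this type can currently be created."""
    attr = parent_dir / "mdev_supported_types" / vgpu_type / "available_instances"
    return int(attr.read_text().strip())
```

If available_instances() returns 0 for i915-GVTg_V5_2 while no VM is running, a stale vGPU from the crashed guest is probably still registered.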

tpressure commented 3 years ago

I can also confirm that this still happens even with the latest Intel drivers.

A workaround for this issue is to disable screen blanking in the Windows energy options.

kristophercrawford commented 3 years ago

@tpressure - I have tried setting the screen sleep dropdown to zero on my Windows 10 Virtual Machine and still encounter lockups.

takerukoushirou commented 3 years ago

I have had screen sleep and all other Windows power savings disabled since shortly after setting up the VM a year ago; it didn't reduce the crashes.

takerukoushirou commented 2 years ago

Just tried again with the latest 5.13.19-3-pve #1 SMP PVE 5.13.19-7 kernel. The issue persists; the vGPU failed within a few minutes.

tpressure commented 2 years ago

Please see my comment here: https://github.com/intel/gvt-linux/issues/153#issuecomment-1044747758