intel / gvt-linux

gvt+dmabuf: GPU HANG + NULL pointer dereference on Linux 5.2.0 #110

Closed Maryse47 closed 3 years ago

Maryse47 commented 5 years ago

Host: Linux 5.2.0
Guest: Windows 10
/sys/class/drm/card0/error

It may be related to https://github.com/intel/gvt-linux/issues/104

gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:02.0] fault addr fffffffefd8c0000 [fault reason 07] Next page table ptr is invalid
i915 0000:00:02.0: GPU HANG: ecode 9:1:0xfffffffe, in remote-viewer [4205], hang on rcs0
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm] GPU crash dump saved to /sys/class/drm/card0/error
i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:02.0] fault addr fffffffefd8c0000 [fault reason 07] Next page table ptr is invalid
i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 3798 Comm: kworker/u8:3 Tainted: G                T 5.2.0 #1
Workqueue: i915 __i915_gem_free_work [i915]
RIP: 0010:__list_del_entry_valid+0x32/0x52
Code: 48 b8 00 01 00 00 00 00 ad de 4c 8b 45 08 48 39 c2 0f 84 75 00 00 00 48 b8 00 02 00 00 00 00 ad de 49 39 c0 0f 84 95 00 00 00 <49> 8b 30 48 39 ee 0f 85 75 00 00 00 48 8b 52 08 48 39 f2 0f 85 5a
RSP: 0018:ffffaef4897d3d98 EFLAGS: 00010207
RAX: dead000000000200 RBX: ffff8f4b7b674cc8 RCX: 0000000000380021
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f4b7b674cc8
RBP: ffff8f4b7b674cc8 R08: 0000000000000000 R09: ffffffffc0706e00
R10: 0000000000000000 R11: 0000000000000004 R12: ffffaef480f17000
R13: ffff8f4b7b674ca8 R14: ffff8f4ce4200340 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8f4d29580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001bcd58006 CR4: 00000000003626e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dmabuf_gem_object_free+0xc6/0x100 [i915]
 vgpu_gem_release+0x5d/0xa0 [i915]
 __i915_gem_free_objects+0x146/0x2c0 [i915]
 __i915_gem_free_work+0x64/0x90 [i915]
 process_one_work+0x198/0x380
 worker_thread+0x4d/0x390
 kthread+0xfa/0x130
 ? process_one_work+0x380/0x380
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40
Modules linked in: fuse 8021q garp stp mrp llc ccm algif_aead des_generic cmac md4 algif_hash wacom snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic arc4 hid_sensor_als hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf industrialio nf_log_ipv4 nf_log_common joydev iwlmvm mousedev hid_sensor_hub nft_counter intel_ishtp_hid mac80211 xt_mark ipt_REJECT nf_reject_ipv4 xt_LOG snd_soc_skl xt_addrtype xt_tcpudp xt_conntrack nf_conntrack snd_soc_hdac_hda snd_hda_ext_core nf_defrag_ipv4 libcrc32c nft_compat snd_soc_skl_ipc iwlwifi nf_tables nfnetlink snd_soc_sst_dsp snd_soc_sst_ipc mei_hdcp nls_iso8859_1 wmi_bmof nls_cp437 cfg80211 intel_wmi_thunderbolt vfat snd_soc_core fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_acpi_intel_match snd_soc_acpi kvm_intel snd_hda_intel psmouse snd_hda_codec intel_cstate intel_uncore snd_hwdep input_leds
 snd_hda_core intel_rapl_perf rtsx_pci_ms memstick mei_me snd_pcm intel_xhci_usb_role_switch mei roles snd_timer intel_ish_ipc intel_ishtp intel_pch_thermal ucsi_acpi thinkpad_acpi typec_ucsi typec wmi nvram ledtrig_audio snd soundcore rfkill ac battery tpm_crb tpm_tis tpm_tis_core i2c_hid tpm evdev rng_core mac_hid pcc_cpufreq loop pkcs8_key_parser ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid dm_crypt dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc serio_raw mmc_core atkbd libps2 aesni_intel aes_x86_64 glue_helper crypto_simd cryptd rtsx_pci xhci_pci i8042 serio xhci_hcd kvmgt i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass
CR2: 0000000000000000
---[ end trace 115a5a7d3430b7de ]---
RIP: 0010:__list_del_entry_valid+0x32/0x52
Code: 48 b8 00 01 00 00 00 00 ad de 4c 8b 45 08 48 39 c2 0f 84 75 00 00 00 48 b8 00 02 00 00 00 00 ad de 49 39 c0 0f 84 95 00 00 00 <49> 8b 30 48 39 ee 0f 85 75 00 00 00 48 8b 52 08 48 39 f2 0f 85 5a
RSP: 0018:ffffaef4897d3d98 EFLAGS: 00010207
RAX: dead000000000200 RBX: ffff8f4b7b674cc8 RCX: 0000000000380021
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8f4b7b674cc8
RBP: ffff8f4b7b674cc8 R08: 0000000000000000 R09: ffffffffc0706e00
R10: 0000000000000000 R11: 0000000000000004 R12: ffffaef480f17000
R13: ffff8f4b7b674ca8 R14: ffff8f4ce4200340 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8f4d29580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001bcd58006 CR4: 00000000003626e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

zhenyw commented 3 years ago

What about now? And what's the Windows driver version?

Maryse47 commented 3 years ago

I haven't tested it again for the time being, as I'm afraid those crashes may eventually damage my VM image. I'll close it for now and re-open it when I reproduce it.
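
If the hang is reproduced later, the crash dump that the i915 messages in the log ask for can be saved right away so it can be attached to the report. The following is only a minimal sketch: the path `/sys/class/drm/card0/error` is taken from the log above, while the function name `save_gpu_error_state` and the output file naming are hypothetical. The card index may differ on other hosts, and reading the file usually requires root.

```python
import shutil
import time
from pathlib import Path

# Path as printed in the kernel log; the card index is an assumption for other hosts.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_gpu_error_state(src: Path = ERROR_STATE, dest_dir: Path = Path(".")) -> Path:
    """Copy the i915 error state to a timestamped file and return the new path.

    Hypothetical helper for illustration; not part of any i915 or GVT tooling.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"i915-error-{stamp}.txt"
    # Stream the contents instead of trusting the file size, since sysfs
    # entries do not report a meaningful size.
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    print(save_gpu_error_state())
```

Running it as root right after the "GPU crash dump saved" message writes a copy into the current directory; the matching `dmesg` output would still need to be saved separately.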