intel / gvt-linux

panic in intel_gvt_debugfs_clean starting VM #221

Open Spongman opened 1 year ago

Spongman commented 1 year ago

This is the host crashing, not the guest. CPU: i3-8100

[    0.000000] Linux version 5.19.17-Unraid (root@Develop) (gcc (GCC) 12.2.0, GNU ld version 2.39-slack151) #2 SMP PREEMPT_DYNAMIC Wed Nov 2 11:54:15 PDT 2022
[    0.000000] Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
...
[   98.835436] Setting dangerous option enable_guc - tainting kernel
[   98.835440] Setting dangerous option force_probe - tainting kernel
[   98.835916] i915 0000:00:02.0: [drm] Incompatible option enable_guc=3 - GuC submission is N/A
[   98.836490] i915 0000:00:02.0: [drm] VT-d active for gfx access
[   98.836588] Console: switching to colour dummy device 80x25
[   98.836624] i915 0000:00:02.0: vgaarb: deactivate vga console
[   98.836651] i915 0000:00:02.0: [drm] Transparent Hugepage mode 'huge=within_size'
[   98.838131] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   98.841126] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[   98.856491] i915 0000:00:02.0: [drm] GuC firmware i915/kbl_guc_70.1.1.bin version 70.1
[   98.856495] i915 0000:00:02.0: [drm] HuC firmware i915/kbl_huc_4.0.0.bin version 4.0
[   98.880358] i915 0000:00:02.0: [drm] HuC authenticated
[   98.880362] i915 0000:00:02.0: [drm] GuC submission disabled
[   98.880363] i915 0000:00:02.0: [drm] GuC SLPC disabled
[   98.901249] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[   98.903109] ACPI: video: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   98.903479] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input7
[   98.937135] fbcon: i915drmfb (fb0) is primary device
[   98.950311] Console: switching to colour frame buffer device 200x56
[   98.967845] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[   99.016331] i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x3e91_rid_0x00.golden_hw_state failed with error -2
[   99.018239] i915 0000:00:02.0: MDEV: Registered
...
[  225.318583] i915 0000:00:02.0: MDEV: Unregistering
[  225.318663] BUG: kernel NULL pointer dereference, address: 0000000000000098
[  225.318667] #PF: supervisor write access in kernel mode
[  225.318670] #PF: error_code(0x0002) - not-present page
[  225.318673] PGD 0 P4D 0
[  225.318676] Oops: 0002 [#1] PREEMPT SMP PTI
[  225.318679] CPU: 3 PID: 2849 Comm: rpc-libvirtd Tainted: G     U            5.19.17-Unraid #2
[  225.318684] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390M-ITX/ac, BIOS P4.20 08/05/2019
[  225.318688] RIP: 0010:rwsem_write_trylock+0x7/0x23
[  225.318694] Code: cc cc cc 48 8b 47 08 a8 01 74 13 a8 02 75 0f 48 89 c2 48 83 ca 02 f0 48 0f b1 57 08 75 e9 c3 cc cc cc cc 31 c0 ba 01 00 00 00 <f0> 48 0f b1 17 0f 94 c0 75 0d 65 48 8b 14 25 c0 bb 01 00 48 89 57
[  225.318701] RSP: 0018:ffffc900019d3be8 EFLAGS: 00010246
[  225.318704] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000064
[  225.318707] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000098
[  225.318710] RBP: ffffc900019d3c90 R08: ffff888154795180 R09: 0000000080150012
[  225.318713] R10: ffff888154795180 R11: 0000000000000000 R12: ffffffffa0956560
[  225.318716] R13: ffff8881056c5d80 R14: ffff8881056c5d80 R15: ffff888105073260
[  225.318720] FS:  0000153f3a7656c0(0000) GS:ffff88845e180000(0000) knlGS:0000000000000000
[  225.318724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.318726] CR2: 0000000000000098 CR3: 00000001050d4006 CR4: 00000000003706e0
[  225.318730] Call Trace:
[  225.318733]  <TASK>
[  225.318734]  __down_write_common+0x31/0x4e9
[  225.318740]  ? __slab_free+0x83/0x29a
[  225.318746]  simple_recursive_removal+0x3f/0x246
[  225.318750]  ? debug_mount+0x26/0x26
[  225.318753]  ? preempt_latency_start+0x2b/0x46
[  225.318758]  ? mntget+0x1c/0x25
[  225.318761]  debugfs_remove+0x40/0x5f
[  225.318765]  intel_gvt_debugfs_clean+0x15/0x24 [kvmgt]
[  225.318776]  intel_gvt_clean_device+0x4c/0xd2 [kvmgt]
[  225.318787]  intel_gvt_clean_device+0x20/0x53 [i915]
[  225.318864]  intel_gvt_driver_remove+0x1d/0x5b [i915]
[  225.318931]  i915_driver_remove+0x87/0xc9 [i915]
[  225.318984]  i915_pci_remove+0x1a/0x29 [i915]
[  225.319037]  pci_device_remove+0x33/0x89
[  225.319043]  device_release_driver_internal+0xbc/0x13e
[  225.319048]  unbind_store+0x54/0x74
[  225.319051]  kernfs_fop_write_iter+0x134/0x17f
[  225.319056]  new_sync_write+0x7c/0xbb
[  225.319061]  ? intel_pmu_lbr_read_64+0x115/0x232
[  225.319066]  vfs_write+0xda/0x129
[  225.319069]  ksys_write+0x76/0xc2
[  225.319073]  do_syscall_64+0x68/0x81
[  225.319077]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  225.319082] RIP: 0033:0x153f3bf5d42f
[  225.319085] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 19 2d f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 6c 2d f8 ff 48
[  225.319092] RSP: 002b:0000153f3a764420 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  225.319096] RAX: ffffffffffffffda RBX: 0000000000000020 RCX: 0000153f3bf5d42f
[  225.319099] RDX: 000000000000000c RSI: 0000153f2c037e70 RDI: 0000000000000020
[  225.319102] RBP: 000000000000000c R08: 0000000000000000 R09: 0000153f2c03c930
[  225.319105] R10: 0000000000000000 R11: 0000000000000293 R12: 0000153f2c037e70
[  225.319108] R13: 0000000000000020 R14: 0000000000000000 R15: 0000153f3c6bc4e8
[  225.319114]  </TASK>
[  225.319115] Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod kvmgt mdev i915 iosf_mbi drm_buddy ttm drm_display_helper drm_kms_helper drm intel_gtt agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls igb i2c_algo_bit e1000e x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nvme i2c_i801 crypto_simd wmi_bmof cryptd rapl intel_cstate intel_uncore i2c_smbus nvme_core i2c_core input_leds led_class joydev ahci intel_pch_thermal libahci wmi video backlight button acpi_tad acpi_pad unix [last unloaded: i2c_algo_bit]
[  225.319205] CR2: 0000000000000098
[  225.319208] ---[ end trace 0000000000000000 ]---
[  225.934871] RIP: 0010:rwsem_write_trylock+0x7/0x23
[  225.934879] Code: cc cc cc 48 8b 47 08 a8 01 74 13 a8 02 75 0f 48 89 c2 48 83 ca 02 f0 48 0f b1 57 08 75 e9 c3 cc cc cc cc 31 c0 ba 01 00 00 00 <f0> 48 0f b1 17 0f 94 c0 75 0d 65 48 8b 14 25 c0 bb 01 00 48 89 57
[  225.934886] RSP: 0018:ffffc900019d3be8 EFLAGS: 00010246
[  225.934890] RAX: 0000000000000000 RBX: 0000000000000098 RCX: 0000000000000064
[  225.934893] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000098
[  225.934896] RBP: ffffc900019d3c90 R08: ffff888154795180 R09: 0000000080150012
[  225.934899] R10: ffff888154795180 R11: 0000000000000000 R12: ffffffffa0956560
[  225.934902] R13: ffff8881056c5d80 R14: ffff8881056c5d80 R15: ffff888105073260
[  225.934905] FS:  0000153f3a7656c0(0000) GS:ffff88845e180000(0000) knlGS:0000000000000000
[  225.934909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  225.934912] CR2: 0000000000000098 CR3: 00000001050d4006 CR4: 00000000003706e0

zhenyw commented 1 year ago

What are the steps to trigger this? It looks like you're unloading the i915 driver? Does it happen on the latest kernel?

Spongman commented 1 year ago

I don't know. I'm using the Unraid GVT-g plugin. I don't think I can upgrade the kernel.
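
One way to check the "unloading the i915 driver" guess against the log is to list only the reliable frames of the call trace, meaning those without the `?` prefix. This is a minimal sketch, not a tool from this repo: the `reliable_frames` helper and its regular expression are hypothetical and assume the usual x86 oops line layout. The sample lines are copied from the trace above.

```python
import re

# A few lines copied from the trace in this issue.
SAMPLE = """\
[  225.318734]  __down_write_common+0x31/0x4e9
[  225.318740]  ? __slab_free+0x83/0x29a
[  225.318746]  simple_recursive_removal+0x3f/0x246
[  225.318765]  intel_gvt_debugfs_clean+0x15/0x24 [kvmgt]
[  225.319048]  unbind_store+0x54/0x74
"""

# Assumed layout: "[timestamp]  [? ]symbol+0xoff/0xsize [module]".
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?\s*$"
)

def reliable_frames(log):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log.splitlines():
        m = FRAME.match(line)
        if m and not m.group(1):  # skip '?' frames (unreliable stack entries)
            frames.append((m.group(2), m.group(3)))
    return frames

for symbol, module in reliable_frames(SAMPLE):
    print(symbol, f"[{module}]" if module else "")
```

Run over the full trace, the innermost reliable frames are `debugfs_remove` called from `intel_gvt_debugfs_clean`, and the outermost kernel frame before the syscall entry is `unbind_store`, the sysfs `unbind` write handler. That is consistent with the device being unbound from i915 by `rpc-libvirtd` (the `Comm:` field) rather than with a manual module unload.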