thesofproject / linux

Linux kernel source tree

[BUG] call trace happens when testing kmod on TGL-IPC4 platforms #4545

Closed keqiaozhang closed 1 year ago

keqiaozhang commented 1 year ago

Describe the bug

On TGL-IPC4 platforms, we observed a kernel NULL pointer dereference with a call trace during the kmod load/unload stress test. It happened on two platforms: TGLU_RVP_SDW-ipc4 and TGLU_UP_HDA-IPC4. The reproduction rate is high, and this looks like a regression.

dmesg

[ 4899.494438] kernel: snd_sof_intel_hda_common:hda_init_caps: sof-audio-pci-intel-tgl 0000:00:1f.3: skipping SoundWire, no links enabled
[ 4899.494591] kernel: snd_sof_intel_hda:hda_codec_probe: sof-audio-pci-intel-tgl 0000:00:1f.3: HDA codec #0 probed OK: response: 10ec0888
[ 4899.499865] kernel: BUG: kernel NULL pointer dereference, address: 000000000000007c
[ 4899.499871] kernel: #PF: supervisor write access in kernel mode
[ 4899.499872] kernel: #PF: error_code(0x0002) - not-present page
[ 4899.499874] kernel: PGD 0 P4D 0 
[ 4899.499876] kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4899.499878] kernel: CPU: 0 PID: 63589 Comm: systemd-udevd Not tainted 6.5.0-rc5-g1083fe342f46 #dev
[ 4899.499881] kernel: Hardware name: AAEON UPX-TGL01/UPX-TGL01, BIOS UXTGBM16 03/31/2022
[ 4899.499882] kernel: RIP: 0010:hdac_hda_dev_probe+0xd3/0x170 [snd_soc_hdac_hda]
[ 4899.499888] kernel: Code: c7 c6 a0 30 0e c1 e8 6c 0f aa ff 41 89 c5 85 c0 78 74 48 8b bd a8 04 00 00 4c 89 fe e8 a6 de fe ff f0 4c 0f ab 35 9d 26 00 00 <41> 89 5c 24 7c 5b 44 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
[ 4899.499890] kernel: RSP: 0018:ffffad764231fbc8 EFLAGS: 00010282
[ 4899.499891] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 4899.499893] kernel: RDX: 0000000000000000 RSI: 000000002abc2bc2 RDI: fffffffb0fbd4575
[ 4899.499894] kernel: RBP: ffff8fc5704d0000 R08: 00000474c0f2ecbe R09: 0000000000000000
[ 4899.499895] kernel: R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
[ 4899.499896] kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8fc507a5bc00
[ 4899.499897] kernel: FS:  00007f5699a0f8c0(0000) GS:ffff8fc673800000(0000) knlGS:0000000000000000
[ 4899.499898] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4899.499899] kernel: CR2: 000000000000007c CR3: 000000017047e003 CR4: 0000000000f70ef0
[ 4899.499900] kernel: PKRU: 55555554
[ 4899.499901] kernel: Call Trace:
[ 4899.499903] kernel:  <TASK>
[ 4899.499906] kernel:  ? __die+0x24/0x70
[ 4899.499911] kernel:  ? page_fault_oops+0x15b/0x440
[ 4899.499914] kernel:  ? snd_hdac_ext_bus_link_put+0x29/0xa0 [snd_hda_ext_core]
[ 4899.499919] kernel:  ? lock_acquired+0xb1/0x340
[ 4899.499923] kernel:  ? exc_page_fault+0x64/0x190
[ 4899.499926] kernel:  ? asm_exc_page_fault+0x26/0x30
[ 4899.499930] kernel:  ? hdac_hda_dev_probe+0xd3/0x170 [snd_soc_hdac_hda]
[ 4899.499933] kernel:  ? hdac_hda_dev_probe+0xca/0x170 [snd_soc_hdac_hda]
[ 4899.499937] kernel:  really_probe+0x1a2/0x410
[ 4899.499942] kernel:  __driver_probe_device+0x78/0x160
[ 4899.499944] kernel:  driver_probe_device+0x1e/0x90
[ 4899.499945] kernel:  __driver_attach+0xda/0x1d0
[ 4899.499947] kernel:  ? __pfx___driver_attach+0x10/0x10
[ 4899.499949] kernel:  bus_for_each_dev+0x7c/0xd0
[ 4899.499951] kernel:  bus_add_driver+0x119/0x220
[ 4899.499953] kernel:  driver_register+0x60/0x120
[ 4899.499955] kernel:  ? __pfx_realtek_driver_init+0x10/0x10 [snd_hda_codec_realtek]
[ 4899.499964] kernel:  do_one_initcall+0x5c/0x270
[ 4899.499968] kernel:  ? kmalloc_trace+0xa8/0xb0
[ 4899.499972] kernel:  do_init_module+0x64/0x230
[ 4899.499977] kernel:  init_module_from_file+0x8b/0xd0
[ 4899.499980] kernel:  idempotent_init_module+0x18d/0x240
[ 4899.499983] kernel:  __x64_sys_finit_module+0x5e/0xb0
[ 4899.499986] kernel:  do_syscall_64+0x3c/0x90
[ 4899.499989] kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 4899.499992] kernel: RIP: 0033:0x7f569a105a3d
[ 4899.499994] kernel: Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[ 4899.499995] kernel: RSP: 002b:00007ffe436e7c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 4899.499997] kernel: RAX: ffffffffffffffda RBX: 000055bca9ba84a0 RCX: 00007f569a105a3d
[ 4899.499998] kernel: RDX: 0000000000000000 RSI: 00007f569a29e441 RDI: 000000000000000f
[ 4899.499999] kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
[ 4899.500000] kernel: R10: 000000000000000f R11: 0000000000000246 R12: 00007f569a29e441
[ 4899.500001] kernel: R13: 000055bca9adb930 R14: 000055bca9ad4030 R15: 000055bca9ba5ed0
[ 4899.500003] kernel:  </TASK>
[ 4899.500004] kernel: Modules linked in: snd_hda_codec_realtek(+) snd_hda_codec_generic snd_sof_pci_intel_mtl snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_icl snd_sof_pci_intel_cnl snd_sof_pci_intel_apl snd_sof_pci_intel_skl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_bdw snd_sof_acpi_intel_byt snd_sof_intel_atom snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_sof_acpi snd_sof snd_sof_utils snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_es8326 snd_soc_es8316 snd_soc_max98390 snd_soc_max98373_i2c snd_soc_max98373_sdw snd_soc_max98373 snd_soc_max98363 snd_soc_max98357a snd_soc_ts3a227e snd_soc_max98090 snd_soc_rt5682_sdw snd_soc_rt5682_i2c snd_soc_rt5682 snd_soc_rt5677 snd_soc_rt5677_spi snd_soc_rt5670 snd_soc_rt5660 snd_soc_rt5651 snd_soc_rt5645 snd_soc_rt5640 snd_soc_rt1011 snd_soc_sdw_mockup
[ 4899.500029] kernel:  snd_soc_rt1318_sdw snd_soc_rt1316_sdw snd_soc_rt1308_sdw snd_soc_rt1308 snd_soc_rl6231 snd_soc_rt715_sdca snd_soc_rt715 snd_soc_rt712_sdca_dmic snd_soc_rt712_sdca snd_soc_rt711_sdca regmap_sdw_mbq snd_soc_rt711 snd_soc_rt700 regmap_sdw soundwire_bus snd_soc_rt298 snd_soc_rt286 snd_soc_rt274 snd_soc_rl6347a snd_soc_wm8804_i2c snd_soc_wm8804 snd_soc_pcm512x_i2c snd_soc_pcm512x snd_soc_da7219 snd_soc_da7213 snd_soc_core snd_compress snd_pcm regmap_i2c ledtrig_audio squashfs snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer wmi_bmof intel_rapl_common i915 x86_pkg_temp_thermal snd i2c_algo_bit intel_powerclamp soundcore drm_buddy mei_me mei video ttm drm_display_helper drm_kms_helper wmi intel_pmc_core drm fuse efivarfs spi_pxa2xx_platform e1000e xhci_pci intel_lpss_pci intel_lpss idma64 mfd_core xhci_hcd [last unloaded: snd_pcm]
[ 4899.500062] kernel: CR2: 000000000000007c
[ 4899.500064] kernel: ---[ end trace 0000000000000000 ]---
[ 4899.500065] kernel: RIP: 0010:hdac_hda_dev_probe+0xd3/0x170 [snd_soc_hdac_hda]
[ 4899.500069] kernel: Code: c7 c6 a0 30 0e c1 e8 6c 0f aa ff 41 89 c5 85 c0 78 74 48 8b bd a8 04 00 00 4c 89 fe e8 a6 de fe ff f0 4c 0f ab 35 9d 26 00 00 <41> 89 5c 24 7c 5b 44 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
[ 4899.500070] kernel: RSP: 0018:ffffad764231fbc8 EFLAGS: 00010282
[ 4899.500071] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 4899.500072] kernel: RDX: 0000000000000000 RSI: 000000002abc2bc2 RDI: fffffffb0fbd4575
[ 4899.500073] kernel: RBP: ffff8fc5704d0000 R08: 00000474c0f2ecbe R09: 0000000000000000
[ 4899.500074] kernel: R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
[ 4899.500075] kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8fc507a5bc00
[ 4899.500076] kernel: FS:  00007f5699a0f8c0(0000) GS:ffff8fc673800000(0000) knlGS:0000000000000000
[ 4899.500077] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4899.500078] kernel: CR2: 000000000000007c CR3: 000000017047e003 CR4: 0000000000f70ef0
[ 4899.500079] kernel: PKRU: 55555554
[ 4899.500080] kernel: note: systemd-udevd[63589] exited with irqs disabled
[ 4899.501456] kernel: snd_sof_intel_hda:request_codec_module: snd_hda_codec_realtek ehdaudio0D0: loading codec module: hdaudio:v10EC0888r00100302a01
[ 4899.543126] kernel: usbcore: registered new interface driver snd-usb-audio
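
For anyone reading the oops above, the `error_code(0x0002)` value and the faulting address can be cross-checked by hand. The sketch below is a minimal Python illustration, assuming the standard x86 page-fault error-code layout (bit 0 = protection violation vs. not-present page, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit, bit 4 = instruction fetch). It also assumes that the marked bytes `<41> 89 5c 24 7c` in the `Code:` line decode to a 32-bit store of the form `mov %ebx,0x7c(%r12)`; that decoding is a manual reading, not tool output. The helper name `decode_pf_error_code` is hypothetical and not part of any kernel tooling.

```python
# x86 page-fault error-code bits (low five bits only).
PF_PROT = 1 << 0   # set: protection violation; clear: page not present
PF_WRITE = 1 << 1  # set: write access; clear: read access
PF_USER = 1 << 2   # set: fault raised in user mode; clear: supervisor mode
PF_RSVD = 1 << 3   # set: reserved bit found set in a page-table entry
PF_INSTR = 1 << 4  # set: fault on an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Hypothetical helper: split a page-fault error code into named fields."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "page_present": bool(code & PF_PROT),
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
    }


# Values copied from the oops above.
error_code = 0x0002
r12 = 0x0000000000000000
cr2 = 0x000000000000007C

info = decode_pf_error_code(error_code)

# Assumed decoding of the marked bytes: base register R12, displacement 0x7c.
fault_address = r12 + 0x7C

print(info)
print(hex(fault_address), fault_address == cr2)
```

Under these assumptions the decoded fields agree with the `#PF: supervisor write access` and `not-present page` lines, and a NULL base pointer in R12 plus the 0x7c displacement lands on the address reported in CR2.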

To Reproduce

~/sof-test/test-case/check-kmod-load-unload.sh -l 25

Reproduction Rate

Almost 100%

Environment

1) Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).

2) Name of the platform(s) on which the bug is observed.

dmesg.txt
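
A saved log such as the one attached here can be scanned for this failure signature when checking many stress-test iterations. The following is a minimal Python sketch, assuming the log uses the same line format as the dmesg excerpt in this report; the function `find_null_deref` and both regular expressions are hypothetical and are not part of sof-test.

```python
import re

# Matches the headline of the oops and captures the faulting address.
BUG_RE = re.compile(
    r"BUG: kernel NULL pointer dereference, address: (?P<address>[0-9a-f]+)"
)

# Matches a kernel-mode RIP line such as
# "RIP: 0010:hdac_hda_dev_probe+0xd3/0x170 [snd_soc_hdac_hda]".
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_null_deref(log_text):
    """Return details of the first NULL-dereference oops, or None if absent."""
    bug = BUG_RE.search(log_text)
    if bug is None:
        return None
    result = {"address": int(bug.group("address"), 16)}
    # Only look for the RIP line after the BUG headline.
    rip = RIP_RE.search(log_text, bug.end())
    if rip is not None:
        result["symbol"] = rip.group("symbol")
        result["offset"] = int(rip.group("offset"), 16)
        result["module"] = rip.group("module")
    return result


# Two lines copied from the dmesg excerpt in this report.
SAMPLE = """\
[ 4899.499865] kernel: BUG: kernel NULL pointer dereference, address: 000000000000007c
[ 4899.499882] kernel: RIP: 0010:hdac_hda_dev_probe+0xd3/0x170 [snd_soc_hdac_hda]
"""

report = find_null_deref(SAMPLE)
print(report)
```

To check a real log, the same function could be called on the file contents, for example `find_null_deref(open("dmesg.txt").read())`.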

keqiaozhang commented 1 year ago

This regression is caused by https://github.com/thesofproject/linux/commit/3f9245260d0ae74031839a925c7965ac710fe459.

bardliao commented 1 year ago

@keqiaozhang Could you check if https://github.com/thesofproject/linux/pull/4548 fixes the issue?

keqiaozhang commented 1 year ago

This bug cannot be reproduced in CI, so I am closing it.