thesofproject / sof

Sound Open Firmware

[BUG] driver crash after hibernation on kernel 6.11.0-8 #9572

Closed md0-code closed 4 weeks ago

md0-code commented 1 month ago

Describe the bug Upon returning from hibernation on kernel 6.11.0-8, the sof-audio-pci-intel-mtl driver crashes. This does not happen with kernel version 6.8.0-45.

To Reproduce

Reproduction Rate 1/1

Expected behavior Driver should survive hibernation

Impact Major - no audio available after resuming from hibernation

Environment
OS: Ubuntu oracular 24.10 x86_64
Host: ASUS Zenbook 14 UX3405MA_Q415MA (1.0)
Kernel: Linux 6.11.0-8-generic
SOF: 2024-09

[    2.477315] sof-audio-pci-intel-mtl 0000:00:1f.3: enabling device (0000 -> 0002)
[    2.477547] sof-audio-pci-intel-mtl 0000:00:1f.3: DSP detected with PCI class/subclass/prog-if 0x040100
[    2.572705] sof-audio-pci-intel-mtl 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    2.580124] sof-audio-pci-intel-mtl 0000:00:1f.3: use msi interrupt mode
[    2.601737] sof-audio-pci-intel-mtl 0000:00:1f.3: hda codecs found, mask 5
[    2.601744] sof-audio-pci-intel-mtl 0000:00:1f.3: using HDA machine driver skl_hda_dsp_generic now
[    2.601748] sof-audio-pci-intel-mtl 0000:00:1f.3: DMICs detected in NHLT tables: 2
[    2.607251] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware paths/files for ipc type 1:
[    2.607256] sof-audio-pci-intel-mtl 0000:00:1f.3:  Firmware file:     intel/sof-ipc4/mtl/sof-mtl.ri
[    2.607257] sof-audio-pci-intel-mtl 0000:00:1f.3:  Firmware lib path: intel/sof-ipc4-lib/mtl
[    2.607258] sof-audio-pci-intel-mtl 0000:00:1f.3:  Topology file:     intel/sof-ace-tplg/sof-hda-generic-2ch.tplg
[    2.608181] sof-audio-pci-intel-mtl 0000:00:1f.3: Loaded firmware library: ADSPFW, version: 2.11.1.1
[    2.728728] sof-audio-pci-intel-mtl 0000:00:1f.3: Booted firmware version: 2.11.1.1
[    2.751069] sof-audio-pci-intel-mtl 0000:00:1f.3: Topology: ABI 3:29:1 Kernel ABI 3:23:1

Screenshots or console output Relevant crash log:

[   53.320065] sof-audio-pci-intel-mtl 0000:00:1f.3: Code loader DMA did not complete
[   53.320078] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[   53.320079] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware download failed
[   53.320081] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_READY_OK (6)
[   53.320105] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[   53.320113] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[   53.320129] sof-audio-pci-intel-mtl 0000:00:1f.3: Core dump is not available due to invalid separator 0xc0de
[   53.320132] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[   53.320203] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to start DSP
[   53.320205] sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed to boot DSP firmware after resume -110
[   53.320208] sof-audio-pci-intel-mtl 0000:00:1f.3: PM: dpm_run_callback(): pci_pm_restore returns -110
[   53.320226] sof-audio-pci-intel-mtl 0000:00:1f.3: PM: failed to restore async: error -110
[   55.098996] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to disable Host IPC and/or SOUNDWIRE
[   55.099007] sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed to power down DSP during suspend -110
[   55.099011] sof-audio-pci-intel-mtl 0000:00:1f.3: can't suspend (snd_sof_runtime_suspend [snd_sof] returned -110)

lgirdwood commented 4 weeks ago

@ujfalusi did we have a recent fix around the code loader that can be tried by @md0-code?

ujfalusi commented 4 weeks ago

I think the fix for this has been sent upstream and will be backported to 6.11 as soon as it hits 6.12: https://lore.kernel.org/linux-sound/20241008060710.15409-1-peter.ujfalusi@linux.intel.com/

Original issue: https://github.com/thesofproject/linux/issues/5135

md0-code commented 4 weeks ago

@ujfalusi Any relatively easy way to test this fix now? I don't have too much experience with kernel patches...

ujfalusi commented 4 weeks ago

@md0-code, you would need to build your own kernel; if you are not comfortable doing that, I would wait for the patch to be backported to stable. Meanwhile, disable hibernate and use suspend, which I believe is working fine for you?

The official guide for Ubuntu is: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel You would need to apply the patch before building the kernel.

The other way would be to use our development tree and test it. You need these packages to be able to build the kernel:

sudo apt install build-essential git libncurses-dev gawk flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf llvm

(I'm not sure what you actually need on ubuntu) then:

mkdir ~/kernel_test && cd ~/kernel_test
git clone https://github.com/ujfalusi/sof-kconfig.git
git clone https://github.com/thesofproject/linux.git
cd linux
../sof-kconfig/kconfig-sof-default.sh
make -j `nproc` bindeb-pkg
sudo dpkg -i ../linux-image-6.12.0-rc2*

Then reboot to use this kernel:

awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg
# observe the list and note the entry that looks like "Ubuntu, with Linux 6.12.0-rc2-g2e6038a1d377"
sudo grub-reboot "the string you have noted above"
# reboot the laptop

Check if you really booted the new kernel:

uname -a

Test hibernation; afterwards you can remove the kernel to go back to the distro one:

apt list --installed | grep linux-image
# note again the SOF kernel line and then
sudo dpkg -r linux-image-6.12.0-rc2-g2e6038a1d377
# reboot

I'm not sure if Ubuntu uses GRUB; if not, this might not work, so I would only try it if you know what you are doing.

md0-code commented 4 weeks ago

@ujfalusi Thank you for your instructions. They were helpful, and I was able to compile the kernel with the suggested patch. However, the problem still persists - I continue to encounter the same errors when resuming from hibernate. Is it possible that this issue has a different cause? Please let me know if you need any additional information to help diagnose the problem.

ujfalusi commented 4 weeks ago

@md0-code, oh, I'm really surprised that the patch did not help! Can you copy sof-dyndbg.conf.txt to /etc/modprobe.d/sof-dyndbg.conf, reboot, and attach the full kernel log from the kernel that has the patch applied? The log should contain one hibernate cycle.

md0-code commented 4 weeks ago

Here it is, a hibernate cycle with a patched 6.12.0-rc3+ kernel: kernel_log-mtl_hibernate_problem.txt

ujfalusi commented 4 weeks ago

@md0-code, I see:

[   38.516640] sof-audio-pci-intel-mtl 0000:00:1f.3: Code loader DMA did not complete
[   38.516652] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[   38.516655] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware download failed
[   38.516657] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_READY_OK (6)
[   38.516682] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[   38.516690] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[   38.516705] sof-audio-pci-intel-mtl 0000:00:1f.3: Core dump is not available due to invalid separator 0xc0de
[   38.516709] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[   38.516782] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to start DSP
[   38.516785] sof-audio-pci-intel-mtl 0000:00:1f.3: error: failed to boot DSP firmware after resume -110

This is not right: the patch [1] removed the "Code loader DMA did not complete" print from the kernel, so this kernel does not have the patch applied.

[1] https://lore.kernel.org/linux-sound/20241008060710.15409-1-peter.ujfalusi@linux.intel.com/
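
One way to turn this observation into a quick check is to search the kernel log for the message that the patch removes. The following is only a sketch: the function name log_looks_patched is made up for illustration, and a missing message only means the log is consistent with a patched kernel (the message is also absent when the firmware download never fails).

```shell
#!/bin/sh
# Message that, according to the comment above, the patch removes from the kernel.
OLD_MSG='Code loader DMA did not complete'

# Hypothetical helper: reads a kernel log on stdin and succeeds (exit 0)
# only if the old message does not appear anywhere in it.
log_looks_patched() {
    ! grep -qF "$OLD_MSG"
}

# Example use after one hibernate cycle (needs permission to read the log):
#   sudo dmesg | log_looks_patched && echo "old message not found" \
#       || echo "old message present: this kernel is not patched"
```

The check should be run on a log that covers a full hibernate and resume cycle, otherwise it says nothing about the resume path.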

ujfalusi commented 4 weeks ago

@md0-code, and this is indeed the issue that the patch is fixing:

[   38.191728] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx      : 0x1b080000|0x0
[   38.191740] snd_sof:sof_set_fw_state: sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state change: 3 -> 6
[   38.191744] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx done : 0x1b080000|0x0

This is the FW_READY notification from the firmware, telling the host that it has booted up.

md0-code commented 4 weeks ago

Thank you for the confirmation. Most probably I did something wrong when compiling the kernel, then: I saved your patch as patch.diff, then did a patch -p1 < patch.diff against the latest mainline 6.12.0-rc3 before issuing make -j `nproc` bindeb-pkg. I even checked that the hda-loader.c file was actually modified before installing the resulting deb file... I obviously need to learn more about the whole kernel patching procedure before doing this kind of thing...

ujfalusi commented 4 weeks ago

If you cloned the tree with git, then you can download the patch and git am <the.patch>, but the patch -p1 < patch.diff should work as well. Are you sure that the modified kernel is installed and is the one that booted up?

ujfalusi commented 4 weeks ago

If in doubt, do a make menuconfig, navigate to General setup, press ENTER at "Local version - append to kernel release" and enter something like -bugtesting, then exit, save the config and rebuild the kernel. Note also that there will be multiple Debian packages and you need to install the latest.
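
The same suffix can probably be set without the menu by using the scripts/config helper that ships in the kernel tree, as sketched in the comments below. The function release_has_suffix is a hypothetical helper, and -bugtesting is just the example suffix from above; treat this as an illustration under those assumptions, not as a tested recipe for every tree.

```shell
#!/bin/sh
# Assumed to be run from the top of the kernel source tree, as a
# non-interactive stand-in for the menuconfig step:
#   scripts/config --set-str LOCALVERSION "-bugtesting"
#   make olddefconfig
#   make -j `nproc` bindeb-pkg

# Hypothetical helper: succeeds only if the kernel release string ($1)
# contains the chosen suffix ($2).
release_has_suffix() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# After rebooting into the rebuilt kernel:
#   release_has_suffix "$(uname -r)" "-bugtesting" && echo "test kernel is running"
```

With a distinctive suffix in uname -r, there is no ambiguity about which build is actually running.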

md0-code commented 4 weeks ago

Yes, I did a git clone git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack cod/mainline/v6.12-rc3, make menuconfig, patch -p1 < patch.diff and make -j `nproc` bindeb-pkg, and then dpkg -i <linux-image*.deb>

uname -a now returns: Linux laptop 6.12.0-rc3+ #2 SMP PREEMPT_DYNAMIC Wed Oct 16 14:32:07 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

I'll try your suggestion and see if it makes any difference.

ujfalusi commented 4 weeks ago

You can check with git diff whether the patch modified the kernel. git am will apply the patch cleanly, and you can check the topmost commit with git show (but you can only apply to a clean tree, so before git am you need to do a git reset --hard).

I'm not sure what goes wrong; it looks like you have done everything correctly, the computer just says no ;)

md0-code commented 4 weeks ago

Found the problem! I was using a wildcard to copy the resulting .deb file and forgot that I still had an unpatched version in the output folder, which always came up first :) My bad :) Can confirm that the patch works and I can finally hibernate my laptop! Thank you again @ujfalusi
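
A small sketch of one way to sidestep this kind of wildcard mix-up: pick the most recently modified image package instead of relying on glob order. The function name newest_image_deb and the linux-image-*.deb pattern are assumptions for illustration. Packages whose names contain -dbg_ are skipped on the assumption that they hold debug symbols and are not the one to install.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified linux-image-*.deb
# in the given directory (default: the parent directory, where bindeb-pkg
# output landed in the steps above). Prints nothing if there is no match.
# Assumes file names without newlines.
newest_image_deb() {
    dir=${1:-..}
    ls -t "$dir"/linux-image-*.deb 2>/dev/null | grep -v -- '-dbg_' | head -n 1
}

# Example use:
#   sudo dpkg -i "$(newest_image_deb ..)"
```

Sorting by modification time means a stale package left over from an earlier build no longer wins just because its name sorts first.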