thesofproject / sof

Sound Open Firmware

[BUG] How can I recover from DSP Panic? #3276

Closed EvanCarroll closed 4 years ago

EvanCarroll commented 4 years ago

Describe the bug User gets a DSP panic; it is not clear how to recover.

To Reproduce Happens sporadically

Reproduction Rate About once in an hour of use.

I am getting a DSP panic:

error : DSP panic!
status: fw entered - code 00000005
error: can't enter idle
error: trace point 00004000
error: panic at src/lib/agent.c:62

Then sound stops working and I can't get it back on without restarting. Is there any way to recover from this?
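For anyone triaging reports like this one, the fields in the panic message can be pulled out mechanically. The following is a minimal sketch, not part of SOF or the kernel driver: the function name `parse_dsp_panic` and the regular expressions are assumptions based only on the five log lines quoted above, and other firmware versions may print different text.

```python
import re

# Patterns derived from the panic report quoted above (an assumption:
# other kernel or firmware versions may format these lines differently).
PANIC_CODE = re.compile(r"status: fw entered - code ([0-9a-fA-F]{8})")
TRACE_POINT = re.compile(r"error: trace point ([0-9a-fA-F]{8})")
PANIC_SITE = re.compile(r"error: panic at (\S+):(\d+)")


def parse_dsp_panic(log_text):
    """Return panic code, trace point, and source location, or None.

    Hypothetical helper: works on raw lines and on dmesg lines that
    carry a timestamp and device prefix, since it only searches for
    the message text.
    """
    if "DSP panic!" not in log_text:
        return None
    code = PANIC_CODE.search(log_text)
    trace = TRACE_POINT.search(log_text)
    site = PANIC_SITE.search(log_text)
    return {
        "code": int(code.group(1), 16) if code else None,
        "trace_point": int(trace.group(1), 16) if trace else None,
        "file": site.group(1) if site else None,
        "line": int(site.group(2)) if site else None,
    }


sample = """\
error : DSP panic!
status: fw entered - code 00000005
error: can't enter idle
error: trace point 00004000
error: panic at src/lib/agent.c:62
"""
report = parse_dsp_panic(sample)
print(report)
```

Feeding it the output of dmesg makes it easy to check whether two reports share the same panic code and source location before filing a duplicate.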

ALSA-INFO output: https://alsa-project.org/db/?f=d6ec2e739dcaa79603877df308a5912269d14995

Output of sudo dmesg | egrep 'sof|audio'

plbossart commented 4 years ago

@EvanCarroll does this happen when you are continuously playing sound? Or just when a sound gets played after some period of idleness? Just trying to correlate with other cases, we've seen this panic in other bug reports but haven't triangulated what might cause this. Can you also point us to the results of alsa-info so that we know what platform this happens on? @lgirdwood FYI

plbossart commented 4 years ago

Possible duplicate of https://github.com/thesofproject/sof/issues/2828 ?

paulstelian97 commented 4 years ago

If SOF is built as a kernel module, removing and reinserting the module should also reset the DSP and get sound back on, as a workaround. No guarantees that it works (for some reason there seem to be issues with this flow on i.MX).

lgirdwood commented 4 years ago

@slawblauciak @mengdonglin fyi.

ilia-kats commented 4 years ago

I'm also getting hit by this. I usually notice it when trying to use audio (playback or mic) after several hours of not using audio, but today this happened right in the middle of playback. The sound suddenly just went silent and this was in the kernel log:

[263553.357824] sof-audio-pci 0000:00:1f.3: error : DSP panic!
[263553.357837] sof-audio-pci 0000:00:1f.3: status: fw entered - code 00000005
[263553.358092] sof-audio-pci 0000:00:1f.3: error: can't enter idle
[263553.358096] sof-audio-pci 0000:00:1f.3: error: trace point 00004000
[263553.358100] sof-audio-pci 0000:00:1f.3: error: panic at src/lib/agent.c:62
[263553.358104] sof-audio-pci 0000:00:1f.3: error: DSP Firmware Oops
[263553.358109] sof-audio-pci 0000:00:1f.3: EXCCAUSE 0x0000003f EXCVADDR 0x00000000 PS       0x00060725 SAR     0x00000000
[263553.358113] sof-audio-pci 0000:00:1f.3: EPC1     0x00000000 EPC2     0xbe00d1fe EPC3     0x00000000 EPC4    0x00000000
[263553.358117] sof-audio-pci 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[263553.358121] sof-audio-pci 0000:00:1f.3: EPS2     0x00060d20 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[263553.358124] sof-audio-pci 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000222
[263553.358127] sof-audio-pci 0000:00:1f.3: stack dump from 0xbe05a110
[263553.358135] sof-audio-pci 0000:00:1f.3: 0xbe05a110: be05a140 00000001 be013760 00000001
[263553.358140] sof-audio-pci 0000:00:1f.3: 0xbe05a114: 00000000 00000000 0000003e be064400
[263553.358145] sof-audio-pci 0000:00:1f.3: 0xbe05a118: b1712600 c11fd8dd 000c0800 00000000
[263553.358149] sof-audio-pci 0000:00:1f.3: 0xbe05a11c: 0dead000 00000000 e14c6018 ffff9e17
[263553.358154] sof-audio-pci 0000:00:1f.3: 0xbe05a120: c1383044 ffffffff e41a3180 ffff9e17
[263553.358158] sof-audio-pci 0000:00:1f.3: 0xbe05a124: b1712600 c11fd8dd 00000000 00000000
[263553.358163] sof-audio-pci 0000:00:1f.3: 0xbe05a128: 004f7c48 ffffaac0 e41a3180 ffff9e17
[263553.358167] sof-audio-pci 0000:00:1f.3: 0xbe05a12c: 00000000 00000000 00000000 00000000

This is on a ThinkPad X1 Carbon 7th Gen using SOF firmware 1.5.1 on Arch Linux. Here is the alsa-info output:

alsa-info.log

paulstelian97 commented 4 years ago

@ilia-kats I recommend that you move those very large logs into separate files and attach them to your comment (yeah, somehow it is possible to do that). Also, I recommend you open another issue with your specific panic details, so it can be checked. I have edited your comment to keep it in check.

Yikes, another agent panic (DSP load got too high and the agent, a component that keeps the load in check, saw that the DSP was overloaded and stopped everything).
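To make the idea of an agent panic concrete, here is a toy model of a watchdog that gives up when its periodic check runs too late. This is a sketch under stated assumptions, not the code in src/lib/agent.c: the class name `AgentSketch`, the microsecond units, and the period and tolerance values are all hypothetical and chosen only for illustration.

```python
class AgentSketch:
    """Toy watchdog: flags a panic when its periodic check runs too late.

    Hypothetical model only. It assumes that an overloaded DSP shows up
    as the check being scheduled later than its period plus a tolerance.
    """

    def __init__(self, period_us, tolerance_us, start_us=0):
        self.limit_us = period_us + tolerance_us
        self.last_check_us = start_us
        self.panicked = False

    def validate(self, now_us):
        """Record a check at now_us; return False once a panic is flagged."""
        delta = now_us - self.last_check_us
        if delta > self.limit_us:
            # The check ran too late, so the system is treated as overloaded.
            self.panicked = True
        self.last_check_us = now_us
        return not self.panicked


agent = AgentSketch(period_us=1000, tolerance_us=500)
ok_on_time = agent.validate(1000)  # 1000 us since start, within the limit
ok_late = agent.validate(3200)     # 2200 us since last check, over the limit
print(ok_on_time, ok_late)
```

In this model a single late check is enough to stop everything, which matches the behaviour described in this thread: the audio goes silent and stays silent until the DSP is reset.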

EvanCarroll commented 4 years ago

I've updated the issue with the alsa-info output. However, I'm more concerned with knowing whether recovery is possible than with fixing any single issue. This issue also happens on the X1 Carbon 7th Gen.

plbossart commented 4 years ago

@EvanCarroll In theory, if you do a suspend operation, the DSP context is lost and the firmware is re-downloaded. We do not have a recovery mechanism in place at the moment; it's been an ask for ages but we never got to it. https://github.com/thesofproject/linux/issues/452

paulstelian97 commented 4 years ago

@tlauda Are you aware of any DSP clock speed woes on KBL? Maybe that's why the agent is crying for these two and causing panics...

@EvanCarroll @ilia-kats I'd like some kernel boot logs (dmesg) from both of you to identify the exact system, topology, firmware version, etc. It doesn't need to be the very log from the crashes (although that's always better), but it must come from the same versions (no updates or anything) as the crash so we can identify it. There are several already known issues with the agent in older versions of the firmware (as far as I know, they've been patched in the current development version) and I'd like to make sure that you aren't hitting one of those.

EvanCarroll commented 4 years ago

I'll provide that next time it crashes. I can't guarantee my firmware hasn't already been updated; I use fwupdagent.

paulstelian97 commented 4 years ago

Sure thing. Maybe the update already fixed the issue though. But if it didn't, you're welcome to post all the logs so that I (or, more importantly, the devs that know your platform specifically; I don't know more than some generalities about the Intel platforms) can look into it and identify the reason for the crash. Again, with exact information we can provide a proper solution or workaround. Without it, all we can say is "as root, try rmmod snd-soc-sof; modprobe snd-soc-sof, and if that fails, restart the machine".
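The fallback quoted in the comment above can be wrapped in a small script. This is a minimal sketch, assuming the module name `snd-soc-sof` exactly as written in the comment; the module actually loaded may differ per platform, and the function name `reload_module` and its `dry_run` flag are hypothetical. By default it only reports the commands it would run.

```python
import subprocess


def reload_module(module="snd-soc-sof", dry_run=True):
    """Remove and reinsert a kernel module, as suggested in the thread.

    Hypothetical helper. With dry_run=True (the default) nothing is
    executed and the planned commands are only returned. With
    dry_run=False it must run as root, and subprocess.run raises
    CalledProcessError if either command fails.
    """
    commands = [["rmmod", module], ["modprobe", module]]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


planned = reload_module()
for argv in planned:
    print(" ".join(argv))
```

If the real run raises an error, for example because the module is still in use, the remaining option from the comment applies: restart the machine.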

EvanCarroll commented 4 years ago

It just did it again. I was watching a movie and xscreensaver kicked in.

I will update the topic with the logs.

paulstelian97 commented 4 years ago

The only thing that is half-suspect is the pair of messages "FW ABI is more recent than kernel" and "topology ABI is more recent than kernel" (the kernel is at 3:13:0 while the FW and topology are at 3:16:0). See if you can somehow update the SOF kernel module (if you have a module) or the kernel itself (if it's built in). Not sure if it will actually help, but I'd say it's worth a shot.
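The ABI comparison behind that warning amounts to comparing three-part version numbers. Here is a minimal sketch, assuming the colon-separated major:minor:patch form quoted in this comment; the helper names `parse_abi` and `abi_newer_than_kernel` are hypothetical and are not taken from the kernel driver.

```python
def parse_abi(text):
    """Split a version such as "3:16:0" into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split(":"))
    return (major, minor, patch)


def abi_newer_than_kernel(kernel_abi, other_abi):
    """Return True if the firmware or topology ABI is newer than the kernel's."""
    # Tuples compare element by element, so major is checked before minor.
    return parse_abi(other_abi) > parse_abi(kernel_abi)


kernel = "3:13:0"
firmware = "3:16:0"
mismatch = abi_newer_than_kernel(kernel, firmware)
print(mismatch)
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank "3:9:0" above "3:13:0".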

slawblauciak commented 4 years ago

The FW should no longer panic like that in the upcoming 1.6 release.

mengdonglin commented 4 years ago

Closing the bug now. A recovery solution for DSP panics is a big topic and will not be tracked in this bug.