thesofproject / linux

Linux kernel source tree
Other
91 stars 133 forks source link

[DNM] Random attempts to fix controller reset issue #4894

Closed plbossart closed 8 months ago

plbossart commented 8 months ago

suggested hacks based on code review only

I don't see any improvement on the hardware but the code does look suspicious.

with the 3rd commit I can't reproduce the errors reported in issue #4889 root-caused to the use of PIO commands (see PR #4892)

cc:

plbossart commented 8 months ago

@bardliao can you take a look? The coincidence with the WAKE_STS also used for SoundWire is troubling.

@ujfalusi comments/test welcome. I just randomly looked at potential issues and found this 3rd commit by comparison with the existing programming sequences in sound/hda.

plbossart commented 8 months ago

FWIW the 3rd commit partially reverts https://github.com/thesofproject/linux/commit/6ec3295826d8c11141c970cf3fd68d77d3452adb ("ASoC: SOF: Intel: hda: remove duplicated clear WAKESTS")

I have no idea why this was removed.

It could also be that there's an interference from https://github.com/thesofproject/linux/commit/ae5ff22a884d2aac0908cfd0113d3db633d79ae6 ("ASoC: SOF: Intel: hda-ctrl: only clear WAKESTS for HDaudio codecs"), where in the end we didn't clear all the WAKE_STS fields.

Wow, my brain is officially fried haha.

plbossart commented 8 months ago

Wait, this gets even better.

This WAKE_STS clear was improved in b37a15188eae9d4c49c5bb035e0c8d4058e4d9b3 ("ALSA: hda: avoid write to STATESTS if controller is in reset") by our very own @kv2019i - but we never used this in the SOF driver.

Meh.

plbossart commented 8 months ago

And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?

commit e8a7f136f5edb6ae83b14faaa0da2a3c4558f431
Author: Danny Tholen <obiwan@mailmij.org>
Date:   Tue Sep 11 21:41:56 2007 +0200

    [ALSA] hda-intel - Improve HD-audio codec probing robustness

    When modem is disabled in the BIOS, detection of the number of codecs
    always fails after booting if STATESTS is not cleared first.
    This patch fixes this problem and also adds an error check in a place
    where a read error would lead to a very large number of pointless loops.

    Signed-off-by: Danny Tholen <obiwan@mailmij.org>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Jaroslav Kysela <perex@suse.cz>
marc-hb commented 8 months ago

And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?

This could have been worse, it could have been "JiaT75"

ujfalusi commented 8 months ago

I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?

plbossart commented 8 months ago

I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?

There is a clear correlation between SoundWire and WAKE_STS. From LNL onwards, we use the HDaudio wake detection instead of the SoundWire wake detection. So clearing all the bits prior to entering reset makes sense, otherwise there might be some bitfields that are left active.

But yeah I don't have any explanation for the PIO role in triggering this. We may need to double check what happens if we don't enable PIO for LNL.

ujfalusi commented 8 months ago

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

plbossart commented 8 months ago

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.

ujfalusi commented 8 months ago

@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...

I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.

I don't have access to the entire dmesg, but daily test 38398 (from 29.02.2024) does have these:

[ 4399.067924] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[ 4399.067964] kernel: soundwire sdw-master-0-0: trf on Slave 6 failed:-5 read addr 8038 count 0
[ 4399.068012] kernel: rt711-sdca sdw:0:0:025d:0711:01: Failed to get private value: 6100038 => ffff92fd ret=-5
[ 4399.571720] kernel: soundwire_intel soundwire_intel.link.3: IO transfer timed out, cmd 3 device 1 addr 45 len 1
[ 4399.571785] kernel: soundwire sdw-master-0-3: trf on Slave 1 failed:-110 write addr 45 count 0
[ 4399.571810] kernel: rt1316-sdca sdw:0:3:025d:1316:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571826] kernel: rt1316-sdca sdw:0:3:025d:1316:01: clock stop prepare failed:-110
[ 4399.571839] kernel: soundwire_intel soundwire_intel.link.3: prepare clock stop failed -110
[ 4399.571852] kernel: soundwire_intel soundwire_intel.link.3: intel_stop_bus: cannot stop clock: -110
[ 4399.571921] kernel: soundwire_intel soundwire_intel.link.1: IO transfer timed out, cmd 3 device 7 addr 45 len 1
[ 4399.571942] kernel: soundwire sdw-master-0-1: trf on Slave 7 failed:-110 write addr 45 count 0
[ 4399.571958] kernel: rt715-sdca sdw:0:1:025d:0714:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571971] kernel: rt715-sdca sdw:0:1:025d:0714:01: clock stop prepare failed:-110
[ 4399.571983] kernel: soundwire_intel soundwire_intel.link.1: prepare clock stop failed -110
[ 4399.571994] kernel: soundwire_intel soundwire_intel.link.1: intel_stop_bus: cannot stop clock: -110
[ 4399.572048] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out

I don't see gctl