Closed plbossart closed 8 months ago
@bardliao can you take a look? The coincidence with the WAKE_STS also used for SoundWire is troubling.
@ujfalusi comments/test welcome. I just randomly looked at potential issues and found this 3rd commit by comparison with the existing programming sequences in sound/hda.
FWIW the 3rd commit partially reverts https://github.com/thesofproject/linux/commit/6ec3295826d8c11141c970cf3fd68d77d3452adb ("ASoC: SOF: Intel: hda: remove duplicated clear WAKESTS")
I have no idea why this was removed.
It could also be that there's an interference from https://github.com/thesofproject/linux/commit/ae5ff22a884d2aac0908cfd0113d3db633d79ae6 ("ASoC: SOF: Intel: hda-ctrl: only clear WAKESTS for HDaudio codecs"), where in the end we didn't clear all the WAKE_STS fields.
Wow, my brain is officially fried haha.
Wait, this gets even better.
This WAKE_STS clear was improved in b37a15188eae9d4c49c5bb035e0c8d4058e4d9b3 ("ALSA: hda: avoid write to STATESTS if controller is in reset") by our very own @kv2019i - but we never used this in the SOF driver.
Meh.
And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?
commit e8a7f136f5edb6ae83b14faaa0da2a3c4558f431
Author: Danny Tholen <obiwan@mailmij.org>
Date: Tue Sep 11 21:41:56 2007 +0200
[ALSA] hda-intel - Improve HD-audio codec probing robustness
When modem is disabled in the BIOS, detection of the number of codecs
always fails after booting if STATESTS is not cleared first.
This patch fixes this problem and also adds an error check in a place
where a read error would lead to a very large number of pointless loops.
Signed-off-by: Danny Tholen <obiwan@mailmij.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
And for reference, the original clear was added in 2007 by someone using an "obiwan" email. Who said we can't have fun at work, eh?
This could have been worse, it could have been "JiaT75"
I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?
I'm puzzled, SDW should not be using CORB and we don't have display fro LNL yet (afaik). HDA setups are clear, only SDW is failing. How come that the PIO mode triggered this mass fail with SDW and a seemingly unrelated (to PIO mode) patch fixes the regression?
There is a clear correlation between SoundWire and WAKE_STS. From LNL onwards, we use the HDaudio wake detection instead of the SoundWire wake detection. So clearing all the bits prior to entering reset makes sense, otherwise there might be some bitfields that are left active.
But yeah I don't have any explanation for the PIO role in triggering this. We may need to double check what happens if we don't enable PIO for LNL.
@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...
@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...
I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.
@plbossart, we kind of know what happens if we don't enable PIO mode for LNL...
I meant do we see the reset issue without the PIO mode, and you've replied that it existed before and we overlooked it.
I don't have access to the entire dmesg, but daily test 38398 (from 29.02.2024) does have these:
[ 4399.067924] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
[ 4399.067964] kernel: soundwire sdw-master-0-0: trf on Slave 6 failed:-5 read addr 8038 count 0
[ 4399.068012] kernel: rt711-sdca sdw:0:0:025d:0711:01: Failed to get private value: 6100038 => ffff92fd ret=-5
[ 4399.571720] kernel: soundwire_intel soundwire_intel.link.3: IO transfer timed out, cmd 3 device 1 addr 45 len 1
[ 4399.571785] kernel: soundwire sdw-master-0-3: trf on Slave 1 failed:-110 write addr 45 count 0
[ 4399.571810] kernel: rt1316-sdca sdw:0:3:025d:1316:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571826] kernel: rt1316-sdca sdw:0:3:025d:1316:01: clock stop prepare failed:-110
[ 4399.571839] kernel: soundwire_intel soundwire_intel.link.3: prepare clock stop failed -110
[ 4399.571852] kernel: soundwire_intel soundwire_intel.link.3: intel_stop_bus: cannot stop clock: -110
[ 4399.571921] kernel: soundwire_intel soundwire_intel.link.1: IO transfer timed out, cmd 3 device 7 addr 45 len 1
[ 4399.571942] kernel: soundwire sdw-master-0-1: trf on Slave 7 failed:-110 write addr 45 count 0
[ 4399.571958] kernel: rt715-sdca sdw:0:1:025d:0714:01: SDW_SCP_SYSTEMCTRL write failed:-110
[ 4399.571971] kernel: rt715-sdca sdw:0:1:025d:0714:01: clock stop prepare failed:-110
[ 4399.571983] kernel: soundwire_intel soundwire_intel.link.1: prepare clock stop failed -110
[ 4399.571994] kernel: soundwire_intel soundwire_intel.link.1: intel_stop_bus: cannot stop clock: -110
[ 4399.572048] kernel: soundwire_intel soundwire_intel.link.0: SCP Msg trf timed out
I don't see gctl
suggested hacks based on code review only
I don't see any improvement on the hardware but the code does look suspicious.with the 3rd commit I can't reproduce the errors reported in issue #4889 root-caused to the use of PIO commands (see PR #4892)
cc:
4896