Open marc-hb opened 1 month ago
The issue only happens on ba-lnlm-rvp-sdw-01, but not on jf-lnlm-rvp-sdw-1. I checked dmesg and found
[ 8.481076] snd_soc_sof_sdw:sof_card_dai_links_create: sof_sdw sof_sdw: sdw 5, ssp 0, dmic 2, hdmi 0, bt: 0
on ba-lnlm-rvp-sdw-01.
We have rt714 working as DMIC, so we need to disable the PCH DMIC from BIOS.
I added `options snd_sof_intel_hda_generic dmic_num=0` as a temporary solution.
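For anyone wanting to spot this configuration automatically, here is a minimal sketch of pulling the counts out of the dmesg line quoted above. The struct and function names are hypothetical and not part of any SOF tool; the only thing taken from the log is the `sdw %d, ssp %d, dmic %d, hdmi %d, bt: %d` layout. Note that a non-zero `dmic` next to a non-zero `sdw` is only a problem when a SoundWire mic codec such as rt714 is present, so the caller still has to decide what to do with the numbers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the counts printed by sof_card_dai_links_create. */
struct dai_link_counts {
    int sdw, ssp, dmic, hdmi, bt;
};

/*
 * Scan a dmesg line for "sdw N, ssp N, dmic N, hdmi N, bt: N".
 * Returns 1 and fills *out on success, 0 if the pattern is absent.
 * The prefix "sof_sdw sof_sdw:" also contains "sdw ", so every
 * occurrence is tried until one parses completely.
 */
static int parse_dai_link_counts(const char *line, struct dai_link_counts *out)
{
    const char *p = strstr(line, "sdw ");

    while (p) {
        struct dai_link_counts tmp;

        if (sscanf(p, "sdw %d, ssp %d, dmic %d, hdmi %d, bt: %d",
                   &tmp.sdw, &tmp.ssp, &tmp.dmic, &tmp.hdmi, &tmp.bt) == 5) {
            *out = tmp;
            return 1;
        }
        p = strstr(p + 1, "sdw ");
    }
    return 0;
}
```

A CI check could call this on each boot-log line and flag devices where both `sdw` and `dmic` are non-zero for manual review.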
ba-mtlp-sdw-aioc-03 also recovered after the BIOS settings change. Can you please close this issue?
Thanks @bardliao!
dmic 2
We have rt714 working as DMIC, so we need to disable the PCH DMIC from BIOS.
- Can you please elaborate on the limitation? In any case I would expect one of the DMICs to be dropped but not the entire firmware!
- Was this triggered by the "no functional changes" PR #5037 ("Refactoring topology name fixup of Intel mach")?
cc: @brentlu
ba-mtlp-sdw-aioc-03 also recovered after the BIOS settings change. Can you please close this issue?
Let's first get clarity on whether this is a workaround or fix. Configuring all devices is not convenient and the price of such minor misconfiguration seems way too high.
Yes, it's a side effect of "ASoC: SOF: Intel: hda: refactoring topology name fixup for SDW mach". In the original design, dmic_num was reported only when mach->link_mask had at most 2 bits set. Now it's always reported to the machine driver.
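To make the difference concrete, here is a rough sketch of the pre-refactoring rule as described in this comment. It is not the kernel code: the function names are hypothetical, and "<= 2 bits" is read here as "at most two SoundWire links set in the mask", which is an assumption.

```c
#include <stdint.h>

/* Count the set bits in a SoundWire link mask (one bit per link in use). */
static unsigned int count_links(uint32_t link_mask)
{
    unsigned int n = 0;

    while (link_mask) {
        link_mask &= link_mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: pass the PCH DMIC count on to the machine driver
 * only when at most two links are in use, otherwise report none.
 */
static int reported_dmic_num(uint32_t link_mask, int nhlt_dmic_num)
{
    return count_links(link_mask) <= 2 ? nhlt_dmic_num : 0;
}
```

With this rule a board using four links (mask 0xF) reports no PCH DMICs even if the BIOS advertises two, which is why the old code never hit the unsupported combination on such boards.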
Please check if following PR could solve this issue and sorry for the regression. https://github.com/thesofproject/linux/pull/5125
sorry for the regression.
Not at all, this is exactly how Continuous Integration is supposed to work: catching regressions immediately! https://www.bing.com/images/search?q=relative+cost+of+defects If tests never failed, they would not be trying hard enough.
Thanks for the super quick fixup you already submitted in #5125, which is also how Continuous Integration is supposed to work!
Should @ssavati re-enable the DMICs in the BIOS for testing #5125? I still don't really understand what the technical problem is... is there some documentation we can look at? I mean anything higher level and faster to read than the source code :-)
The failure is NOT deterministic, for instance these did not crash with the same commits...
The issue only happens on ba-lnlm-rvp-sdw-01, but not jf-lnlm-rvp-sdw-1.
So @bardliao you're saying it is actually deterministic: it appeared non-deterministic only because of BIOS configuration differences across devices?
Sorry I should have asked earlier.
Please check if following PR could solve this issue and sorry for the regression. https://github.com/thesofproject/linux/pull/5125
Thanks @brentlu I confirmed https://github.com/thesofproject/linux/pull/5125 works.
So @bardliao you're saying it is actually deterministic: it appeared non-deterministic only because of BIOS configuration differences across devices?
The issue is due to dmic_num != 0 when a sdw DMIC is present, and https://github.com/thesofproject/linux/pull/5125 sets dmic_num = 0 when more than 2 sdw links are used. But this will not work with a multi-function sdw codec like rt722, which contains a DMIC function and uses only one sdw link. Thus, I still think we should disable the PCH DMIC when a SDW DMIC is used.
Agree with @bardliao, if there is a SoundWire codec that deals with microphones then we cannot also have DMICs enabled. I think this will be a recurring error though, if there is an NHLT table copy/pasted from previous programs by OEMs along with a SoundWire mic codec, then we will have a problem.
I am not sure how to go about this though; we cannot solve this at the machine driver level since it would be too late. We need to nuke the dmic count before the topology is chosen. That can only be done by checking whether any of the SoundWire devices has a mic function and forcing dmic_num = 0 to prevent the selection of a non-supported topology.
That's not simple.
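As a sketch of the idea only (not a proposal for the actual kernel change), the function-based check could look like the code below. Everything here is hypothetical: the `sdw_dev_info` struct, the `has_mic_function` flag and the helper name are invented for illustration, and the sketch does not address the hard part, which is knowing each device's functions before the topology is selected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one enumerated SoundWire device. */
struct sdw_dev_info {
    const char *name;
    unsigned int link_id;
    bool has_mic_function; /* assumed to be known before topology selection */
};

/*
 * Return the DMIC count to use for topology selection: zero as soon as any
 * SoundWire device provides a mic function, otherwise the NHLT value.
 */
static int effective_dmic_num(const struct sdw_dev_info *devs, size_t n,
                              int nhlt_dmic_num)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].has_mic_function)
            return 0;
    }
    return nhlt_dmic_num;
}
```

Unlike a rule based on the number of links, this also covers a multi-function codec such as rt722 sitting alone on one link, because the decision depends on what the device does rather than on how many links are in use.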
Recent regression.
Spotted on both MTL and LNL.
Earliest failure spotted so far:
Start Time: 2024-07-28 13:09:12 UTC
Linux Commit: dc9dd7b28159
KConfig Commit: 8189104a4f38
SOF Commit: dc28dbdc6a4c
~The failure is NOT deterministic~, for instance these did not crash with the same commits: https://sof-ci.01.org/sofpr/PR9156/build6782/devicetest/index.html https://sof-ci.01.org/sofpr/PR9156/build6783/devicetest/index.html
EDIT: actually deterministic, just BIOS configuration differences?
There were only two very recent changes since this crash started happening:
Last week there was also this one but it was merged longer ago:
Sample failure: https://sof-ci.01.org/sofpr/PR9338/build6778/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=verify-kernel-boot-log
Also observed in daily tests 44332 and 44336 (LNLM_SDW_AIOC, ba-lnlm-rvp-sdw-01), and 44336, 44363?model=MTLP_SDW_AIOC&testcase=verify-kernel-boot-log (ba-mtlp-sdw-aioc-03)