thesofproject / linux

Linux kernel source tree

[LNL] multiple cases failing to HDMI playback on SDW configurations with gpu_bind disabled #5098

Closed kv2019i closed 2 months ago

kv2019i commented 3 months ago

Some interaction between recently merged kernel and FW PRs has caused a high failure rate in PR testing: https://sof-ci.01.org/sofpr/PR9116/build6350/devicetest/index.html

2024-07-08 13:40:30 UTC [REMOTE_INFO] ===== Testing: (Round: 1/1) (PCM: HDMI1 [hw:0,5]) (Loop: 1/1) =====
2024-07-08 13:40:30 UTC [REMOTE_COMMAND] aplay   -Dhw:0,5 -r 48000 -c 2 -f S16_LE -d 10 /dev/zero -v -q
aplay: set_params:1416: Unable to install hw params:

The gpu_bind should be disabled in sof-dev kernel, so not sure why HDMI playback is attempted.

Related PRs merged recently:

As this is seen in PR testing, marking as P1.

kv2019i commented 3 months ago

FYI @lyakh

marc-hb commented 2 months ago

Observed in today's daily run https://sof-ci.ostc.intel.com/#/result/planresultdetail/43591?model=LNLM_SDW_AIOC&testcase=check-playback-all-formats on jf-lnlm-rvp-sdw-1

Other LNL configurations are indeed not affected.

[  161.355590] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.0: mclk 19200000 max 4800000 row 50 col 4
[  161.355635] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.3: mclk 19200000 max 4800000 row 50 col 4
[  161.355708] kernel: soundwire_bus:sdw_modify_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: initializing enumeration and init completion for Slave 1
[  161.355718] kernel: soundwire_cadence:cdns_init_clock_ctrl: soundwire_intel soundwire_intel.link.2: mclk 19200000 max 4800000 row 50 col 4
[  161.356138] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at snd_soc_dai_hw_params on iDisp1 Pin: -22
[  161.356276] kernel: snd_sof:sof_pcm_hw_free: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: free stream 5 dir 0
[  161.356561] kernel: snd_sof:sof_pcm_close: sof-audio-pci-intel-lnl 0000:00:1f.3: pcm: close stream 5 dir 0
[  161.357248] kernel: soundwire_cadence:cdns_update_slave_status_work: soundwire_intel soundwire_intel.link.0: Slave status change: 0x2
[  161.357268] kernel: soundwire_bus:sdw_handle_slave_status: soundwire sdw-master-0-0: Slave attached, programming device number

marc-hb commented 2 months ago

Could this be caused by some device-specific configuration? It did not happen on ba-lnlm-rvp-sdw-01 in the July 7th (planresultdetail/43565) and July 9th (planresultdetail/43642) daily tests. Also not in https://sof-ci.01.org/softestpr/PR1218/build625/devicetest/index.html

EDIT: failed on ba-lnlm-rvp-sdw-03 in https://sof-ci.01.org/sofpr/PR9276/build6355/devicetest/index.html https://sof-ci.01.org/softestpr/PR1218/build604/devicetest/index.html

jf-lnlm-rvp-sdw-1 in https://sof-ci.01.org/softestpr/PR1218/build600/devicetest/index.html

ssavati commented 2 months ago

This issue is still reproducible on latest. Currently the workaround "NO_HDMI_MODE=true" is set in the device environment, so we are not seeing the issue in CI results. cc: @kv2019i @plbossart @lgirdwood
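A minimal sketch of how a test run could honor such a workaround, to show why the failure disappears from CI results once it is set. This is not the sof-test implementation: the function names are hypothetical, and the assumptions are that the variable is compared against the literal string "true" and that HDMI PCMs are recognized by a name starting with "HDMI" (as in "HDMI1" from the logs above).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the workaround is active only for the exact value "true". */
bool no_hdmi_mode_enabled(const char *value)
{
	return value != NULL && strcmp(value, "true") == 0;
}

/*
 * Hypothetical filter: skip a PCM when the workaround is active and the
 * PCM name starts with "HDMI". All other PCMs are always tested.
 */
bool skip_pcm(const char *pcm_name, const char *no_hdmi_mode)
{
	assert(pcm_name);
	if (!no_hdmi_mode_enabled(no_hdmi_mode))
		return false;
	return strncmp(pcm_name, "HDMI", 4) == 0;
}
```

A caller would pass the environment value directly, for example `skip_pcm("HDMI1", getenv("NO_HDMI_MODE"))`. With the workaround set, the HDMI PCMs are never opened, so the hw_params error is hidden rather than fixed.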

kv2019i commented 2 months ago

I'll take a look at this, but FYI to @ujfalusi and @ranj063 in case we need to switch.

ujfalusi commented 2 months ago

Is this only affecting LNL? Is HDMI working fine on TGL/MTL?

ssavati commented 2 months ago

@ujfalusi this is not observed on MTL. I will check on TGL and update

kv2019i commented 2 months ago

I think @ujfalusi @bardliao @plbossart there's a problem in the sof_sdw machine driver's handling of the case where the display driver is not available and no HDMI PCMs are available:

Jul 08 13:41:39 kernel: snd_soc_sof_sdw:sof_card_dai_links_create: sof_sdw sof_sdw: sdw 5, ssp 0, dmic 0, hdmi 0, bt: 0

But topology has (as it should) the HDMI nodes:

Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI1
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI2
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI3

plbossart commented 2 months ago

I don't understand how the display driver became unavailable?

plbossart commented 2 months ago

we have this in the configuration: https://sof-ci.ostc.intel.com/#/result/planresultdetail/43591?model=LNLM_SDW_AIOC&testcase=verify-kernel-boot-log

/sys/module/snd_hda_core/parameters/gpu_bind:0

why is this value cleared?

static int gpu_bind = -1;
module_param(gpu_bind, int, 0644);
MODULE_PARM_DESC(gpu_bind, "Whether to bind sound component to GPU "
               "(1=always, 0=never, -1=on nomodeset(default))");

looks like a stale CI configuration to me, if we want to test HDMI this should not be cleared.
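The three settings of the parameter can be sketched as a small decision function. This is an illustration, not the in-tree check: it assumes that the default of -1 means "bind unless the kernel was booted with nomodeset", and `should_bind_gpu` and its `nomodeset` argument are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the gpu_bind policy (assumed reading, not the kernel code):
 *    1 -> always try to bind the sound component to the GPU driver
 *    0 -> never bind
 *   -1 -> default: bind unless the kernel was booted with nomodeset
 * "nomodeset" is a hypothetical flag supplied by the caller.
 */
bool should_bind_gpu(int gpu_bind, bool nomodeset)
{
	if (gpu_bind == 0)
		return false;
	if (gpu_bind < 0 && nomodeset)
		return false;
	return true;
}

/* Walk through the settings discussed in this thread. */
void gpu_bind_selfcheck(void)
{
	assert(!should_bind_gpu(0, false));  /* the CI setting: never bind */
	assert(should_bind_gpu(-1, false));  /* default, normal boot */
	assert(!should_bind_gpu(-1, true));  /* default, nomodeset boot */
	assert(should_bind_gpu(1, true));    /* forced on */
}
```

With the value 0 seen in the CI boot log, the function returns false regardless of the boot mode, which matches the observation that the display codec is never found on these devices.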

kv2019i commented 2 months ago

@plbossart wrote:

I don't understand how the display driver became unavailable?

It wasn't available in sof-dev yet for this platform (not marked as stable yet in kernel --> this can be overridden in the device configuration -> let me go and check this particular device).

UPDATE: we still have commit 003bd609021b9a6205db19d7ef163101856071b5 in sof-dev, and we can't remove it until we pull in a stable version of the xe support or change the test device configurations to apply a force probe.

ujfalusi commented 2 months ago

I think @ujfalusi @bardliao @plbossart there's a problem in the sof_sdw machine driver's handling of the case where the display driver is not available and no HDMI PCMs are available:

Jul 08 13:41:39 kernel: snd_soc_sof_sdw:sof_card_dai_links_create: sof_sdw sof_sdw: sdw 5, ssp 0, dmic 0, hdmi 0, bt: 0

But topology has (as it should) the HDMI nodes:

Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI1
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI2
Jul 08 13:41:39 kernel: snd_sof:sof_dai_load: sof-audio-pci-intel-lnl 0000:00:1f.3: tplg: load pcm HDMI3

@kv2019i, we should have dummy links for the HDMI PCMs to probe. They will not work, but they need to be there to be able to load the topology.

plbossart commented 2 months ago

We do have dummy links, the problem is not the probe:

    for (i = 0; i < hdmi_num; i++) {
        char *name = devm_kasprintf(dev, GFP_KERNEL, "iDisp%d", i + 1);
        char *cpu_dai_name = devm_kasprintf(dev, GFP_KERNEL, "iDisp%d Pin", i + 1);
        char *codec_name, *codec_dai_name;

        if (intel_ctx->hdmi.idisp_codec) {
            codec_name = "ehdaudio0D2";
            codec_dai_name = devm_kasprintf(dev, GFP_KERNEL,
                            "intel-hdmi-hifi%d", i + 1);
        } else {
            codec_name = "snd-soc-dummy";
            codec_dai_name = "snd-soc-dummy-dai";
        }

        ret = asoc_sdw_init_simple_dai_link(dev, *dai_links, be_id, name,
                            1, 0, // HDMI only supports playback
                            cpu_dai_name, platform_component->name,
                            ARRAY_SIZE(platform_component),
                            codec_name, codec_dai_name,
                            i == 0 ? sof_sdw_hdmi_init : NULL, NULL);
        if (ret)
            return ret;

        (*dai_links)++;
    }

It's the error on hw_params that needs to be root-caused.

plbossart commented 2 months ago

the debug log is misleading

    dev_dbg(dev, "sdw %d, ssp %d, dmic %d, hdmi %d, bt: %d\n",
        sdw_be_num, ssp_num, dmic_num,
        intel_ctx->hdmi.idisp_codec ? hdmi_num : 0, bt_num);

hdmi_num is 4 on TGL and 3 on all other devices, so we do create 3+ links.
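The mismatch between the logged count and the links actually created can be shown with a reduced model. The struct and function names below are hypothetical stand-ins for the driver state; only the two expressions are taken from the code quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant machine driver state. */
struct hdmi_state {
	bool idisp_codec; /* true when the display codec was found */
	int hdmi_num;     /* number of HDMI links the driver sets up */
};

/* Value printed by the dev_dbg() line: 0 whenever the codec is missing. */
int hdmi_count_in_log(const struct hdmi_state *s)
{
	assert(s);
	return s->idisp_codec ? s->hdmi_num : 0;
}

/*
 * Number of iDisp links created by the quoted loop: it runs hdmi_num
 * times and only swaps in the dummy codec when idisp_codec is false.
 */
int hdmi_links_created(const struct hdmi_state *s)
{
	assert(s);
	return s->hdmi_num;
}
```

For a state with `idisp_codec = false` and `hdmi_num = 3`, the log reports "hdmi 0" while three dummy-backed links exist, which is why the topology's HDMI PCMs still load.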

ujfalusi commented 2 months ago

The HDMI PCM never worked when there was no HDMI hardware; it has been like this with HDA devices as well. The hw_params call fails because the real DAI is missing.

kv2019i commented 2 months ago

That's not true @ujfalusi, this has been working but got broken at some point. It seems some of the changes to the HDA DAI ops now return -EINVAL when the dummy codec driver is connected. This DID work in the past.

UPDATE: I can confirm this is broken on TGL as well if HDMI is disabled via codec_mask. This did work in the past; I will bisect to see where it got broken.

ujfalusi commented 2 months ago

@kv2019i, I'm not sure about the past, but now it is not working on TGL either:

[   32.290196] snd_soc_core:dpcm_be_dai_hw_params:  iDisp1: ASoC: hw_params BE iDisp1
[   32.290205] sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at snd_soc_dai_hw_params on iDisp1 Pin: -22
[   32.290212] snd_soc_core:dpcm_be_dai_hw_params:  HDMI1: ASoC: dpcm_be_dai_hw_params() failed at iDisp1 (-22)
[   32.290219] snd_soc_core:dpcm_fe_dai_hw_free:  HDMI1: ASoC: hw_free FE HDMI1

ujfalusi commented 2 months ago

and:

kv2019i commented 2 months ago

I'm sure about the past :) -- but this is not just for debug, this is an actual product config for HDA where the Intel GPU is disabled for one reason or another. Granted, most of these laptops use the non-SOF driver, but there are actual product configs with dmic (=SOF) and some other GPU, so this dummy codec construct must work!

ujfalusi commented 2 months ago

@kv2019i, I trust your memory. It did not work on 18.09.2023: https://github.com/thesofproject/linux/issues/4594#issuecomment-1722865534

Can this be the reason: https://github.com/thesofproject/linux/pull/4659 ? We don't register HDMI dais when there is no HDMI, before that PR we registered the dais multiple times (analog would register the HDMI also and HDMI would register the analog), causing warnings.

kv2019i commented 2 months ago

No, it's not #4659 -- this is probably older.

I'll lower the priority now, as this is not hit at card probe and normal applications will not open the HDMI PCM if no monitor is detected (and no monitor ever will be on these devices). So the remaining open item is the PulseAudio/PipeWire habit of opening the PCMs and doing a hw_params query. Maybe -EINVAL is ok for this case as well (and my memory really malfunctions here). If so, we can close this.

kv2019i commented 2 months ago

Tested with upstream 6.8 kernel and pipewire 0.3.79 (versions used in 24.04 LTS): the -EINVAL errors at pipewire start are handled correctly and the rest of the audio functionality is ok. So I'll close this as works-as-expected, and we can track the test device configuration issues elsewhere.
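The "handled correctly" behavior can be pictured as a probe loop that treats -EINVAL from a hw_params query as "this PCM is not usable" rather than as a failure of the whole card. The sketch below is an assumption about what tolerant handling looks like, not PipeWire's implementation. The enum, the function names, and the choice to report every other error to the caller are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome of probing one PCM at audio server start. */
enum probe_outcome {
	PROBE_USABLE, /* hw_params accepted, PCM can be offered */
	PROBE_SKIP,   /* PCM exists but is unusable (-EINVAL), carry on */
	PROBE_ERROR   /* any other error, reported to the caller */
};

/* Map the return value of a hw_params query to an outcome. */
enum probe_outcome classify_hw_params_result(int err)
{
	if (err == 0)
		return PROBE_USABLE;
	if (err == -EINVAL)
		return PROBE_SKIP;
	return PROBE_ERROR;
}

/* Count the PCMs that stay usable after probing all of them. */
size_t count_usable_pcms(const int *results, size_t n)
{
	size_t usable = 0;

	assert(results || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (classify_hw_params_result(results[i]) == PROBE_USABLE)
			usable++;
	}
	return usable;
}
```

Under this model, three dummy-backed HDMI PCMs returning -EINVAL are skipped while the SoundWire PCMs stay available, which is consistent with the rest of the audio functionality working in the test above.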