thesofproject / linux

Linux kernel source tree
Other
91 stars 133 forks source link

WHL: UpExtreme: iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16 #4861

Closed plbossart closed 7 months ago

plbossart commented 8 months ago

Reported by internal tests, e.g.

/result/planresultdetail/38998?model=WHL_UPEXT_HDA-ipc3&testcase=multiple-pipeline-playback-50

This is concerning since it's just a generic HDaudio device...No idea when it started to appear.

[ 2576.892664] kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60010000: GLB_STREAM_MSG: PCM_PARAMS
[ 2576.893024] kernel: snd_sof_intel_hda_common:hda_set_stream_data_offset: sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 790528
[ 2576.893168] kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
[ 2576.893191] kernel:  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
[ 2576.893204] kernel:  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
[ 2576.893215] kernel:  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16
Test Case URL: https://github.com/thesofproject/sof-test/tree/main/test-case/MODEL=WHL_UPEXT_HDA-ipc3
cmd: TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50
====================
framework load: ssh  ubuntu@jf-whl-upext-hda-2.jf.intel.com  'nohup bash -c "TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg MODEL=WHL_UPEXT_HDA-ipc3 SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 50 2>&1 | nc -N 10.54.29.142 50789" >/dev/null &'
====================
framework run: 'netcat -l -k 50789'
====================
2024-03-19 01:19:35 UTC [REMOTE_INFO] ktime=2576 sof-test PID=18830: starting
2024-03-19 01:19:35 UTC [REMOTE_INFO] Starting /usr/local/bin/sof-logger -t -l /etc/sof/sof-cnl.ldc > /home/ubuntu/sof-test/logs/multiple-pipeline/2024-03-19-01:19:35-20621/slogger.txt &
2024-03-19 01:19:35 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg to run the test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg -f 'type:playback & ~pcm:Amplifier Reference' -b ' pcm:HDA Digital' -s 0 -e
2024-03-19 01:19:35 UTC [REMOTE_INFO] ===== Testing: (Loop: 1/50) =====
2024-03-19 01:19:35 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg to run the test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-03-19 01:19:35 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg -f 'type:playback ' -b ' pcm:HDA Digital' -s 0 -e
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDA Analog [hw:0,0]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI1 [hw:0,3]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI2 [hw:0,4]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] Testing: HDMI3 [hw:0,5]
2024-03-19 01:19:35 UTC [REMOTE_COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-19 01:19:35 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
aplay: main:831: audio open error: Device or resource busy

@ujfalusi @kv2019i FYI

cc:

plbossart commented 8 months ago

I can't reproduce this issue on my own UpExtreme board which was collecting dust for a while.

I have zero issues when pulseaudio is disabled (as in mv /usr/bin/pulseaudio /usr/bin/_pulseaudio)

TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg sof-test/test-case/run-all-tests.sh -l1

...

PASS: tplg-binary; pcm_list; playback-d100l1r1; capture-d100l1r1; playback-d1l100r1; capture_d1l100r1; playback_d1l1r50; capture_d1l1r50; speaker; pause-resume-playback; pause-resume-capture; volume; signal-stop-start-playback; signal-stop-start-capture; simultaneous-playback-capture; multiple-pipeline-playback; multiple-pipeline-capture; multiple-pause-resume; kmod-load-unload; kmod-load-unload-after-playback; suspend-resume; suspend-resume-with-playback; suspend-resume-with-capture;
SKIP: sof-logger; ipc-flood; xrun-injection-playback; xrun-injection-capture; volume_levels;

 test end with 0 failed tests

With pulseaudio nothing works, so my money is on a configuration problem for this board.

@fredoh9 @marc-hb any ideas?

marc-hb commented 8 months ago

I can reproduce on this particular device. There is no pulseaudio running (I think we check that somewhere?)

I can see this error at boot and then again every time the test fails:


TPLG=/lib/firmware/intel/sof-tplg/sof-hda-generic-4ch.tplg ~/sof-test/test-case/multiple-pipeline.sh -f p  -s

Mar 19 22:52:16 jf-whl-upext-hda-2 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
Mar 19 22:52:16 jf-whl-upext-hda-2 kernel:  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16

@kv2019i any HDMI idea/suggestion?

@fredoh9 could you (re?)-add verify-kernel-boot-log.sh at the start of the stable-v2.2 test plan?

I also spotted this:

Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: Failed to start Application launched by gnome-session-binary.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: app-gnome-xdg\x2duser\x2ddirs-825.scope: Failed to add PIDs to scope's control group: No such process
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: app-gnome-xdg\x2duser\x2ddirs-825.scope: Failed with result 'resources'.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: Failed to start Application launched by gnome-session-binary.
Mar 19 22:52:06 jf-whl-upext-hda-2 systemd[712]: org.gnome.Shell@x11.service: Start request repeated too quickly.

FYI:

sudo grep -i connected /sys/kernel/debug/dri/0/i915_display_info
[CONNECTOR:95:DP-1]: status: connected
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected
--------- Printing module parameters ----------
/sys/module/snd_intel_dspcfg/parameters/dsp_driver:3
/sys/module/snd_sof/parameters/ipc4_ignore_cpc:N
/sys/module/snd_sof/parameters/sof_debug:1
/sys/module/snd_sof_intel_hda/parameters/codec_mask:-1
/sys/module/snd_sof_intel_hda_common/parameters/always_enable_dmi_l1:N
/sys/module/snd_sof_intel_hda_common/parameters/disable_rewinds:N
/sys/module/snd_sof_intel_hda_common/parameters/enable_trace_D0I3_S0:N
/sys/module/snd_sof_intel_hda_common/parameters/position_quirk:1
/sys/module/snd_sof_intel_hda_common/parameters/sof_use_tplg_nhlt:N
/sys/module/snd_sof_intel_hda_generic/parameters/dmic_num:-1
/sys/module/snd_sof_intel_hda_generic/parameters/hda_model:(null)
/sys/module/snd_sof_intel_hda_generic/parameters/mclk_id:-1
/sys/module/snd_sof_intel_hda_generic/parameters/sdw_clock_stop_quirks:8
/sys/module/snd_sof_intel_hda_generic/parameters/use_msi:Y
/sys/module/snd_sof_pci/parameters/fw_filename:(null)
/sys/module/snd_sof_pci/parameters/fw_path:(null)
/sys/module/snd_sof_pci/parameters/ipc_type:-1
/sys/module/snd_sof_pci/parameters/lib_path:(null)
/sys/module/snd_sof_pci/parameters/sof_pci_debug:0
/sys/module/snd_sof_pci/parameters/tplg_filename:(null)
/sys/module/snd_sof_pci/parameters/tplg_path:(null)
/sys/module/snd_sof_probes/parameters/enable:N
/sys/module/snd_hda_codec/parameters/dump_coef:-1
/sys/module/snd_hda_codec_hdmi/parameters/enable_acomp:Y
/sys/module/snd_hda_codec_hdmi/parameters/enable_all_pins:N
/sys/module/snd_hda_codec_hdmi/parameters/enable_silent_stream:Y
/sys/module/snd_hda_codec_hdmi/parameters/static_hdmi_pcm:N
/sys/module/snd_hda_core/parameters/gpu_bind:-1
----------------------------------------
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: American Megatrends Inc.
    Version: UPW1AM18
    Release Date: 06/12/2020
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 16 MB
    Characteristics:
        PCI is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        BIOS ROM is socketed
        EDD is supported
        5.25"/1.2 MB floppy services are supported (int 13h)
        3.5"/720 kB floppy services are supported (int 13h)
        3.5"/2.88 MB floppy services are supported (int 13h)
        Print screen service is supported (int 5h)
        Serial services are supported (int 14h)
        Printer services are supported (int 17h)
        ACPI is supported
        USB legacy is supported
        BIOS boot specification is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 5.13
marc-hb commented 8 months ago

Thanks @ujfalusi and @plbossart for spotting this:

Last successful runs were on sh-whl-upext-hda-01 (last successful run was 38874, March 14th)

First failing run was 38915 on jf-whl-upext-hda-4 (March 15th)

38998 failed on jf-whl-upext-hda-2 (March 19th)

marc-hb commented 8 months ago

The failing aplay seems to be always on HDMI3 == hw:0,5

One thing I know @fredoh9 did in JF is to use HDMI "stubs" to fake monitors and avoid Ubuntu issues https://github.com/thesofproject/sof-test/discussions/998

Maybe this wasn't done in SH?

Another thing: this can be reproduced easily but sometimes a fresh reboot is required to reproduce!

kv2019i commented 8 months ago

Needs more investigation. The codec driver can return -EBUSY if there are no free slots. It's possible something has changed in uptream that would trigger this.

plbossart commented 8 months ago

The last upstream merge was two weeks ago, the first error happened a week ago after the last upstream merge.

marc-hb commented 8 months ago

The failing aplay seems to be always on HDMI3 == hw:0,5

Correction: the failure always happens on the LAST HDMI port opened concurrently. I reversed the 0 1 2 3 device order in the test to 3 2 1 0 and now it is HDMI1 that fails:

2024-03-20 17:45:09 UTC [INFO] Testing: HDMI3 [hw:0,5]
2024-03-20 17:45:09 UTC [COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 17:45:11 UTC [INFO] Testing: HDMI2 [hw:0,4]
2024-03-20 17:45:11 UTC [COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 17:45:13 UTC [INFO] Testing: HDMI1 [hw:0,3]
2024-03-20 17:45:13 UTC [COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
aplay: main:831: audio open error: Device or resource busy
2024-03-20 17:45:15 UTC [INFO] Testing: HDA Analog [hw:0,0]
2024-03-20 17:45:15 UTC [COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  iDisp1: ASoC: error at __soc_pcm_open on iDisp1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  HDMI1: ASoC: error at dpcm_be_dai_startup on HDMI1: -16
Mar 20 17:43:39 jf-whl-upext-hda-2 kernel:  HDMI1: ASoC: error at dpcm_fe_dai_startup on HDMI1: -16
marc-hb commented 8 months ago

I can't reproduce this issue on my own UpExtreme board which was collecting dust for a while.

@plbossart I think I may have found why you cannot reproduce: the display needs to be turned ON to reproduce. Can you try again with the display ON?

You can see whether the screen is on or not like this:

sudo grep -B2 -e enable -e active -e connected -e audio /sys/kernel/debug/dri/0/i915_display_info

You can control the timeout like this:

gsettings get org.gnome.desktop.session idle-delay  # default 5 minutes for me
gsettings set org.gnome.desktop.session idle-delay 60 # in seconds. There seems to be a 20s minimum?

You can light up the monitor again like this:

sudo systemctl restart gdm3

When the screen goes off, the logs show the DSP going to D3:


Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x80]=0x20140000 successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x90050000: GLB_TRACE_MSG: DMA_FREE
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:ipc3_log_header: sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x40010000: GLB_PM_MSG: CTX_SAVE
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_reset_enter: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010f0f successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_power_down: sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x4]=0xf0f successful
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_core_is_enabled: sof-audio-pci-intel-cnl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask f
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:snd_sof_pci_update_bits_unlocked: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIR: 00000000 at  00000044
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:snd_sof_pci_update_bits_unlocked: sof-audio-pci-intel-cnl 0000:00:1f.3: Debug PCIW: 00000010 at  00000044
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda:hda_codec_i915_display_power: sof-audio-pci-intel-cnl 0000:00:1f.3: Turning i915 HDAC power 0
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-cnl 0000:00:1f.3: Current DSP power state: D3
Mar 20 17:47:18 jf-whl-upext-hda-2 kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-cnl 0000:00:1f.3: fw_state change: 7 -> 0

One thing I know @fredoh9 did in JF is to use HDMI "stubs" to fake monitors and avoid Ubuntu issues https://github.com/thesofproject/sof-test/discussions/998

Maybe this wasn't done in SH?

I just checked and sh-whl-upext-hda-01 has no HDMI stub:

sudo grep -B2 -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info

Connector info
--------------
[CONNECTOR:95:DP-1]: status: disconnected
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected

cc: @ssavati

marc-hb commented 8 months ago

Issue reproduces exactly the same with Ubuntu kernels 6.2.0-34-generic and 6.5.0-26-generic, so clearly not any recent regression.

Display OFF: test always passes Display ON: test always fails the same way

Both lsmod | grep snd and lsof | grep dev/snd show nothing busy before the test runs and fails. Usage count for both snd_soc_skl_hda_dsp and snd_sof_pci_intel_cnl is zero before the test runs and fails.

plbossart commented 8 months ago

yeah, my tests were in headless mode. I just run the tests without any displays connected. I'll retry later today and try to see what happens.

plbossart commented 8 months ago

I connected a portable display (the one I use when I travel), and I see no issues. HDMI1 [hw:0,3] works fine, I can hear sound being played without any distortion.

root@plb-UP-WHL01:~# sudo grep -B2 -e audio -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info
---------
[CRTC:51:pipe A]:
    uapi: enable=yes, active=yes, mode="": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x0 0x5
    hw: enable=yes, active=yes
--
    underrun reporting: cpu=yes pch=yes
[CRTC:72:pipe B]:
    uapi: enable=no, active=no, mode="": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
    hw: enable=no, active=no
--
    underrun reporting: cpu=yes pch=yes
[CRTC:93:pipe C]:
    uapi: enable=no, active=no, mode="": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
    hw: enable=no, active=no
--
Connector info
--------------
[CONNECTOR:95:DP-1]: status: connected
--
        "640x480": 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
        "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa
[CONNECTOR:107:DP-2]: status: disconnected
[CONNECTOR:114:DP-3]: status: disconnected
[CONNECTOR:120:HDMI-A-1]: status: disconnected

Can we try to see if the tests pass with a real monitor and not the 'HDMI stubs'? I wonder if those gizmos cause problems.

marc-hb commented 8 months ago

I tried this based on @plbossart 's advice but it does not show anything "busy" while the display is ON

sudo systemctl restart gdm3

sudo grep -B2 -e audio -e enable -e active -e connected /sys/kernel/debug/dri/0/i915_display_info

 => display is ON

ubuntu@jf-whl-upext-hda-4:~$ grep ^ $(find /proc/asound/ -name hw_params)
/proc/asound/card0/pcm0c/sub0/hw_params:closed
/proc/asound/card0/pcm0p/sub0/hw_params:closed
/proc/asound/card0/pcm3p/sub0/hw_params:closed
/proc/asound/card0/pcm4p/sub0/hw_params:closed
/proc/asound/card0/pcm5p/sub0/hw_params:closed
/proc/asound/card0/pcm6c/sub0/hw_params:closed
/proc/asound/card0/pcm7c/sub0/hw_params:closed

 => nothing "active" yet test fails as usual
marc-hb commented 8 months ago

HDMI1 [hw:0,3] works fine,

The failure is NOT port-specific. It fails on the LAST HDMI device opened concurrently.

For instance, if you change the test like this

--- a/test-case/multiple-pipeline.sh
+++ b/test-case/multiple-pipeline.sh
@@ -104,7 +104,7 @@ func_run_pipeline_with_type()
         idx_lst=($(seq 0 $((PIPELINE_COUNT - 1)) | sed 's/ /\n/g' | shuf | xargs))
     fi

-    for idx in "${idx_lst[@]}"
+    for idx in 3 1 2 0
     do
         channel=$(func_pipeline_parse_value "$idx" channel)
         rate=$(func_pipeline_parse_value "$idx" rate)

then HDMI2 fails:

2024-03-20 23:13:21 UTC [INFO] Testing: HDMI3 [hw:0,5]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,5 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 23:13:21 UTC [INFO] Testing: HDMI1 [hw:0,3]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,3 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-03-20 23:13:21 UTC [INFO] Testing: HDMI2 [hw:0,4]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,4 -c 2 -r 48000 -f S16_LE /dev/zero -q
aplay: main:831: audio open error: Device or resource busy
2024-03-20 23:13:21 UTC [INFO] Testing: HDA Analog [hw:0,0]
2024-03-20 23:13:21 UTC [COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel: snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  iDisp2: ASoC: error at __soc_pcm_open on iDisp2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  HDMI2: ASoC: error at dpcm_be_dai_startup on HDMI2: -16
Mar 20 23:13:21 jf-whl-upext-hda-4 kernel:  HDMI2: ASoC: error at dpcm_fe_dai_startup on HDMI2: -16

I can hear sound being played without any distortion.

I wonder if those gizmos cause problems.

Can you try with a monitor that does NOT support audio? Pretty sure these devices do NOT support audio (how can we tell?)

plbossart commented 8 months ago

Can you try with a monitor that does NOT support audio? Pretty sure these devices do NOT support audio (how can we tell?)

I don't have one! I made sure all my displays can be tested with audio....

Can you try this patch https://github.com/thesofproject/linux/pull/4866 to try to get more information on what happens? I think it's a codec issue, not something related to SOF proper.

plbossart commented 8 months ago

Additional logs shows that it's indeed a codec issue

[   65.788307] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: failed -16
[   65.788323] snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
static int hdac_hda_dai_open(struct snd_pcm_substream *substream,
                 struct snd_soc_dai *dai)
{
    ....

    ret = hda_stream->ops.open(hda_stream, hda_pvt->codec, substream);
    if (ret < 0)
        dev_err(dai->dev, "%s: failed %d\n", __func__, ret);
marc-hb commented 8 months ago

Pretty sure these devices do NOT support audio (how can we tell?)

I stand corrected:

sudo systemctl restart gdm3 # turns display ON

sudo grep -B6 audio /sys/kernel/debug/dri/0/i915_display_info

--------------
[CONNECTOR:95:DP-1]: status: connected
    physical dimensions: 510x290mm
    subpixel order: Unknown
    CEA rev: 3
    DPCD rev: 12
    audio support: yes
plbossart commented 8 months ago

Additional traces do show it's indeed a 'codec' issue

[   22.071314] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: start
[   22.071318] hdmi_pcm_open: start
[   22.071321] hdmi_pcm_open_no_pin: start
[   22.071322] hdmi_pcm_open_no_pin: hdmi_choose_cvt failed
[   22.071324] hdmi_pcm_open:  hdmi_pcm_open_no_pin returned -16
[   22.071326] hdmi_pcm_open: end
[   22.071327] snd_hda_codec_hdmi ehdaudio0D2: hdac_hda_dai_open: failed -16
[   22.071332] snd_hda_codec_hdmi ehdaudio0D2: ASoC: error at snd_soc_dai_startup on intel-hdmi-hifi3: -16
[   22.071337]  iDisp3: ASoC: error at __soc_pcm_open on iDisp3: -16
[   22.071341]  HDMI3: ASoC: error at dpcm_be_dai_startup on HDMI3: -16
[   22.071344]  HDMI3: ASoC: error at dpcm_fe_dai_startup on HDMI3: -16
static int hdmi_pcm_open_no_pin(struct hda_pcm_stream *hinfo,
             struct hda_codec *codec,
             struct snd_pcm_substream *substream)
{
....
    err = hdmi_choose_cvt(codec, -1, &cvt_idx, false);
    if (err) {
        pr_info("%s: hdmi_choose_cvt failed\n", __func__);
        return err;
    }

No idea what this code does...

plbossart commented 8 months ago
static int hdmi_choose_cvt(struct hda_codec *codec,
               int pin_idx, int *cvt_id,
               bool silent)
{
...
    /* No free converters */
    if (cvt_idx == spec->num_cvts) {
        pr_info("%s: no free converters\n", __func__);
        return -EBUSY;
    }

This is where I will stop my investigation for now... @kv2019i @ujfalusi may have a better understanding that me on this issue.

I think it's a corner case coming from the type of 'HDMI stubs' connected to the Up Extreme DUTs.

Still that's concerning that we've had this problem for a while and never even knew about it.... Oh well.

plbossart commented 8 months ago

FWIW, here's the diffs I used

cvt.diff.txt

marc-hb commented 8 months ago

Still that's concerning that we've had this problem for a while and never even knew about it.... Oh well.

Not surprising at all considering this requires some (special?) type of monitor to be connected AND powered on!

plbossart commented 8 months ago

Simpler test script:

aplay -Dhw:0,3 -c2 -r48000 -fS16_LE /dev/zero &
aplay -Dhw:0,4 -c2 -r48000 -fS16_LE /dev/zero &
aplay -Dhw:0,5 -c2 -r48000 -fS16_LE /dev/zero

So this is not an issue with open/close happening concurrently, more that the third stream isn't supported for some reason.

plbossart commented 8 months ago

Doesn't happen on latest daily test on jf-whl-upext-hda-3, not sure what changed?

plbossart commented 8 months ago

@fredoh9 was there a change in hardware configs? The daily tests using jf-whl-upext-hda-3 don't show any issues.

marc-hb commented 8 months ago

I never reproduced on jf-whl-upext-hda-3, did anyone or anything? I reproduced only with jf-...-hda-2 and jf-...-hda-4.

It could have a different BIOS or different model of HDMI plug. I'll have a look when I get back to JF.

plbossart commented 8 months ago

Indeed the last test used jf-whl-upext-hda-4 and the issue is back. Those device-to-device variations are hard to track..

marc-hb commented 8 months ago

So the reason why jf-whl-upext-hda-3 passed and the others failed is: jf-whl-upext-hda-3 is using a different HDMI stub that does not fake audio support. As soon as I swapped the HDMI stubs, the results followed the stubs.

BUT, I also tested the failing stub on my APL UP2 and TGL UP xtreme and it did NOT cause any problem. So there is a problem specific to this WHL model. Reminder: the failure is NOT on the port connected to the HDMI stub. The failure is always on the last HDMI port used concurrently.

Instead of that HDMI stub, I connected a monitor with audio to a failing board and found no failure and no audio issue.

Here's the EDID for the HDMI stub that triggers the failure:

sudo apt install read-edid
sudo get-edid | parse-edid 

    Identifier "             "
    ModelName "             "
    VendorName "CLB"
    # Monitor Manufactured week 40 of 2013
    # EDID version 1.3
marc-hb commented 8 months ago

Very interesting and relevant HDMI audio facts learned after discussing with the local expert (thanks Clint!)

Do this:

sudo apt install edid-decode

sudo get-edid | edid-decode

  Underscans IT Video Formats by default
  Basic audio support
  Supports YCbCr 4:4:4
  ...

I've been looking in the labs for a 3rd type of HDMI stub but I didn't find any yet. Ideally a compliant one with audio support.

marc-hb commented 8 months ago

I've been looking in the labs for a 3rd type of HDMI stub but I didn't find any yet. Ideally a compliant one with audio support.

No way to find a 3rd type.

We can silence this failure by using on WHL only the (non compliant...) HDMI stub that does not support audio. Deal? It would only be hiding the failure, not solving it.

marc-hb commented 8 months ago

We can silence this failure by using on WHL only the (non compliant...) HDMI stub that does not support audio. Deal? It would only be hiding the failure, not solving it.

Done on jf-whl-*-hda2,3,4

plbossart commented 8 months ago

I am lost here.

We have an issue with a HDMI compliant device which exposes audio, but "how compliant" is it if we get this error all the time?

marc-hb commented 8 months ago

but "how compliant" is it if we get this error all the time?

We get this error only on the WHL UP boards. Other boards with the same HDMI stub pass the test just fine, see above.

Also, remember that the failure is NOT tied to the HDMI port where the stub is used, see above.

plbossart commented 7 months ago

I'll close this issue since it's a corner case we have no intention of fixing.

ujfalusi commented 7 months ago

FWIW, I had encountered an issue at home: HDMI CEC suddenly stopped working. My TV was not 'seeing' the AVR anymore and could not get ARC working. It got solved by disconnecting a NUC with WHL from the AVR... This might be a kernel regression, I think I had the NUC there for over a year and noticed oddities around January.