thesofproject / linux

Linux kernel source tree

[HD-A] System does not wake up in Playback/Capture -> pause -> suspend -> resume scenario #5035

Open ssavati opened 5 months ago

ssavati commented 5 months ago

The system does not wake up in the Playback/Capture -> pause -> suspend -> resume scenario.

Steps to reproduce

Other observations

The issue is not observed with the SDW (SoundWire) configuration. It is observed for headset playback/capture. With the HDMI device, the system wakes up. The system also wakes up if we suspend without pausing playback/capture.

Issue reproducibility: 100%

Kernel/firmware branch/commits:
Linux Branch: topic/sof-dev
Linux Commit: fbf0b7bd4110
SOF Branch: main
SOF Commit: 283475c0d6c8
Zephyr Commit: 9d8059b6e554

cc:

plbossart commented 5 months ago

Thanks for testing this @ssavati, this looks like a global issue indeed. Could you test this on LNL, where the HDAudio DMA is used as well? I am not sure we have ever tested this.

@kv2019i @ranj063 @ujfalusi my money is on a transition where we keep the DMA programmed, which is known to have side effects.

ujfalusi commented 5 months ago

I can confirm this on TGL (upx-i11)

ujfalusi commented 5 months ago

I think I have a fix for this...

ujfalusi commented 5 months ago

It will take some time. I can prevent the lockup, but the sequencing is pretty broken.

ujfalusi commented 5 months ago

The findings so far:

Suspend while paused does not work with sof-dev (it locks up the system), with IPC3 and IPC4 alike. The lockup is caused by incorrect code in hda_dai_suspend().

Fixing that function allows suspend/resume while paused, but it is still broken for IPC3 and IPC4:
IPC3 runs without errors, but when you try to un-pause, aplay exits.
IPC4 hits IPC errors on suspend, but after resume we manage to restart the stream (the resume fails and a full restart happens).

To fix the IPC4 errors and the broken IPC3 behavior, we need to move the paused stream into a proper suspended state with:

int sof_pcm_suspend_paused(struct snd_sof_dev *sdev)
{
    struct snd_pcm_substream *substream;
    struct snd_sof_pcm *spcm;
    int dir, ret;

    list_for_each_entry(spcm, &sdev->pcm_list, list) {
        for_each_pcm_streams(dir) {
            substream = spcm->stream[dir].substream;
            if (!substream || !substream->runtime)
                continue;

            /* Only handle streams that were paused when the system suspended */
            if (!(substream->runtime->state == SNDRV_PCM_STATE_SUSPENDED &&
                  substream->runtime->suspended_state == SNDRV_PCM_STATE_PAUSED))
                continue;

            /* Release the pause first, so the driver sees the same sequence as for a running stream */
            ret = substream->ops->trigger(substream,
                                          SNDRV_PCM_TRIGGER_PAUSE_RELEASE);
            if (ret)
                return ret;

            /* Then send the SUSPEND trigger the ALSA core skipped for the paused stream */
            ret = substream->ops->trigger(substream,
                                          SNDRV_PCM_TRIGGER_SUSPEND);
            if (ret)
                return ret;
        }
    }

    return 0;
}

and call this from early suspend. This makes the errors go away, but a full restart is still done instead of a clean resume...

plbossart commented 5 months ago

We had to do something manually for SoundWire to support the suspend-while-paused transition:

static int intel_component_dais_suspend(struct snd_soc_component *component)
{
    struct snd_soc_dai *dai;

    /*
     * In the corner case where a SUSPEND happens during a PAUSE, the ALSA core
     * does not throw the TRIGGER_SUSPEND. This leaves the DAIs in an unbalanced state.
     * Since the component suspend is called last, we can trap this corner case
     * and force the DAIs to release their resources.
     */
    for_each_component_dais(component, dai) {
        struct sdw_cdns *cdns = snd_soc_dai_get_drvdata(dai);
        struct sdw_cdns_dai_runtime *dai_runtime;

        dai_runtime = cdns->dai_runtime_array[dai->id];

        if (!dai_runtime)
            continue;

        if (dai_runtime->suspended)
            continue;

        if (dai_runtime->paused)
            dai_runtime->suspended = true;
    }

    return 0;
}

ujfalusi commented 5 months ago

It happens with both IPC3 and IPC4.

ujfalusi commented 5 months ago

Strictly avoiding the lockup: https://github.com/thesofproject/linux/pull/5049
Suspend while paused remains broken, but at least the system no longer locks up, and we get to see the nice IPC and other errors, and the application (aplay) giving up.

marc-hb commented 4 months ago

I haven't followed exactly what changed, but in the last 4 years I have never seen a suspend/resume pass rate as high as in the last week or two. I mean across the board, not with LNL specifically.

ujfalusi commented 4 months ago

It looks like the system lockup is fixed by https://github.com/thesofproject/linux/pull/5085 (which is similar to one of my other attempts: https://github.com/thesofproject/linux/commit/f710445fd238adcbae399753a809a8ae96635040), but the suspend while pause is still broken. Sequence: start aplay, pause it, suspend (rtcwake); after system resume, un-pause aplay, then stop aplay:

[  112.923088] snd_sof:sof_ipc4_route_free: sof-audio-pci-intel-tgl 0000:00:1f.3: unbind modules mixin.1.1:0 -> mixout.2.1:0
[  112.923096] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x46000002|0x3: MOD_UNBIND
[  112.923379] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x66000000|0x3: MOD_UNBIND
[  112.923395] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0x46000002|0x3: MOD_UNBIND
[  112.923403] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0x12000000|0x0: GLB_DELETE_PIPELINE
[  112.924189] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x3200000c|0x0: GLB_DELETE_PIPELINE
[  112.924197] sof-audio-pci-intel-tgl 0000:00:1f.3: FW reported error: 12 - Required resource is in invalid state
[  112.924222] sof-audio-pci-intel-tgl 0000:00:1f.3: ipc error for msg 0x12000000|0x0
[  112.924228] sof-audio-pci-intel-tgl 0000:00:1f.3: failed to free pipeline widget pipeline.1
...
[  112.925345] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-tgl 0000:00:1f.3: widget pipeline.2 freed
[  112.925348] snd_sof:sof_widget_free_unlocked: sof-audio-pci-intel-tgl 0000:00:1f.3: widget dai-copier.HDA.Analog.playback freed
[  112.925353] sof-audio-pci-intel-tgl 0000:00:1f.3: Failed to free connected widgets
[  112.925359] sof-audio-pci-intel-tgl 0000:00:1f.3: sof_pcm_stream_free: sof_widget_list_free failed -22
[  112.925365] sof-audio-pci-intel-tgl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_free on 0000:00:1f.3: -22

The only fix to get suspend while paused working is still #5058.

marc-hb commented 4 months ago

Quoting @ujfalusi: "but the suspend while pause is still broken"

I don't think we test that in CI.

marc-hb commented 4 months ago

Actually, we have test-case/check-pause-release-suspend-resume.sh in theory, but I don't think it ever really worked in practice: https://github.com/thesofproject/sof-test/pull/931/commits/ceb197aa43e7a97a845f9ec462a211104a397ac9

@ssavati did you know about this test?

marc-hb commented 3 months ago

I think the main test case has been fixed? Focus has shifted to:

For sure, the suspend/resume pass rate (outside LNL 5080) has not been this high in years.