thesofproject / sof

Sound Open Firmware
Other
541 stars 309 forks source link

Laptop audio dies after a while (Dell G3 3590) #5134

Closed Ninjastorm380 closed 1 year ago

Ninjastorm380 commented 2 years ago

hello! recently I had set up my old dell laptop as a stand in for my main laptop (which is down for repair work), and while I was gaming (proton experimental latest, wine staging 7), audio would stop working altogether. I originally did not know the cause, but compiling my favorite performance kernel (xanmod-rog) improved it a little. I have provided a log up to the moment after audio loss. I think the most relevant (and where I think you guys step in) block is this:

[ +0.001478] audit: type=1105 audit(1640474989.148:115): pid=4606 uid=1000 auid=1000 ses=4 subj==unconfined msg='op=PAM:session_open grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success [ +5.926301] audit: type=1334 audit(1640474995.076:116): prog-id=21 op=LOAD [ +0.000076] audit: type=1334 audit(1640474995.076:117): prog-id=22 op=LOAD [ +0.000043] audit: type=1334 audit(1640474995.076:118): prog-id=23 op=LOAD [ +0.042852] audit: type=1130 audit(1640474995.118:119): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success [Dec25 18:30] audit: type=1131 audit(1640475025.166:120): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success [ +0.047013] audit: type=1334 audit(1640475025.216:121): prog-id=0 op=UNLOAD [ +0.000004] audit: type=1334 audit(1640475025.216:122): prog-id=0 op=UNLOAD [ +0.000001] audit: type=1334 audit(1640475025.216:123): prog-id=0 op=UNLOAD [Dec25 18:58] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes [ +0.000013] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x80010000 size 12 [ +0.000002] sof-audio-pci-intel-cnl 0000:00:1f.3: error: failed to set dai config for Analog Playback and Capture [ +0.000001] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -22 [ +0.000002] HDA Analog: ASoC: dpcm_be_dai_trigger() failed at Analog Playback and Capture (-22) [ +0.000003] HDA Analog: ASoC: trigger FE cmd: 0 failed: -22 [ +0.000612] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 20 got 0 bytes [ +0.000028] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60010000 size 20 [ +0.000002] sof-audio-pci-intel-cnl 0000:00:1f.3: error: hw params ipc failed for stream 1 [ +0.000001] sof-audio-pci-intel-cnl 0000:00:1f.3: error: set pcm hw_params after resume [ +0.000001] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_prepare on 0000:00:1f.3: -22 [ +0.000001] HDA Analog: ASoC: soc_pcm_prepare() failed (-22) [ +0.000002] HDA Analog: ASoC: dpcm_fe_dai_prepare() failed (-22)

this bit occurs every time audio dies, and it never recovers. sometimes I get a full system lockup shortly afterward (which I deliberately avoided this time by a reboot)

full dmesg log (just in case): debug.log

EDIT: I should probably clarify that this happens on both native titles and non-native, as well as other applications (discord, firefox, VLC, etc..)

EDIT 2: forgot a very important bit of info: Distro: EndeavourOS (Arch Rolling Release) dGPU driver: nvidia 495 (latest as of this time) Integrated driver: intel open source latest Desktop: Gnome 41 latest

lgirdwood commented 2 years ago

@Ninjastorm380 thanks, are you saying this error happens every time the system load is high or does it happen when you are playing audio for a long time ? Trying to figure out what the use case is that causes the issue.

Ninjastorm380 commented 2 years ago

@lgirdwood on stock arch kernel, it can happen pretty much out of the blue (IE: just running discord), but with a clanged xanmod-rog kernel, it only readily occurs while running a intensive game (warframe is a good free working choice for that part) while playing music from another source (I was running newgrounds audio from firefox when this happened) while running OBS studio (I was prepping for stream, so I was running some test recordings to make sure NVENC was ready to do work).

Ninjastorm380 commented 2 years ago

thus, I suspect something is not keeping up while under load (buffer overrun/underrun?), as the crash frequency goes down with a newer optimized kernel.

stock arch kernel is currently linux 5.15.11 xanmod rog kernel is currently linux-xanmod-rog 5.15.10, compiled with clang march native + lto + O3, for an intel core i5 9th gen mobile cpu

Ninjastorm380 commented 2 years ago

my use case is streaming to twitch (average of ~3 hours), but it's failing quite frequently, thus preventing me from doing my job.

it tends to happen on both of your examples, honestly.

Ninjastorm380 commented 2 years ago

just saw that there was an update to sof-firmware (is now sof-firmware-2.0-1). will test stability in the morning.

Ninjastorm380 commented 2 years ago

@lgirdwood finished testing. same issue on newest release. did a whole recording of it happening + another dmesg log. strangely enough, nothing talking about asoc this time? dmesg.log Audio Crash Recording (google drive link)

Issue can bee seen at timestamp ~16:30

lgirdwood commented 2 years ago

@Ninjastorm380 can you confirm all the audio endpoints you have been using before the crash. I can see analog HDA realtek PCMs and also DP/HDMI PCMs. I am suspicious we have a DSP DP/HDMI interaction with the Intel GPU (when it's disabled as Nvidia is used ?) or incorrectly with the Nvidia driver via HDMI audio APIs ? (possibly by pulseaudio or pipewire probing all PCMs). @plbossart @kv2019i any comments here ?

Ninjastorm380 commented 2 years ago

last test was just laptop display only, one before was with my external display connected. both had the same audio failure. I know laptop display is LVDS/DP and that there is another DP on the usb-c port, and that there is a HDMI port on the left hand side.

Ninjastorm380 commented 2 years ago

as noted in my video, the integrated mic volume can't be adjusted without a lockup. could it be something interacting with the DMIC array?

Ninjastorm380 commented 2 years ago

@lgirdwood

Ninjastorm380 commented 2 years ago

audio endpoints: 3.5mm audio jack (trrs) headset is all that I was using for audio, although the hdmi port was connected to my second monitor on the older log.

Ninjastorm380 commented 2 years ago

dmesg.log

another failure, and it's got entries pointing to asoc again. was using my headset, had firefox playing music, discord was doing voice calls, and was playing warframe

lgirdwood commented 2 years ago
[42525.862435] audit: type=1334 audit(1640691247.139:117): prog-id=0 op=UNLOAD
[78763.784733] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 20 got 0 bytes
[78763.784774] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60010000 size 20
[78763.784778] sof-audio-pci-intel-cnl 0000:00:1f.3: error: hw params ipc failed for stream 1
[78763.784780] sof-audio-pci-intel-cnl 0000:00:1f.3: error: set pcm hw_params after resume
[78763.784782] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_prepare on 0000:00:1f.3: -22
[78763.784784]  HDA Analog: ASoC: soc_pcm_prepare() failed (-22)
[78763.784787]  HDA Analog: ASoC: dpcm_fe_dai_prepare() failed (-22)

This is really a FW error as the IPC request fails for hw_params.

was using my headset, had firefox playing music, discord was doing voice calls, and was playing warframe

Can you confirm you are using a sound server like pipewire or pulseaudio as I would expect the sound server to mix the streams and keep the PCM open but the log above shows what looks like a new PCM being opened (do you know which of the applications is opening the last PCM). Need to figure out the usage so we can duplicate.

Ninjastorm380 commented 2 years ago

@lgirdwood pipewire is the backend, with pipewire-alsa, pipewire-jack, and pipewire-pulse for compat with WINE

Ninjastorm380 commented 2 years ago

as for what application opened the new stream? dunno, as this happens without anything new being opened (the most recent log was with playing warframe for about 15-30 minutes straight in that scenario. nothing new opened in that timeframe.)

could it be possible that pipewire is crashing and attempting to reinit on respawn, thus causing this FW crash?

Ninjastorm380 commented 2 years ago

testing on fedora 35 to see if this issue happens on it as well.

Ninjastorm380 commented 2 years ago

@lgirdwood yep audio cut out, and I think I know why: a couple of seconds before audio died, it started cutting up a little bit, as though something was struggling (reminded me of a buffer underrun in pulse). could the FW/Pipewire be crashing because of that? on another note, it seems to only crash persistently with easyeffects, pavu, warframe, discord, OBS (tytan build), and my custom sound stream isolation script (which just adds some virtual pipewire streams that I can use OBS to listen off of. I use pavu to switch apps to them).

attaching various logs from my system, as well as my script. dmesg.log journalctl.log StreamIsolation.txt rename extension to .sh, github had me rename it.....

Ninjastorm380 commented 2 years ago

at the end of the newest dmesg.log, these show up at about the same time as the audio artifacts, right before it just died. [Dec30 10:20] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [Dec30 10:25] perf: interrupt took too long (3138 > 3131), lowering kernel.perf_event_max_sample_rate to 63000 IPC = interrupt, right?

lgirdwood commented 2 years ago

at the end of the newest dmesg.log, these show up at about the same time as the audio artifacts, right before it just died. [Dec30 10:20] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [Dec30 10:25] perf: interrupt took too long (3138 > 3131), lowering kernel.perf_event_max_sample_rate to 63000 IPC = interrupt, right?

Could be,that or another interrupt. This usually means the system is heavily loaded. Is there any information from the PW logs. Is it possible to enable and attach the PW logs here.

Ninjastorm380 commented 2 years ago

Couldn't figure out how to get pipewire to do logging. if you have a link to some instructions, I would love to read them.

lgirdwood commented 2 years ago

@Ninjastorm380 t looks like pipewire has a top debug command that can be used to detect what is causing any xrun (and possibly leading to the other failures). See the bottom text here https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/Troubleshooting

Ninjastorm380 commented 2 years ago

@lgirdwood dmesg.log pw-top.log

I couldn't link journalctl.log, as github kept throwing an error: 'is not included in the list'. Also, the pw-top output is a fair bit messy, so you might have to parse for the values you want (doesn't appear to be meant for human eyes).

lgirdwood commented 2 years ago

@lgirdwood dmesg.log pw-top.log

I couldn't link journalctl.log, as github kept throwing an error: 'is not included in the list'. Also, the pw-top output is a fair bit messy, so you might have to parse for the values you want (doesn't appear to be meant for human eyes).

Ack, it looks likes it output from an terminal format and codes. The dmesg though has more info

[ 2034.784984] perf: interrupt took too long (2610 > 2500), lowering kernel.perf_event_max_sample_rate to 76000
[ 2121.091892] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes
[ 2121.091924] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60030000 size 12
[ 2121.091932] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 2121.091937]  HDA Analog: ASoC: trigger FE cmd: 0 failed: -22
[ 2121.093059] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes
[ 2121.093113] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60040000 size 12
[ 2121.093120] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 2121.093127]  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
[ 2121.093229] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes
[ 2121.093250] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60050000 size 12
[ 2121.093764] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 2121.093769]  HDA Analog: ASoC: trigger FE cmd: 0 failed: -22
[ 2121.095045] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes
[ 2121.095067] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60040000 size 12
[ 2121.095071] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 2121.095076]  HDA Analog: ASoC: trigger FE cmd: 1 failed: -22
[ 2121.095168] sof-audio-pci-intel-cnl 0000:00:1f.3: error: reply expected 12 got 0 bytes
[ 2121.095183] sof-audio-pci-intel-cnl 0000:00:1f.3: error: ipc error for 0x60050000 size 12
[ 2121.095701] sof-audio-pci-intel-cnl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -22
[ 2121.095704]  HDA Analog: ASoC: trigger FE cmd: 0 failed: -22

For context the the commands being sent are

#define SOF_IPC_STREAM_PCM_FREE         SOF_CMD_TYPE(0x003)
#define SOF_IPC_STREAM_TRIG_START       SOF_CMD_TYPE(0x004)
#define SOF_IPC_STREAM_TRIG_STOP        SOF_CMD_TYPE(0x005)

So we fail to free the PCM and restart it. @Ninjastorm380 it looks like we may need the full kernel logs. @ujfalusi can we enable full IPC logs in the driver using a kernel command line ?

Ninjastorm380 commented 2 years ago

cat /proc/sys/kernel/printk 7 4 1 7 kernel loglevel is already at max (I think), unless you guys aren't specifying your loglevel on your messages.

ujfalusi commented 2 years ago

@Ninjastorm380, in order to see the full logs, you need to enable the dynamic debugging, please see: https://thesofproject.github.io/latest/getting_started/intel_debug/suggestions.html#enable-dynamic-debug.

What is interesting in the errors that we did received a reply for the message but it came back as size 0. Prior messages were correct as nothing has been reported. We can confirm that with the full logs.

Ninjastorm380 commented 2 years ago

Possibly because I was running EndeavourOS for the first few logs, and Fedora 35 for the latest

Ninjastorm380 commented 2 years ago

@lgirdwood all the options have been set, restarting and performing test once more.

Ninjastorm380 commented 2 years ago

@lgirdwood here is the dmesg log dmesg.log

it was very short this time, it skipped cutting out the system audio and went to a full system lockup... will run another test to see if we get different results

Ninjastorm380 commented 2 years ago

dmesg.log another log. this time the audio started to underrun pretty hard before the system locked up again

ujfalusi commented 2 years ago

@Ninjastorm380, from the log it is clear that the playback stream is constantly stopped and started which indicates the underflow. In the last log the playback starts at line 2465, the capture starts at 2489 and the playback immediately got stopped at line 2493. From there on, the playback stream is constantly in underrun and get stopped/started.

You do not have this issue without audio capture running at the same time? Would the underrun starts if you start playback (I assume it works fine) and later start capture (I would expect that this would trigger the playback underrun)?

Can you attach the report from alsa-info.sh as well?

lgirdwood commented 2 years ago

Yeah, it looks like we repeat stream start, stop, reset, dai config forever.

[  767.144773] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=0
[  767.144781] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[  767.144967] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[  767.144991] sof-audio-pci-intel-cnl 0000:00:1f.3: hda: prepare stream dir 0
[  767.144996] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[  767.145776] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[  767.145794] sof-audio-pci-intel-cnl 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10
[  767.145826] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: prepare stream 0 dir 0
[  767.145827] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: hw params stream 0 dir 0
[  767.145829] sof-audio-pci-intel-cnl 0000:00:1f.3: generating page table for 000000002803e40a size 0x10000 pages 16
[  767.145838] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful
[  767.145858] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x40000 successful
[  767.145861] sof-audio-pci-intel-cnl 0000:00:1f.3: period_bytes:0x2000
[  767.145862] sof-audio-pci-intel-cnl 0000:00:1f.3: periods:8
[  767.145880] sof-audio-pci-intel-cnl 0000:00:1f.3: stream_tag 1
[  767.145883] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60010000
[  767.146809] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60010000
[  767.146814] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 790528
[  767.146860] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=1
[  767.146873] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1
[  767.147410] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x2014001e successful
[  767.147418] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60040000
[  767.148724] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60040000
[  767.317666] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0
[  767.317676] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000
[  767.317837] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60050000
[  767.318374] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x20140000 successful
[  767.318387] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000
[  767.318792] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60030000
[  767.318802] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=0
[  767.318811] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[  767.318994] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[  767.319034] sof-audio-pci-intel-cnl 0000:00:1f.3: hda: prepare stream dir 0
[  767.319042] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[  767.319824] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[  767.319852] sof-audio-pci-intel-cnl 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10
[  767.320033] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: prepare stream 0 dir 0
[  767.320037] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: hw params stream 0 dir 0
[  767.320040] sof-audio-pci-intel-cnl 0000:00:1f.3: generating page table for 000000002803e40a size 0x10000 pages 16
[  767.320064] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful
[  767.320085] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x40000 successful
[  767.320089] sof-audio-pci-intel-cnl 0000:00:1f.3: period_bytes:0x2000
[  767.320091] sof-audio-pci-intel-cnl 0000:00:1f.3: periods:8

@ujfalusi btw, do you know why we send DAI config IPC twice ? @Ninjastorm380 what userspace application is starting audio at 449 seconds ? Are you able to provide it's context or cmd line.

ujfalusi commented 2 years ago

@lgirdwood, the dai config messages are part of the correct sequencing implementation of the HDA DMA start and stop.

Ninjastorm380 commented 2 years ago

I think 449 was when my isolation script was executed, which opens 3 virtual streams. I use those streams to monitor individual application audio outputs. at least, that was the first thing I started

Ninjastorm380 commented 2 years ago

@Ninjastorm380, from the log it is clear that the playback stream is constantly stopped and started which indicates the underflow. In the last log the playback starts at line 2465, the capture starts at 2489 and the playback immediately got stopped at line 2493. From there on, the playback stream is constantly in underrun and get stopped/started.

You do not have this issue without audio capture running at the same time? Would the underrun starts if you start playback (I assume it works fine) and later start capture (I would expect that this would trigger the playback underrun)?

Can you attach the report from alsa-info.sh as well?

@ujfalusi alsa-info.txt before executing my isolation script, and just playing audio from firefox alsa-info.txt after running my Isolation script, still just playing audio in firefox. firefox has automatically switched over to using 'application-capture-output'.

Ninjastorm380 commented 2 years ago

running with just firefox does not trigger the issue. firefox + script does not trigger either. This only triggers while under moderate to heavy stress (warframe @ max GFX settings on 'application-capture-output' + OBS + StreamISolation.sh + Discord on 'discord-capture-output' + firefox on 'application-capture-output' + Pavucontrol + VLC on 'vlc-capture-output', which results in an average all core use of 39% - 53% utilization.)

Ninjastorm380 commented 2 years ago

warframe + script + firefox does not trigger, but as soon as I launched OBS on top of those, I got a full system lockup again. dmesg.log

Ninjastorm380 commented 2 years ago

I just noticed, but after a reboot, my power light started blinking while in use. Looking up the codes for this laptop I got this: 2 amber flashes and 8 white flashes = system board error is it possible that this could be related?

Ninjastorm380 commented 2 years ago

and on a related note, I get a usb sound card in the mail today. is it possible to disable the SoF driver so that I am just using the external card?

lgirdwood commented 2 years ago

Does OBS on it's own cause this issue ? Just trying to figure what is doing all the stream operations and why ? We have similar stress tests but OBS may be doing something different which we need to figure and out try and reporduce.

A system board error could hang your device and could be contributing here , e.g. if load is high then the board may heat up too much and lock. Yep, USB audio can be used on it's own - just have pipewire use this as the default card.

Ninjastorm380 commented 2 years ago

Does OBS on it's own cause this issue ? Just trying to figure what is doing all the stream operations and why ? We have similar stress tests but OBS may be doing something different which we need to figure and out try and reporduce.

A system board error could hang your device and could be contributing here , e.g. if load is high then the board may heat up too much and lock. Yep, USB audio can be used on it's own - just have pipewire use this as the default card.

@lgirdwood I forgot something in the puzzle! easyeffects is set up in my environment (to give noise reduction to the headset I use with this.). testing with OBS + script + firefox in a fresh EndeavourOS install results in no hang and no issues. once warframe gets installed, i'll have the 'heavy load' part of the equation to try. I'll install easyeffects again, but it will be set up for manual start only so it won't interfere.

tldr: it's very possible it could be easyeffects + script + OBS + burdened CPU.

Ninjastorm380 commented 2 years ago

@lgirdwood had OBS + Firefox + script running, and nothing else. audio has become useless, but is not gone. audio is monotonous and erratic in duration, almost like cellphone interference, and only happens while audio is supposed to be playing. I suspect the DSP tripped and fell over this time. here's the log. dmesg.log

Ninjastorm380 commented 2 years ago

scrach that, eventual system lockup, as per usual.

Ninjastorm380 commented 2 years ago

will run that test again

Ninjastorm380 commented 2 years ago

[ 2078.747257] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000 [ 2078.747907] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60050000 [ 2078.748431] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x20140000 successful [ 2078.748438] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000 [ 2078.748602] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60030000 [ 2078.748607] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=0 [ 2078.748614] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000 [ 2078.748791] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000 [ 2078.748823] sof-audio-pci-intel-cnl 0000:00:1f.3: hda: prepare stream dir 0 [ 2078.748829] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000 [ 2078.749190] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000 [ 2078.749213] sof-audio-pci-intel-cnl 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10 [ 2078.749245] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: prepare stream 0 dir 0 [ 2078.749247] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: hw params stream 0 dir 0 [ 2078.749249] sof-audio-pci-intel-cnl 0000:00:1f.3: generating page table for 000000003e1dcb8e size 0x10000 pages 16 [ 2078.749258] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful [ 2078.749277] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x40000 successful [ 2078.749280] sof-audio-pci-intel-cnl 0000:00:1f.3: period_bytes:0x2000 [ 2078.749281] sof-audio-pci-intel-cnl 0000:00:1f.3: periods:8 [ 2078.749297] sof-audio-pci-intel-cnl 0000:00:1f.3: stream_tag 1 [ 2078.749301] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60010000 [ 2078.749831] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60010000 [ 2078.749834] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 790528 [ 2078.750140] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=1 [ 2078.750145] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1 [ 2078.751123] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x2014001e successful [ 2078.751129] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60040000 [ 2078.752198] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60040000 watching and logging dmesg like a hawk. this block appeared at the exact time as a single blip in my audio stream. currently only running firefox + script + obs

Ninjastorm380 commented 2 years ago

so far, it appears to not readily crash while under light to no load. tested multiple times with firefox + script + obs. only one crash out of 15 attempts. adding steam + warframe to the mix for heavy load scenario.

Ninjastorm380 commented 2 years ago

no crash yet, but with steam pre-compiling shaders (which is putting the cores at pretty high averages), i'm seeing multiple of those IPC blocks again.

[ 7241.190658] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0
[ 7241.190664] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000
[ 7241.191266] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60050000
[ 7241.191781] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x20140000 successful
[ 7241.191786] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000
[ 7241.191953] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60030000
[ 7241.191961] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=0
[ 7241.191969] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[ 7241.192306] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[ 7241.192332] sof-audio-pci-intel-cnl 0000:00:1f.3: hda: prepare stream dir 0
[ 7241.192339] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x80010000
[ 7241.192533] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x80010000
[ 7241.192552] sof-audio-pci-intel-cnl 0000:00:1f.3: format_val=49, rate=48000, ch=2, format=10
[ 7241.192591] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: prepare stream 0 dir 0
[ 7241.192593] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: hw params stream 0 dir 0
[ 7241.192594] sof-audio-pci-intel-cnl 0000:00:1f.3: generating page table for 000000003e1dcb8e size 0x10000 pages 16
[ 7241.192603] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful
[ 7241.192624] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x40000 successful
[ 7241.192628] sof-audio-pci-intel-cnl 0000:00:1f.3: period_bytes:0x2000
[ 7241.192630] sof-audio-pci-intel-cnl 0000:00:1f.3: periods:8
[ 7241.192648] sof-audio-pci-intel-cnl 0000:00:1f.3: stream_tag 1
[ 7241.192651] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60010000
[ 7241.192978] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60010000
[ 7241.192982] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: stream dir 0, posn mailbox offset is 790528
[ 7241.193012] sof-audio-pci-intel-cnl 0000:00:1f.3: In hda_link_pcm_trigger cmd=1
[ 7241.193017] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1
[ 7241.193531] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x2014001e successful
[ 7241.193534] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60040000
[ 7241.194282] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60040000

multiple of these blocks are found, each identical in every way except the time it occurred. I count about 9 blocks, but so far no failure.

Ninjastorm380 commented 2 years ago

question: how would you disable powersave features for the SoF driver? I think I have had a bit of insight, and want to try something.

Ninjastorm380 commented 2 years ago

@lgirdwood just finished a 2+ hour test. no crash at all with pipewire node powersave disabled and intel hda powersave disabled. so bad powersave interactions to blame?

dmesg: dmesg.log

Ninjastorm380 commented 2 years ago

@lgirdwood seems like I have indeed fixed the issue, however, there is still a system lockup after a while (greater than 4 hours runtime at high load). I suspect that to be a nvidia/wayland issue.

it seems pretty clear that powersave options/events were causing the audio cutout, does this narrow that down for you guys?

ujfalusi commented 2 years ago

@Ninjastorm380, how did you disabled the powersave(s)? In the dmesg log I can see this:

[    3.030533] snd_sof_pci_intel_cnl: unknown parameter 'power_save' ignored

The power saving for the audio should only kick in (in driver level) when you don't have active stream as we don't yet have deep buffer support and that would increase the latency to gain on power, which is not good for your use. The frequency scaling of the DSP however can trigger scheduling issues resulting of underflow for sure.

During audio activity the driver on the Linux side is not doing much, it just handles the period elapsed (or queries the position), a CPU load can delay these reports a bit, but it is unlikely that it is going to cause such a flood of underflows (famous last words ;) ). All audio processing is done within the DSP, but I'm not sure how Optimus and HDMI audio is routed.

You have mentioned at some point that touching DMIC gain would lock things up. What is the audio output and input you are using (which PCM)? If you have a headset, can you try using that as input instead of DMIC and see if it makes any difference? Handling of DMIC is different than other audio components in the firmware, it involves different components. With the USB audio card you are not seeing any issue, right?

Does OBS opens a new PCM in addition what has been opened already (more streams == more cycles needed in DSP side)?