Open ssavati opened 4 months ago
"pkill -9 aplay arecord" cmd not able to terminate as one of aplay/arecord instance is stuck/hang/blocked. So next iteration fails to start new instances playback/capture
if https://github.com/thesofproject/sof-test/pull/1213 is merged, testing with it should tell us whether any aplay / arecord processes are blocked in the kernel. As suggested by @lgirdwood enabling some kernel soft lockup debugging options could help too
I've now tried to reproduce this on both HDA and nocodec LNL setup. On HDA I only have 5 successful test runs, but on nocodec I just had 17 successful runs (about 7 hours) in a row. So the reproduction rate is very low. I still intend to catch up the nocodec numbers on HDA, but its starting to look like this issue does not show itself in reasonable time.
Now I have run 19 successful runs (about 8 hours) on LNL HDA RVP. So that case too does not appear to reproduce. @lgirdwood , @kv2019i , should we close this one?
BTW, both tests were run with relatively recent daily build.
Ack @jsarha . @ssavati ok to close or is this still seen in larger test plans? So either close or keep open as P3.
if https://github.com/thesofproject/sof-test/pull/1213 is merged, testing with it should tell us whether any aplay / arecord processes are blocked in the kernel.
This just happened in daily test run /44499?model=LNLM_RVP_NOCODEC&testcase=multiple-pipeline-all-50
[ 5141.981465] kernel: task:aplay state:R stack:10456 pid:60957 tgid:60957 ppid:60953 flags:0x00000006
[ 5141.981468] kernel: Call Trace:
[ 5141.981469] kernel:
[ 5141.981470] kernel: __schedule+0x311/0xa80
[ 5141.981473] kernel: schedule+0x3e/0x130
[ 5141.981474] kernel: schedule_timeout+0xb1/0x130
[ 5141.981476] kernel: ? __pfx_process_timeout+0x10/0x10
[ 5141.981480] kernel: __snd_pcm_lib_xfer+0x6ab/0x900 [snd_pcm]
[ 5141.981488] kernel: ? __pfx_interleaved_copy+0x10/0x10 [snd_pcm]
[ 5141.981492] kernel: ? __snd_pcm_lib_xfer+0x1a2/0x900 [snd_pcm]
[ 5141.981496] kernel: ? __pfx_default_write_copy+0x10/0x10 [snd_pcm]
[ 5141.981500] kernel: ? __pfx_default_wake_function+0x10/0x10
[ 5141.981503] kernel: snd_pcm_common_ioctl+0xf7a/0x1600 [snd_pcm]
[ 5141.981507] kernel: ? ioctl_has_perm.constprop.0.isra.0+0xba/0x110
[ 5141.981511] kernel: snd_pcm_ioctl+0x27/0x40 [snd_pcm]
[ 5141.981515] kernel: __x64_sys_ioctl+0x8e/0xd0
[ 5141.981517] kernel: do_syscall_64+0x9e/0x1a0
[ 5141.981519] kernel: entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 5141.981520] kernel: RIP: 0033:0x7f42f861794f
[ 5141.981521] kernel: RSP: 002b:00007ffe8f159ed0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 5141.981522] kernel: RAX: ffffffffffffffda RBX: 000055b4ddb26010 RCX: 00007f42f861794f
[ 5141.981523] kernel: RDX: 00007ffe8f159f30 RSI: 0000000040184150 RDI: 0000000000000004
[ 5141.981524] kernel: RBP: 000055b4ddb25f90 R08: 0000000000000000 R09: 000055b4ddb288c0
[ 5141.981524] kernel: R10: 000055b4ddb2e000 R11: 0000000000000246 R12: 0000000000000000
[ 5141.981525] kernel: R13: 000055b4ddb288c0 R14: 7fffffffffffffff R15: 0000000000000010
Multiple Pipeline Playback/capture t stress fails for HDA/NOCODEC/SDW. The issue reproduction is low but happens sporadically during CI stress test.
Steps HDA Test command TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 200
NOCDEC Tes tcommand TPLG=/lib/firmware/intel/development/sof-lnl-nocodec.tplg MODEL=LNLM_RVP_NOCODEC SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 200
config Linux Branch: topic/sof-dev Linux Commit: c75a95d34885
SOF Branch: stable-v2.10 SOF Commit: b15f1f1a3238
Issue reproduction : sporadic and low