raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11k stars 4.95k forks source link

/dev/snd/timer hangs on poll when using alsa dmix #1888

Open aeruder opened 7 years ago

aeruder commented 7 years ago

As far as I can tell, this has been around for a while.

To recreate, it should be as simple as:

aplay -D plug:dmix /usr/share/sounds/alsa/Front_Left.wav

The interesting thing seen in the strace is it appears that the first poll on /dev/snd/timer works fine and the 2nd one hangs.

Host

Linux babombpi2 4.9.13-v7+ #974 SMP Wed Mar 1 20:09:48 GMT 2017 armv7l GNU/Linux

config.txt:

kernel=uboot.bin
dtparam=audio=on
dtoverlay=vc4-kms-v3d
device_tree_address=0x02000000
device_tree_end=0x02080000
gpu_mem=128
max_usb_current=1
avoid_warnings=2

config_hdmi_boost=4
hdmi_force_hotplug=1
hdmi_drive=2
hdmi_edid_file=1
hdmi_group=1
hdmi_mode=16
JamesH65 commented 7 years ago

Confirmed as definitely hanging on Linux dev2 4.9.24-v7+ #993 SMP Wed Apr 26 18:01:23 BST 2017 with a standard config.

pi@dev2:~/projects/gpio/libs $ aplay -D plug:dmix /usr/share/sounds/alsa/Front_Left.wav Playing WAVE '/usr/share/sounds/alsa/Front_Left.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono ^CAborted by signal Interrupt... aplay: pcm_write:1939: write error: Interrupted system call pi@dev2:~/projects/gpio/libs $

kali commented 6 years ago

Still there in Linux raspberrypi 4.9.52-v7+ #1038 SMP Fri Sep 29 16:26:52 BST 2017 armv7l GNU/Linux

kali commented 6 years ago

I can not reproduce the issue while working with the adafruit speaker bonnet, which uses hifiberry-dac instead of snd_bcm2835 .

hawkeye217 commented 6 years ago

Confirmed on Linux framepi 4.9.57+ #1045 Fri Oct 20 11:38:56 BST 2017 armv6l GNU/Linux

JamesH65 commented 6 years ago

Fairly old issue, but could anyone seeing the issue ensure they are running the latest kernel (currently 4.14.y) and confirm whether this issue is still present?

timothytylee commented 6 years ago

Problem still exists in kernel 4.14.52 (commit be09ec5fb2f4feb89f29e73f4d56fbc6aa8ce7da), running against rpi-firmware 2018-07-03 (commit 36ecc7f415d2593d975b6cfaed82a3a7cf344703)

begemoth0 commented 5 years ago

Stumbled on this one, still actual on

Linux .. 4.14.90+ #1183 Fri Dec 21 13:50:04 GMT 2018 armv6l GNU/Linux

JamesH65 commented 4 years ago

Still present on 4.19.71 (tested on Pi4)

JamesH65 commented 4 years ago

Been taking a brief look at this - the alsa code is fairly impenetrable so my suspicion is that it's going to take someone more expert in Alsa than me to figure out what the issue is. The code where it stalls is snd_timer_user_read in sound/core/timer.c. I've found a timer test app which does actually fail if run too quickly which might help, need to try it on an Ubuntu system to double check if the failure I see is expected.

lsgunth commented 4 years ago

I see a similar issue with pyaudio. While aplay (without dmix) works fine, the simple pyaudio examples hang on an infinite loop of poll calls to /dev/snd/timer and never play any audio. Here's a strace snippet:

openat(AT_FDCWD, "/dev/snd/timer", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 4
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
ioctl(4, SNDRV_TIMER_IOCTL_PVERSION, 0xbee6d460) = 0
ioctl(4, SNDRV_TIMER_IOCTL_TREAD, 0xbee6d464) = 0
ioctl(4, SNDRV_TIMER_IOCTL_SELECT, 0xbee6d468) = 0
ioctl(4, SNDRV_TIMER_IOCTL_PVERSION, 0xbee6d808) = 0
semop(65536, [{0, -1, SEM_UNDO|IPC_NOWAIT}], 1) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(4, SNDRV_TIMER_IOCTL_PARAMS, 0xbee6d5f4) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(4, SNDRV_TIMER_IOCTL_PARAMS, 0xbee6dd3c) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(4, SNDRV_TIMER_IOCTL_START, 0x5420) = 0
semop(65536, [{0, 0, 0}, {0, 1, SEM_UNDO}], 2) = 0
semop(65536, [{0, -1, SEM_UNDO|IPC_NOWAIT}], 1) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=536, tv_nsec=980477427}) = 0
read(3, "\341\377\262\377P\0x\0\217\377\314\377\262\0\265\377\320\376`\0R\1v\377l\376M\0\244\1S\0"..., 4096) = 4096
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
semop(65536, [{0, 0, 0}, {0, 1, SEM_UNDO}], 2) = 0
semop(65536, [{0, -1, SEM_UNDO|IPC_NOWAIT}], 1) = 0
read(4, "\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 64) = 16
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0
poll([{fd=4, events=POLLIN|POLLERR|POLLNVAL}], 1, 22) = 0 (Timeout)
nanosleep({tv_sec=0, tv_nsec=1000000}, 0xbee6df44) = 0
ioctl(5, SNDRV_PCM_IOCTL_SYNC_PTR, 0x1d2b360) = 0

This is on a Pi4 with 4.19.118-v7l

lsgunth commented 4 years ago

On my read of the code, this appears to be related to the PCM timer. Per kconfig:

config SND_PCM_TIMER
        bool "PCM timer interface" if EXPERT
        default y
        help
          If you disable this option, pcm timer will be unavailable, so
          those stubs that use pcm timer (e.g. dmix, dsnoop & co) may work
          incorrectlly.

The driver is meant to call snd_pcm_period_elapsed() at regular intervals in an IRQ which will wake-up the userspace processes waiting on the PCM timer. I suspect the bcm2835 driver is doing this incorrectly. My guess is the driver is only calling the period elapsed function after there's data but userspace expects to get the timer events before it sends data. See the comments in bcm2835_playback_fifo()

pelwell commented 4 years ago

I think you are mistaken about snd_pcm_period_elapsed. The comment above the function says:

 * snd_pcm_period_elapsed - update the pcm status for the next period
 * @substream: the pcm substream instance
 *
 * This function is called from the interrupt handler when the
 * PCM has processed the period size.  It will update the current
 * pointer, wake up sleepers, etc.
...

Note that it says "when the PCM has processed the period size", not "when a period of time has elapsed".

And the kernel document "Writing an ALSA driver" says:

Interrupts at the period (fragment) boundary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the most frequently found type: the hardware generates an
interrupt at each period boundary. In this case, you can call
snd_pcm_period_elapsed() at each interrupt.

and

High frequency timer interrupts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This happens when the hardware doesn't generate interrupts at the period
boundary but issues timer interrupts at a fixed timer rate (e.g. es1968
or ymfpci drivers). In this case, you need to check the current hardware
position and accumulate the processed sample length at each interrupt.
When the accumulated size exceeds the period size, call
snd_pcm_period_elapsed() and reset the accumulator.

Both of these cases - hardware interrupt and HR timer callback - call snd_pcm_period_elapsed when the audio for a certain period has been played. If there is no audio then there will be no playback and no callbacks.

Some audio mixing frameworks play silence even when there are no clients actively generating audio. This would result in the kind of continuous callbacks you refer to, but that is not the way ALSA behaves natively - it handles a single client, starting playback when there is some data and stopping it at the end.

lsgunth commented 4 years ago

Well, regardless of the documentation (which may be wrong and/or confusing), the fact of the matter is snd_pcm_period_elapsed() is what will trigger the timer interrupt and allow userspace to continue. It is not getting called at the appropriate time (whatever that may be) and userspace is hanging because of it.

snd_pcm_period_elapsed() calls snd_timer_interrupt() which calls snd_timer_user_interrupt() (via the callback) and that function increments qused and wakes up qchange_sleep which is what will allow the userspace process to continue.