ALSA backend buffers too much data for output streams, too little for duplex streams

nyanpasu64 commented 2 years ago

Full writeup at https://gist.github.com/nyanpasu64/bfcaf6b28fefdf791e6213b737d49616.

My assumption is that RtAudio is designed to provide low-latency (no excess buffering) and glitch-free audio input and output. Here are some problems in RtApiAlsa's operation that prevent this goal from being achieved:

Minimum achievable input/output/duplex latency

The minimum achievable audio latency at a given period size is achieved by having 2 periods of total capture/playback buffering between hardware and an app (RtApiAlsa, JACK2, or PipeWire).

If an audio daemon mixes audio from multiple apps, it can only avoid adding latency if there is no buffering (but instead synchronous execution) between the daemon and apps. JACK2 in synchronous mode and PipeWire support this, but pipewire-alsa fails this test by default, so ALSA is not a zero-latency way of talking to PipeWire.

For duplex streams, the total round-trip (microphone-to-speaker) latency is N periods, where N is the total number of periods of capture/playback buffering (2 at minimum).

For capture and duplex streams, there are 0 to 1 periods of capture (microphone-to-screen) latency (since microphone input can occur at any time, but is always processed at period boundaries).

For playback and duplex streams, there are N-1 to N periods of playback (keyboard-to-speaker) latency (since keyboard input can occur at any point, but is always converted into audio at period boundaries).

These values only include delay caused by audio buffers, and exclude extra latency in the input stack, display stack, sound drivers, resamplers, or ADC/DAC.
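
To put numbers on these bounds, here is a minimal sketch of the arithmetic. The function name, struct, and parameters are made up for illustration and are not part of RtAudio or ALSA; it only covers buffer-induced delay, as described above.

```cpp
#include <cassert>

// Buffer-induced latency bounds, in milliseconds.
// All names here are hypothetical; n_periods is the N used in the text.
struct LatencyMs {
    double capture_min, capture_max;   // microphone-to-screen: 0 to 1 period
    double playback_min, playback_max; // keyboard-to-speaker: N-1 to N periods
    double round_trip;                 // microphone-to-speaker (duplex): N periods
};

LatencyMs buffer_latency(unsigned period_frames, unsigned sample_rate, unsigned n_periods) {
    assert(period_frames > 0 && sample_rate > 0 && n_periods >= 1);
    const double period_ms = 1000.0 * period_frames / sample_rate;
    return {
        0.0, period_ms,
        (n_periods - 1) * period_ms, n_periods * period_ms,
        n_periods * period_ms,
    };
}
```

For example, buffer_latency(256, 48000, 2) gives periods of about 5.33 ms, so playback latency is about 5.33 to 10.67 ms and the duplex round trip is about 10.67 ms.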

Avoid blocking writes (output only) (RtAudio has added latency)

If your app generates one output period of audio at a time and you want to minimize keypress-to-audio latency, then regardless of whether your app outputs to hardware devices or pull-mode daemons, it should never rely on blocking writes to act as output backpressure. Instead it should wait until 1 period of audio is writable, then generate 1 period of audio and nonblocking-write it. (This does not apply to duplex apps, since waiting for available input data effectively acts as output throttling.)

If your app generates audio before performing blocking writes for throttling, you will generate a new period of audio as soon as the previous period of audio is written (a full period of real time before a new period of audio is writable). This audio gets buffered for an extra period (while snd_pcm_writei() blocks) before reaching the speakers, so external (e.g. keyboard) input takes a period longer to be audible.

(Note that avoiding blocking writes isn't necessarily beneficial if you don't generate audio in chunks synchronized with output periods.)
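
The extra period can be reproduced with a small timing model. This is only a sketch under idealized assumptions (time counted in whole periods, a buffer prefilled with N periods, audio generation taking no time, no xruns); the function and its parameters are hypothetical and do not touch ALSA.

```cpp
#include <algorithm>
#include <cassert>

// Idealized model: time is counted in periods, the hardware plays period k
// during [k, k+1), and generating audio takes no time.
// Returns the steady-state delay (in periods) between the app generating a
// period and the hardware starting to play it.
int generate_to_play_delay(int buffer_periods, bool generate_before_blocking_write,
                           int iterations = 16) {
    assert(buffer_periods >= 2 && iterations >= 1);
    int written = buffer_periods; // periods 0..N-1 are prefilled silence
    int now = 0;                  // app clock; playback starts at t = 0
    int delay = 0;
    for (int i = 0; i < iterations; ++i) {
        // Earliest time the buffer has room for the next period.
        const int writable_at = written - buffer_periods + 1;
        int generated_at;
        if (generate_before_blocking_write) {
            generated_at = now;               // generate right away...
            now = std::max(now, writable_at); // ...then block inside the write
        } else {
            now = std::max(now, writable_at); // wait until 1 period is writable...
            generated_at = now;               // ...then generate and write
        }
        delay = written - generated_at; // a period's index equals its play start time
        ++written;
    }
    return delay;
}
```

With a 2-period buffer the model returns 2 for generate-then-block and 1 for wait-then-generate. Input arriving at an arbitrary moment adds another 0 to 1 periods on top of either figure.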

Issue: RtAudio relies on blocking snd_pcm_writei in pure-output streams. This adds 1 period of keyboard-to-speaker latency to output streams. (It also relies on blocking snd_pcm_writei for duplex streams, but this is essentially harmless: RtAudio first blocks on snd_pcm_readi, and by the time that call returns, if the input and output streams are synchronized, snd_pcm_writei is effectively a nonblocking write call.)

RtAudio gets duplex wrong, can have xruns and glitches

Issue: RtAudio opens and polls an ALSA duplex stream (in this case, duplex.cpp with extra debug prints added, opening my motherboard's hw device) using the following sequence:

Don't fill the output with silence.
Call snd_pcm_sw_params_set_start_threshold() on both streams (though RtAudio only triggers on the input, which starts both streams).
snd_pcm_link() the input and output streams so they both start at the same time. Set up the streams the same way regardless of whether linking succeeds or fails. (On my motherboard audio, it succeeds.)

Then loop:

Call snd_pcm_readi(1 period) of input (blocking until available), and pass it to the user callback which generates 1 period of output.
- Because RtAudio calls snd_pcm_sw_params_set_start_threshold on the input stream, and the two streams are linked, snd_pcm_readi() starts both the input and output streams immediately (upon call, not upon return). The output stream is started with no data inside, and tries to play the absence of data. It's a miracle it doesn't xrun immediately.
- Once the input stream has 1 period of input, snd_pcm_readi returns. By this point, the output stream has more snd_pcm_avail() than the total buffer size, and negative snd_pcm_delay(), yet somehow it does not xrun on the first snd_pcm_writei().
Call snd_pcm_writei(1 period) of output. This does not block since there are three periods available/writable (or two if the input/output streams are not linked).
- This is supposed to be called when there is 1 period of empty/available space in the buffer to write to. Instead it's called when there is 1 period of empty space more than the entire buffer size! I don't understand how ALSA even allows this.
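
The "three periods writable" and negative-delay figures follow from simple pointer arithmetic. Below is a minimal sketch of that arithmetic, assuming an idealized playback stream whose hardware pointer keeps advancing even when nothing has been written; the struct and function are hypothetical, and the 256-frame period is an illustrative number rather than one taken from my debug output.

```cpp
#include <cassert>

// Idealized view of an ALSA playback stream, in frames. written_frames and
// played_frames stand in for the application and hardware pointers.
struct PlaybackState {
    long avail; // what snd_pcm_avail() would report in this model
    long delay; // what snd_pcm_delay() would report, ignoring hardware delay
};

PlaybackState playback_state(long buffer_frames, long written_frames, long played_frames) {
    assert(buffer_frames > 0 && written_frames >= 0 && played_frames >= 0);
    // Frames queued for playback; negative if the stream was started empty.
    const long queued = written_frames - played_frames;
    return {buffer_frames - queued, queued};
}
```

With a 2-period buffer of 256-frame periods, playback_state(512, 0, 256) describes the moment the first snd_pcm_readi returns: avail is 768 frames (3 periods, one more than the whole buffer) and delay is -256 frames.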

(For an overview of the correct way to handle this, see https://gist.github.com/nyanpasu64/bfcaf6b28fefdf791e6213b737d49616#implementing-exclusive-mode-duplex-like-jack2.)

Fixing RtAudio output and duplex

To resolve this for duplex streams, the easiest approach is to change stream starting:

Write 1 full buffer (or the used portion) of silence into the output.
Don't call snd_pcm_sw_params_set_start_threshold() on the output stream of a duplex pair. Instead use snd_pcm_link() to start the output stream upon the first input read (or if snd_pcm_link() fails, start the output stream yourself before the first input read).

This approach fails for output-only streams. To resolve the issue in both duplex and output streams, you must:

Call snd_pcm_sw_params_set_avail_min(unused_buffer_size + 1 period) before starting the output stream.
Call snd_pcm_wait() (or poll()) on the output stream every period, before generating audio.
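
As a rough sketch of the first step, the threshold can be computed like this. The helper name and the used_periods parameter are hypothetical; the result is the value that would be passed to snd_pcm_sw_params_set_avail_min(), after which snd_pcm_wait() would return once the used portion of the buffer has room for one more period.

```cpp
#include <cassert>

// All sizes are in frames. ALSA's snd_pcm_uframes_t is an unsigned long;
// plain unsigned long is used here so the sketch builds without ALSA headers.
// used_periods is how many periods of the ALSA buffer the app keeps filled
// (an assumption of this sketch; the ALSA buffer itself may be larger).
unsigned long output_avail_min(unsigned long buffer_frames,
                               unsigned long period_frames,
                               unsigned long used_periods) {
    const unsigned long used_frames = period_frames * used_periods;
    assert(period_frames > 0 && used_frames <= buffer_frames);
    const unsigned long unused_frames = buffer_frames - used_frames;
    // Wake up only when the unused part plus one full period is writable,
    // i.e. when at most (used_periods - 1) periods are still queued.
    return unused_frames + period_frames;
}
```

For a 1024-frame ALSA buffer of which 2 periods of 256 frames are used, this yields 768: the wait returns when at most 256 frames are still queued, which is exactly when one new period fits into the used portion.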

I haven't looked into how RtAudio stops ALSA streams (with or without snd_pcm_link()), then starts them again, and what happens if you stop and restart quickly enough that the buffers haven't fully drained yet.

garyscavone commented 2 years ago

These are great observations and suggestions. If you could propose PRs to implement the improvements, I'd be happy to consider them.

nyanpasu64 commented 2 years ago

How can RtApiAlsa :: callbackEvent() tell if a particular callback needs to start the streams, or if the streams are already running? Is it okay to call snd_pcm_state() in every callback iteration?

garyscavone commented 2 years ago

callbackEvent() is repetitively invoked by callbackHandler(), which is spawned in a separate thread. At the start of the callbackHandler() function, it checks to see if the stream has been started or not. If it has not been started, then it waits via a pthread_cond_wait() call until signaled by startStream(). The callback does not start the stream. Rather, the user starts the stream via the startStream() function, which then allows callbackEvent() to start processing buffers.

As for calling snd_pcm_state() in every callback iteration, it hasn't seemed to be a problem and I don't see an alternative way to determine whether an over/under-run has occurred.

arximboldi commented 1 year ago

@nyanpasu64 have you made some progress fixing this or have a branch somewhere with the fixes? I am experiencing lots of dropouts on duplex streams and I think this is probably the issue. Thanks for the detailed deconstruction of the bug! I've considered also using other libraries instead... portaudio comes to mind. Tried libsoundio but it doesn't support duplex streams...

nyanpasu64 commented 1 year ago

I'm not sure I ever figured out a fix. I didn't understand RtAudio's threading and condition variable system well, and I think it has some edge-case data races not prevented by locking.

I did find a patch on my disk, but have no clue if it's right or wrong (suspect it's only built to avoid doubled latency with pipewire-alsa, and will fail on real ALSA devices):

commit 32918289cb632a57e61deb5a13cc97fdd92ee9f8
Author: nyanpasu64 <nyanpasu64@tuta.io>
Date:   Wed Jun 8 15:22:46 2022 -0700

    Hack RtApiAlsa into pipewire-alsa zero-latency playback (fails)

diff --git a/RtAudio.cpp b/RtAudio.cpp
index 565dad4..e2cca62 100644
--- a/RtAudio.cpp
+++ b/RtAudio.cpp
@@ -8500,6 +8500,17 @@ void RtApiAlsa :: callbackEvent()
     RtAudioFormat format;
     handle = (snd_pcm_t **) apiInfo->handles;

+    static bool hackety = false;
+    if (!hackety) {
+      if ( stream_.mode == INPUT || stream_.mode == DUPLEX ) {
+        snd_pcm_start( handle[1] );
+      }
+      if ( stream_.mode == OUTPUT || stream_.mode == DUPLEX ) {
+        snd_pcm_start( handle[0] );
+      }
+      hackety = true;
+    }
+
     {
       snd_pcm_uframes_t buffer_size, period_size;
       snd_pcm_get_params(handle[1], &buffer_size, &period_size);

arximboldi commented 1 year ago

In the end I've moved to Portaudio on Linux :)
