thestk / rtaudio

A set of C++ classes that provide a common API for realtime audio input/output across Linux (native ALSA, JACK, PulseAudio and OSS), Macintosh OS X (CoreAudio and JACK), and Windows (DirectSound, ASIO, and WASAPI) operating systems.

ALSA backend buffers too much data for output streams, too little for duplex streams #356

Open nyanpasu64 opened 2 years ago

nyanpasu64 commented 2 years ago

Full writeup at https://gist.github.com/nyanpasu64/bfcaf6b28fefdf791e6213b737d49616.

My assumption is that RtAudio is designed to provide low-latency (no excess buffering) and glitch-free audio input and output. Here are some problems in RtApiAlsa's operation that prevent this goal from being achieved:

Minimum achievable input/output/duplex latency

The minimum achievable audio latency at a given period size is achieved by having 2 periods of total capture/playback buffering between the hardware and an app (RtApiAlsa, JACK2, or PipeWire).

For duplex streams, the total round-trip (microphone-to-speaker) latency is N periods.

For capture and duplex streams, there are 0 to 1 periods of capture (microphone-to-screen) latency (since microphone input can occur at any time, but is always processed at period boundaries).

For playback and duplex streams, there are N-1 to N periods of playback (keyboard-to-speaker) latency (since keyboard input can occur at any point, but is always converted into audio at period boundaries).

These values only include delay caused by audio buffers, and exclude extra latency in the input stack, display stack, sound drivers, resamplers, or ADC/DAC.
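To make these bounds concrete, here is a small sketch that converts them from periods to milliseconds. It assumes that N is the total number of periods of buffering (2 at the minimum described above), which is an interpretation of the text rather than something it states outright. All function and type names are illustrative and are not part of RtAudio or ALSA.

```cpp
#include <cassert>
#include <cstdio>

// Latency bounds in milliseconds. All names here are illustrative.
struct LatencyRangeMs {
  double min;
  double max;
};

// Duration of one period in milliseconds.
double periodMs(unsigned periodFrames, unsigned sampleRate) {
  assert(sampleRate > 0);
  return 1000.0 * periodFrames / sampleRate;
}

// Capture (microphone-to-screen) latency: 0 to 1 periods.
LatencyRangeMs captureLatency(unsigned periodFrames, unsigned sampleRate) {
  return {0.0, periodMs(periodFrames, sampleRate)};
}

// Playback (keyboard-to-speaker) latency: N-1 to N periods.
// ASSUMPTION: nPeriods is the total number of periods of buffering.
LatencyRangeMs playbackLatency(unsigned nPeriods, unsigned periodFrames,
                               unsigned sampleRate) {
  assert(nPeriods >= 1);
  const double p = periodMs(periodFrames, sampleRate);
  return {(nPeriods - 1) * p, nPeriods * p};
}

// Duplex round-trip (microphone-to-speaker) latency: N periods.
double roundTripLatencyMs(unsigned nPeriods, unsigned periodFrames,
                          unsigned sampleRate) {
  assert(nPeriods >= 1);
  return nPeriods * periodMs(periodFrames, sampleRate);
}

// Print all three figures for one configuration.
void printLatencies(unsigned nPeriods, unsigned periodFrames,
                    unsigned sampleRate) {
  const LatencyRangeMs cap = captureLatency(periodFrames, sampleRate);
  const LatencyRangeMs play = playbackLatency(nPeriods, periodFrames, sampleRate);
  std::printf("capture:    %.2f to %.2f ms\n", cap.min, cap.max);
  std::printf("playback:   %.2f to %.2f ms\n", play.min, play.max);
  std::printf("round trip: %.2f ms\n",
              roundTripLatencyMs(nPeriods, periodFrames, sampleRate));
}
```

For example, `printLatencies(2, 256, 48000)` describes 2 periods of 256 frames at 48 kHz, where one period lasts about 5.3 ms. As in the text, these figures cover buffering only and exclude driver, resampler, and converter delays.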

Avoid blocking writes (output only) (RtAudio has added latency)

If your app generates one output period of audio at a time and you want to minimize keypress-to-audio latency, then regardless of whether your app outputs to hardware devices or pull-mode daemons, it should never rely on blocking writes to act as output backpressure. Instead, it should wait until 1 period of audio is writable, then generate 1 period of audio and write it with a nonblocking write. (This does not apply to duplex apps, since waiting for available input data effectively acts as output throttling.)

If your app generates audio before performing blocking writes for throttling, it generates a new period of audio as soon as the previous period is written, which is a full period of real time before the new period becomes writable. This audio is buffered for an extra period (while snd_pcm_writei() blocks) before reaching the speakers, so external (e.g. keyboard) input takes a period longer to become audible.

(Note that avoiding blocking writes isn't necessarily beneficial if you don't generate audio in chunks synchronized with output periods.)
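The extra period can be illustrated with a toy timing model rather than real ALSA calls. The sketch below assumes an idealized device that starts playing at t = 0, consumes exactly one period per unit of time, never underruns, and that audio generation takes no time. All names are hypothetical. Each function returns the time between generating period k and the moment it starts playing.

```cpp
#include <algorithm>
#include <cassert>

// All times are measured in periods. ASSUMPTION: the simulated device plays
// exactly one period per unit of time from t = 0 and never underruns.

// Time at which period k starts playing.
double playStart(int k) { return static_cast<double>(k); }

// Earliest time at which the device buffer has room for period k: the period
// written bufferPeriods earlier must have finished playing.
double writableAt(int k, int bufferPeriods) {
  return std::max(0.0, static_cast<double>(k - bufferPeriods + 1));
}

// Strategy A: generate a period, then throttle with a blocking write.
double latencyWithBlockingWrite(int k, int bufferPeriods) {
  assert(k >= 0 && bufferPeriods >= 1);
  double now = 0.0;
  double generatedAt = 0.0;
  for (int i = 0; i <= k; ++i) {
    generatedAt = now;                                  // generate right away
    now = std::max(now, writableAt(i, bufferPeriods));  // blocking write returns
  }
  return playStart(k) - generatedAt;
}

// Strategy B: wait until one period is writable, then generate it and write
// it without blocking.
double latencyWithWaitFirst(int k, int bufferPeriods) {
  assert(k >= 0 && bufferPeriods >= 1);
  double now = 0.0;
  double generatedAt = 0.0;
  for (int i = 0; i <= k; ++i) {
    now = std::max(now, writableAt(i, bufferPeriods));  // wait for free space
    generatedAt = now;                                  // generate, write at once
  }
  return playStart(k) - generatedAt;
}
```

In this model, with a 2-period buffer and once the startup transient has passed, strategy A generates each period 2 periods before it plays and strategy B generates it 1 period before it plays. The gap stays at one period for larger buffers too, which matches the claim above.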

Issue: RtAudio relies on blocking snd_pcm_writei in pure-output streams. This adds 1 period of keyboard-to-speaker latency to output streams. (It also relies on blocking snd_pcm_writei for duplex streams, but this is essentially harmless: RtAudio first blocks on snd_pcm_readi, and by the time that call returns, if the input and output streams are synchronized, snd_pcm_writei is effectively a nonblocking write call.)

RtAudio gets duplex wrong, can have xruns and glitches

Issue: RtAudio opens and polls an ALSA duplex stream (in this case, duplex.cpp with extra debug prints added, opening my motherboard's hw device) by:

Then loop:

(For an overview of the correct way to handle this, see https://gist.github.com/nyanpasu64/bfcaf6b28fefdf791e6213b737d49616#implementing-exclusive-mode-duplex-like-jack2.)

Fixing RtAudio output and duplex

To resolve this for duplex streams, the easiest approach is to change stream starting:

This approach fails for output-only streams. To resolve the issue in both duplex and output streams, you must:

I haven't looked into how RtAudio stops ALSA streams (with or without snd_pcm_link()), then starts them again, and what happens if you call them quickly enough that the buffers haven't fully drained yet.

garyscavone commented 2 years ago

These are great observations and suggestions. If you could propose PRs to implement the improvements, I'd be happy to consider them.

nyanpasu64 commented 2 years ago

How can RtApiAlsa :: callbackEvent() tell if a particular callback needs to start the streams, or if the streams are already running? Is it okay to call snd_pcm_state() in every callback iteration?

garyscavone commented 2 years ago

callbackEvent() is repetitively invoked by callbackHandler(), which is spawned in a separate thread. At the start of the callbackHandler() function, it checks to see if the stream has been started or not. If it has not been started, then it waits via a pthread_cond_wait() call until signaled by startStream(). The callback does not start the stream. Rather, the user starts the stream via the startStream() function, which then allows callbackEvent() to start processing buffers.
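The control flow described here can be sketched with a toy class. This is not RtAudio's code: it uses std::condition_variable in place of pthread_cond_wait(), the class name `ToyStream` and its members are hypothetical, and the callback thread processes a fixed number of buffers so that the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a stream with a callback thread.
class ToyStream {
public:
  explicit ToyStream(int buffersToProcess)
      : buffersToProcess_(buffersToProcess),
        thread_([this] { callbackHandler(); }) {}

  ~ToyStream() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      quit_ = true;  // release the thread if the stream was never started
    }
    runnable_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

  // Called by the user; the callback thread never starts the stream itself.
  void startStream() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      running_ = true;
    }
    runnable_.notify_one();
  }

  // Block until every buffer is processed. Call only after startStream().
  void finish() {
    assert(thread_.joinable());
    thread_.join();
  }

  int processed() const { return processed_.load(); }

private:
  // Runs in its own thread: wait for startStream(), then process buffers.
  void callbackHandler() {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      runnable_.wait(lock, [this] { return running_ || quit_; });
      if (!running_) return;  // destroyed before being started
    }
    for (int i = 0; i < buffersToProcess_; ++i) callbackEvent();
  }

  // Stand-in for processing one buffer of audio.
  void callbackEvent() { ++processed_; }

  std::mutex mutex_;
  std::condition_variable runnable_;
  bool running_ = false;
  bool quit_ = false;
  std::atomic<int> processed_{0};
  int buffersToProcess_;
  std::thread thread_;  // declared last so the other members exist first
};
```

A caller would construct `ToyStream stream(5);`, observe that `stream.processed()` stays at 0 until `stream.startStream()` is called, and then call `stream.finish()` to wait for the 5 buffers.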

As for calling snd_pcm_state() in every callback iteration, it hasn't seemed to be a problem and I don't see an alternative way to determine whether an over/under-run has occurred.

arximboldi commented 1 year ago

@nyanpasu64 have you made some progress fixing this or have a branch somewhere with the fixes? I am experiencing lots of dropouts on duplex streams and I think this is probably the issue. Thanks for the detailed deconstruction of the bug! I've considered also using other libraries instead... portaudio comes to mind. Tried libsoundio but it doesn't support duplex streams...

nyanpasu64 commented 1 year ago

I'm not sure I ever figured out a fix. I didn't understand RtAudio's threading and condition variable system well, and I think it has some edge-case data races not prevented by locking.

I did find a patch on my disk, but have no clue if it's right or wrong (suspect it's only built to avoid doubled latency with pipewire-alsa, and will fail on real ALSA devices):

commit 32918289cb632a57e61deb5a13cc97fdd92ee9f8
Author: nyanpasu64 <nyanpasu64@tuta.io>
Date:   Wed Jun 8 15:22:46 2022 -0700

    Hack RtApiAlsa into pipewire-alsa zero-latency playback (fails)

diff --git a/RtAudio.cpp b/RtAudio.cpp
index 565dad4..e2cca62 100644
--- a/RtAudio.cpp
+++ b/RtAudio.cpp
@@ -8500,6 +8500,17 @@ void RtApiAlsa :: callbackEvent()
     RtAudioFormat format;
     handle = (snd_pcm_t **) apiInfo->handles;

+    static bool hackety = false;
+    if (!hackety) {
+      if ( stream_.mode == INPUT || stream_.mode == DUPLEX ) {
+        snd_pcm_start( handle[1] );
+      }
+      if ( stream_.mode == OUTPUT || stream_.mode == DUPLEX ) {
+        snd_pcm_start( handle[0] );
+      }
+      hackety = true;
+    }
+
     {
       snd_pcm_uframes_t buffer_size, period_size;
       snd_pcm_get_params(handle[1], &buffer_size, &period_size);

arximboldi commented 1 year ago

In the end I've moved to Portaudio on Linux :)