thestk / rtaudio

A set of C++ classes that provide a common API for realtime audio input/output across Linux (native ALSA, JACK, PulseAudio and OSS), Macintosh OS X (CoreAudio and JACK), and Windows (DirectSound, ASIO, and WASAPI) operating systems.

RtAudio PulseAudio apps randomly fail to scan input/output devices (consistently on pipewire-pulse) #304

Closed nyanpasu64 closed 3 years ago

nyanpasu64 commented 3 years ago

Version, Distribution, Desktop Environment:

Description of Problem:

RtAudio apps running under pipewire-pulse see zero output devices and fail to start. The same error occurs randomly on PulseAudio, more commonly when the pulseaudio daemon doesn't have enough CPU time to run quickly.

Steps to Reproduce:

EDIT:

Original instructions:

1. Install pipewire[-git] and pipewire-pulse[-git] on a system, and uninstall pulseaudio if present. Start the pipewire daemon.
2. Clone BambooTracker git master from https://github.com/BambooTracker/BambooTracker, and build and run it. (On Arch, you can install aur/bambootracker-git.)
   - There are prebuilt binaries at https://github.com/BambooTracker/BambooTracker/releases/download/v0.4.6/BambooTracker-v0.4.6-linux.zip, but it's 124 megabytes and is some strange nix-os shell script thing.
3. In File -> Configuration -> Sound, set the API to PulseAudio and click Apply.

Actual Results:

On pipewire-pulse, you'll see "No audio devices found!" since RtApiPulse::getDeviceCount( void ) returns 0.

The same bug can appear with regular PulseAudio, but with a much lower probability. On PulseAudio, you'll instead see an endless stream of "RtAudio pulse: running realtime scheduling", eventually terminated by "RtApi::openStream: output device parameter value is invalid." I only observed this error on slow CPUs, or when stress-testing my fast CPU cores so that the PulseAudio server was starved of CPU time.

This error arises because RtAudio tries to open device 0 from a list of 0 devices. There are 0 devices because RtAudio calls pa_context_get_sink_info_list to fetch a list of output devices, but the callback gets called 0 times. (pa_context_get_source_info_list's callback also gets called 0 times.)

Stack trace of rt_pa_server_callback (which calls pa_mainloop_quit) on PulseAudio:

#0  rt_pa_server_callback (context=0x555555c7f160, info=0x7fffffffd4a0, data=0x0) at /home/nyanpasu64/code/exotracker-cpp/3rdparty/rtaudio/RtAudio.cpp:8509
#1  0x00007ffff773d119 in context_get_server_info_callback (pd=pd@entry=0x555555bd8820, command=command@entry=2, tag=tag@entry=2, t=t@entry=0x555555c73650, userdata=userdata@entry=0x555555b93300) at ../pulseaudio/src/pulse/introspect.c:122
#2  0x00007ffff6392144 in run_action (pd=0x555555bd8820, r=0x555555bb0570, command=2, ts=0x555555c73650) at ../pulseaudio/src/pulsecore/pdispatch.c:291
#3  0x00007ffff639339f in pa_pdispatch_run (pd=0x555555bd8820, packet=packet@entry=0x555555bc2880, ancil_data=ancil_data@entry=0x555555b93728, userdata=userdata@entry=0x555555c7f160) at ../pulseaudio/src/pulsecore/pdispatch.c:344
#4  0x00007ffff773d435 in pstream_packet_callback (p=<optimized out>, packet=0x555555bc2880, ancil_data=0x555555b93728, userdata=0x555555c7f160) at ../pulseaudio/src/pulse/context.c:364
#5  0x00007ffff6398419 in do_read (p=p@entry=0x555555b93490, re=re@entry=0x555555b93658) at ../pulseaudio/src/pulsecore/pstream.c:1023
#6  0x00007ffff63991a8 in do_pstream_read_write (p=0x555555b93490) at ../pulseaudio/src/pulsecore/pstream.c:254
#7  0x00007ffff639959b in srb_callback (srb=0x555555bc24d0, userdata=0x555555b93490) at ../pulseaudio/src/pulsecore/pstream.c:296
#8  0x00007ffff639cbfa in srbchannel_rwloop (sr=0x555555bc24d0) at ../pulseaudio/src/pulsecore/srbchannel.c:190
#9  0x00007ffff77514a3 in dispatch_pollfds (m=0x555555b7ef70) at ../pulseaudio/src/pulse/mainloop.c:676
#10 pa_mainloop_dispatch (m=m@entry=0x555555b7ef70) at ../pulseaudio/src/pulse/mainloop.c:917
#11 0x00007ffff7751b0d in pa_mainloop_iterate (m=m@entry=0x555555b7ef70, block=block@entry=1, retval=retval@entry=0x7fffffffd814) at ../pulseaudio/src/pulse/mainloop.c:948
#12 0x00007ffff7751bb1 in pa_mainloop_run (m=0x555555b7ef70, retval=0x7fffffffd814) at ../pulseaudio/src/pulse/mainloop.c:963

When running under real PulseAudio, do_pstream_read_write loops (because p->srb is non-null) and calls do_read (which indirectly calls rt_pa_sink_info_cb) before returning, so dispatch_pollfds has no chance to notice that m->quit is set.

Running on pipewire-pulse, the srb_callback and srbchannel_rwloop stack frames are missing (I think because pipewire-pulse doesn't support srb yet), and do_pstream_read_write only calls do_read once (because p->srb is null). Once do_pstream_read_write returns, dispatch_pollfds notices m->quit is set and returns. What is srb?

/* An shm ringbuffer that is used for low overhead server-client communication.
 * Signaling is done through eventfd semaphores (pa_fdsem). */

typedef struct pa_srbchannel pa_srbchannel;

Is it a libpulse issue that pa_srbchannel connections respond to packets before quitting (allowing code written like RtAudio to work most of the time on srb-enabled PulseAudio connections but always fail on non-srb-enabled connections)? Do they need to be made consistent (most likely by making srb-enabled do_pstream_read_write check for m->quit on every message and consistently terminate immediately, consistently breaking RtAudio apps)?

I think RtAudio is wrong for calling pa_mainloop_api::quit. I believe this introduces a race condition where if the PulseAudio server is unable to send the packets before do_pstream_read_write calls do_read and checks for packets, then do_pstream_read_write returns and dispatch_pollfds exits the loop because m->quit is set. As a result, rt_pa_sink_info_cb never runs and RtAudio fails to scan for devices, instead failing on "RtApi::openStream: output device parameter value is invalid."
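The suspected race can be illustrated with a toy event loop in plain C++. This is a sketch under stated assumptions, not libpulse code: ToyLoop, dispatch, run and scan_devices are hypothetical names, each queued lambda stands in for one server reply, and the model assumes the server-info reply is processed before the sink and source replies. drain_all = true imitates the srb path (keep reading while packets are available), and drain_all = false imitates the non-srb path (one read per dispatch).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model of a client-side event loop. None of these names exist in libpulse.
struct ToyLoop {
    std::deque<std::function<void()>> inbox;  // replies already received from the server
    bool quit = false;                        // stand-in for m->quit

    // drain_all = true: keep reading while replies are available (srb-like).
    // drain_all = false: read a single reply, then return (non-srb-like).
    void dispatch(bool drain_all) {
        do {
            if (inbox.empty()) return;
            std::function<void()> reply = std::move(inbox.front());
            inbox.pop_front();
            assert(reply != nullptr);  // every queued reply must be callable
            reply();
        } while (drain_all);
    }

    // The quit flag is only checked between dispatch() calls.
    // Unlike a real main loop, this toy stops instead of blocking on an empty inbox.
    void run(bool drain_all) {
        while (!quit && !inbox.empty()) dispatch(drain_all);
    }
};

// Returns how many devices the scan saw before the loop stopped.
int scan_devices(bool drain_all, bool quit_in_server_callback) {
    ToyLoop loop;
    int devices = 0;

    // Reply 1: server info. The buggy variant asks the loop to quit here.
    loop.inbox.push_back([&] { if (quit_in_server_callback) loop.quit = true; });
    // Reply 2: sink list (one sink in this toy).
    loop.inbox.push_back([&] { ++devices; });
    // Reply 3: source list (one source). The fixed variant quits only here.
    loop.inbox.push_back([&] { ++devices; if (!quit_in_server_callback) loop.quit = true; });

    loop.run(drain_all);
    return devices;
}
```

In this model, scan_devices(true, true) still finds both devices because the remaining replies are read before the quit flag is checked, while scan_devices(false, true) finds none. Quitting from the last reply instead (quit_in_server_callback = false) finds both devices in either mode.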

I didn't prove causality (eg. by editing libpulse's source code so do_pstream_read_write prints how many times it loops), but I have indirect evidence:

Sadly I'm not sure how to rewrite rt_pa_context_state_callback to fix this bug, so it terminates the event loop after all the callbacks are done running. I'd have to look into it later.

Symptom: PulseAudio device does not support output

While testing RtAudio apps, I also saw a "PulseAudio device does not support output." error when starting the program. In the main bug report, each call to RtApiPulse::getDeviceCount() would randomly misbehave or work. Here, calling RtApiPulse::getDeviceCount() again and trying to restart the stream only produced more of the same error. However, recreating the RtAudio object seems to fix it. I'm not sure if it has the same root cause.

EDIT: I set a breakpoint where "PulseAudio device does not support output." is printed. It turns out you can't reroll the dice by destroying and recreating RtAudio objects, only by setting oParams.deviceId = dac.getDefaultOutputDevice() anew each iteration (which can trigger the bug whether or not you destroy and recreate the RtAudio object). At the breakpoint, I saw that the device passed into RtApiPulse::probeDeviceOpen (equal to oParams->deviceId) was 0 instead of 1.

I think this is another symptom of the same bug: RtApi::getDefaultOutputDevice( void ) calls RtApiPulse::getDeviceCount( void ), which erroneously returns 0 or so, so getDefaultOutputDevice returns 0 as well, and device 0 is an input-only device on my machine.

nyanpasu64 commented 3 years ago

pa_context_get_server_info calling rt_pa_mainloop_api_quit(0) has been in the codebase for a long time. The call to pa_context_get_sink_info_list was added later on in c0d33839f522710e5a616a0b5e43d42d95096c7f, but the author neglected to move the call to rt_pa_mainloop_api_quit(0) to the last function called (rt_pa_source_info_cb with eol=1). I'm preparing a PR to do that.
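The idea of quitting only from the last callback can be sketched as follows. This is a minimal illustration, not the code from the PR: ToyDeviceInfo, ScanState, on_sink_info, on_source_info and replay_scan are hypothetical names, and setting quit_requested stands in for calling rt_pa_mainloop_api_quit. The callbacks follow the libpulse list-callback convention (eol is 0 for each entry, positive at the end of the list, negative on error), and the sketch assumes the source list is the last request to complete.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyDeviceInfo { std::string name; };  // stand-in for pa_sink_info / pa_source_info

struct ScanState {
    std::vector<std::string> sinks;
    std::vector<std::string> sources;
    bool quit_requested = false;  // stand-in for asking the main loop to quit
    int quit_retval = 0;
};

void on_sink_info(const ToyDeviceInfo* info, int eol, void* userdata) {
    auto* state = static_cast<ScanState*>(userdata);
    if (eol < 0) {  // error: give up on the scan
        state->quit_requested = true;
        state->quit_retval = 1;
        return;
    }
    if (eol > 0) return;  // end of the sink list; the source list is still outstanding
    assert(info != nullptr);
    state->sinks.push_back(info->name);
}

void on_source_info(const ToyDeviceInfo* info, int eol, void* userdata) {
    auto* state = static_cast<ScanState*>(userdata);
    if (eol != 0) {  // end of the last list (or an error): now it is safe to quit
        state->quit_requested = true;
        state->quit_retval = (eol < 0) ? 1 : 0;
        return;
    }
    assert(info != nullptr);
    state->sources.push_back(info->name);
}

// Replays the replies in the order the sketch assumes they arrive.
ScanState replay_scan() {
    ScanState state;
    ToyDeviceInfo speakers{"speakers"};
    ToyDeviceInfo mic{"mic"};
    on_sink_info(&speakers, 0, &state);
    on_sink_info(nullptr, 1, &state);    // end of sinks: no quit yet
    on_source_info(&mic, 0, &state);
    on_source_info(nullptr, 1, &state);  // end of sources: quit
    return state;
}
```

With this shape, the loop cannot be asked to quit before every device entry has been delivered, regardless of how many replies each dispatch reads.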