Open nyanpasu64 opened 2 years ago
Let me first say that the no-exceptions approach is new and I would not be surprised if it needs to be tweaked a bit. So, I'm happy to take suggestions or PRs that fix identified problems. I would also mention that I'm working on a big update regarding device enumeration (in the newdeviceselection branch), so that may change some behaviour too. But regarding what is described in this post, I don't immediately see what the problem is. It is true that the callback thread for the ALSA API is started near the end of the RtApiAlsa::probeDeviceOpen() and then conditionally waits until the stream is started, but that does not block other RtAudio functions from being called. If an error occurs when startStream() is called, the error() function (and if set, the error callback) will be invoked.
The issue I have is that there's no longer a clean way to call openStream() and receive error strings for only errors originating from the main thread before openStream() returns. Calling setErrorCallback() lets you supply a function which handles errors... but it gets called both from the main thread (for openStream() setup errors) and the audio thread (for audio loop errors).
Or is it important for graphical applications to handle both kinds of error, so you don't want to provide a convenient way to handle only setup errors?
EDIT: Possibly related to picking a device: https://github.com/thestk/rtaudio/issues/194
Perhaps I'm missing something but if openStream fails, it will return an RtAudioErrorType, so it would be clear there was a problem in that specific function (and if openStream fails, no audio thread would be created). I agree it might be an issue to get the corresponding error text for a specific problem, though it should be the one passed to the error callback with that particular error type. If this is the problematic issue for you, maybe an extra function such as getLastErrorMessage() could be added?
There are two high-level approaches to error handling: handling UI thread errors but ignoring audio thread errors (ideally they'd just stop the stream rather than terminating the app), and handling both UI thread errors and audio thread errors in the same or different ways. Currently this can be done by adding an error callback in both cases, checking whether the current thread differs from the thread that called openStream(), and either ignoring audio-thread errors or sending them to the UI thread over a queue or such.
There are two primary cases I feel aren't well-served right now: handling UI thread errors without a callback while ignoring audio thread errors, and handling UI thread errors without a callback while handling audio thread errors with a callback (you don't have much choice). If you add getLastErrorMessage(), then the first case can be handled by not adding a callback. But the second case requires adding a callback for audio thread errors, which also gets invoked for openStream() UI thread errors, so I have to either make the callback silently drop them (since they'll be handled when openStream() returns) or move my error handling logic into the callback (which works, but I was trying to avoid it in the first place).
One property I want the API to have is that when I add or remove audio thread error handling in an app, I don't have to change or move the UI thread error handling code. One way to do this is to separate audio thread error callbacks from UI thread error handling (and this is the API I want at the moment). For UI thread error handling (and synchronous code in general), I still prefer return codes and error strings over callbacks, since callbacks are indirect control flow, which is complicated to write and understand.
Also, can an unlucky user cause RtAudio to write to errorText_ and call error() on both the audio and main thread at the same time, causing a data race? If so, that's a good reason to entirely separate the audio thread error callback system from the GUI thread error status (or callback) system, and make them build up errors in different strings and call different functions to read from the strings.
(Why is errorText_ a field, rather than a stack-constructed std::string passed by (const?) reference into error()? Is the intent to reuse allocations across errors (though errors should happen so infrequently that the speed doesn't matter)? You could remove errorText_ altogether and use stack-allocated string literals or std::string instead. This resolves the data race on errorText_, but not on firstErrorOccurred_ and the other fields accessed by RtApi::error().)
If you split the audio and UI thread error functions, the tricky part is making sure you only call the audio thread error function on the audio thread, and the UI thread error function on the UI thread.
Personally I like to ensure this by creating a separate object for audio thread state (eg. the ALSA/Pulse handle, the current audio error string, plus a pointer to shared state), an object for UI thread state (not sure what, plus a pointer to shared state), and a struct for shared state (eg. an atomic variable holding playback state, variables to signal the audio thread to pause/resume/stop playback; mutexes and condvars probably go here too). The audio thread object is constructed on the UI thread, but sent to the audio thread and no longer referenced by the UI thread, and the two objects only communicate through the shared state struct. Then you put methods on the audio and UI thread objects, so each object's methods only have access to the state belonging to that thread plus shared state, and accesses to shared state are syntactically distinguished since it's behind a pointer.
This is the approach Rust encourages for threading, and it makes it clear which methods run on each thread and what state each thread accesses, by baking them into the type system and making them explicit in the source code (instead of relying on programmers understanding the code and never making mistakes, which rarely holds). However, this isn't standard C++ practice (I don't care; I prefer correct-by-design to standard practice), C++ isn't the best at mutexes (see my blog post), it's a lot of work to retrofit existing C++ code into this style (as a consolation, it's good at exposing data races to be fixed), and it requires intimate code knowledge I don't have yet (since I didn't design RtAudio).
creating a separate object for audio thread state (eg. the ALSA/Pulse handle, the current audio error string, plus a pointer to shared state), object for UI thread state (not sure what, plus a pointer to shared state), and struct for shared state (eg. an atomic variable holding playback state, variables to signal the audio thread to pause/resume/stop playback, probably put mutexes and condvars here too)
Will you accept a pull request to split RtApi into separate types accessed by the UI and audio threads? For example, the audio thread operates on an expanded CallbackInfo subclass, whose object only contains atomic and mutex-guarded data rather than an entire RtApi subclass? And the audio thread methods (like RtApiAlsa::callbackEvent()) would be moved to either static functions or CallbackInfo{Alsa,etc.} methods?
I looked into porting, and unfortunately it seems to require a good deal of effort, and may conflict with the in-progress changes at https://github.com/thestk/rtaudio/tree/newdeviceselection.
Also I only have a Windows and Linux machine to build and test on, so I can't port Core Audio at the moment, and I don't use OSS on Linux, nor ASIO on Windows (unless I perhaps install ASIO4ALL?). I don't know if this approach allows me to incrementally port code to the new "thread-safe by design through static typing" architecture.
I spent a day or two sorting through RtAudio's ALSA backend. I see a bunch of data races (errorCallback_, errorText_, RtApiStream::state). Additionally I think probeDeviceOpen() shouldn't spawn a thread; instead we should spawn a thread after opening a device (by calling a separate virtual method in RtApi). This makes it easier to prove that probeDeviceOpen() doesn't data-race when writing to RtApiStream.
Unfortunately RtAudio's threading is difficult to understand, with RtApiStream::state being accessed both with and without a mutex held, etc. The design is quite messy right now (RtApiStream, CallbackInfo, and AlsaHandle fields being read and written by the main and audio threads, with and without mutexes held), to the point that I don't know how to clean it up.
In comparison, cpal (a Rust library)'s multithreading (link) is beautiful. Every type is either owned by the main thread (Stream), owned by the audio thread (the call stack of output_stream_worker), or explicitly shared between the threads (StreamInner). However, to my knowledge cpal doesn't currently support duplex streams with synchronous input and output buffers. (I haven't written an app that records audio, but this would be a large issue if I did.) I would use cpal as a reference if I wanted to redesign RtAudio's multithreading.
Am I actually qualified to rewrite RtAudio multithreading? I'd have to change data structures, and I can't port or test Core Audio or OSS (and perhaps ASIO too).
I also want to add a mode where RtAudio tracks the default device as it changes. This is unrelated to threading, and I could work on that while leaving the data races in place. But I feel uncomfortable trying to add extra features to a broken library, so I don't know what to do...
I'm happy to have contributions to RtAudio. However, I would point out that RtAudio has been working fairly well for two decades. As messy as you may think it is, I don't want to break it. AND importantly, I want to keep things simple. So, you can submit PRs but I'm not interested in complete rewrites unless they make sense and hopefully reduce the number of lines of code.
Also, I'm not interested in having RtApi split. And I would look to the new branch (newdeviceselection) as a basis, as I will merge that as soon as I finish getting it to work for the Windows APIs. I would note that in implementing this new branch, I have generally been able to reduce the lines of code fairly significantly.
If you're not interested in splitting RtApi (I'm not exactly sure what you mean), I might fork RtAudio (largely preserving the API but not the architecture or ABI) and test my changes using BambooTracker. Though I'll wait until newdeviceselection is finished or merged before working more on this.
Using cpal as a reference, I estimate my proposed changes should maintain or decrease the lines of code.
I've finished the newdeviceselection port but it entails more API changes, so I am hoping to get feedback from others before merging it. Generally, I think it is a big improvement in terms of how devices are queried and enumerated.
I haven't done any serious work with device selection in exotracker. I might look into porting https://github.com/BambooTracker/BambooTracker to updated RtAudio, but I'm occupied elsewhere now.
If I have the time (unlikely), I still hope to fork RtAudio and rewrite a good chunk of the code from scratch. I've learned a lot about ALSA buffering (there are substantial changes I'd make to RtApiAlsa's backend code), but have yet to study PulseAudio or WASAPI and friends.
Sorry for not responding earlier. After months of rewrites, I finally have time to try upgrading RtAudio and seeing how it behaves. How does newdeviceselection compare to master right now (master was updated more recently)?
There may have been a few updates to master recently, but newdeviceselection is the future. I just need to find time to merge it into master.
Playing with the newdeviceselection branch. Here's my preliminary feedback:
getDeviceCount() is a footgun:
RtAudio::getDeviceInfo() takes a device ID from RtAudio::getDeviceIds() instead of an index in the range [0, getDeviceCount()). As a result, my old code (which passes ints in that range) compiles but fails at runtime. Is it worth renaming RtAudio::getDeviceInfo, removing RtAudio::getDeviceCount(), or changing the type signature to accept a struct instead of an unsigned int, so previous code will raise a compiler error?
RtAudio methods don't accept internal indexes into deviceList_, but instead device IDs, so making counting devices harder isn't a major problem.
getDeviceIds() and getDeviceNames() allocate a new vector holding a subset of the data in std::vector<RtAudio::DeviceInfo> RtApi::deviceList_, so removing getDeviceCount() increases the allocation overhead of that use case.
Redundant device probing:
Apparently RtApi::getDeviceCount() and RtApi::getDeviceIds() always call RtApiAlsa::probeDevices(), but RtApi::getDeviceInfo() only probes devices if deviceList_ is empty. I would've objected to getDeviceCount() and getDeviceIds() having different behavior.
getDeviceCount() followed by getDeviceNames() will probe devices twice, and may find a different set of results each time. Is this a footgun? Is there a better way for users to explicitly signal when to rescan or not?
You could drop getDeviceNames(), since (as indicated in probe.html) you can call getDeviceInfo() on each device ID... but this introduces O(n) linear scans with O(n^2) runtime... at this point you pray nobody has a freak computer with hundreds of devices.
I don't yet understand how RtApi::deviceList_ and deviceIds_ (apparently its type varies by RtApi subclass; do the two arrays always remain in sync?) and RtApiAlsa::probeDevices()'s local deviceIds and deviceNames interact. I'll have to read the source more carefully and take notes. (EDIT: I found the documentation at probe.html, but it covers API rather than implementation.)
RtApiAlsa::probeDevices() is called once internally by RtAudio::RtAudio() -> RtApi::getDeviceNames(), and again when my application code calls RtAudio::getDeviceIds() -> RtApi::getDeviceIds(). This probes devices twice, which doesn't need to happen.
I noticed devices appear and disappear in RtApiAlsa::probeDevices()'s output when I closed and restarted my program, but I suspect that isn't an RtAudio or ALSA race condition, but merely PipeWire/WirePlumber occupying hw devices when audio is playing and freeing them up during moments of silence (as "power saving").
Observations from reading the ID system more carefully (only RtApiAlsa so far):
RtAudio has two different types of IDs: public unsigned int RtApi IDs, and private backend-specific IDs (eg. strings for RtApiAlsa). I'd prefer giving both variables distinct names, and/or defining an AlsaDeviceId or RtApiAlsa::DeviceId struct or typedef around std::string.
RtApi::currentDeviceId_ is used to allocate public integer IDs. In theory this can overflow after plugging and unplugging 4 billion devices.
std::vector<RtAudio::DeviceInfo> RtAudio::deviceList_ holds all properties except backend-specific IDs. std::vector<std::string> RtApiAlsa::deviceIds_ holds ALSA-specific IDs. An invariant is that both vectors must remain in sync at all times, adding and removing matching items at the same indexes. I would document this in the source code.
Within RtApiAlsa::probeDevices(), std::vector<std::string> deviceIds; holds ALSA-specific IDs, and std::vector<std::string> deviceNames; holds user-facing names exposed through the backend-independent std::vector<DeviceInfo> RtAudio::deviceList_. Both vectors are parallel arrays, pushed to at the same time. (Unlike the RtApi[Alsa] vectors, these local vectors could be merged into a vector of structs.)
You explicitly check for default and pulse (but not pipewire), and otherwise only enumerate hw devices. Why not imitate aplay -L and give the user all the ALSA devices (including unusable ones), or maybe filter out the unusable ones?
When devices are repeatedly probed/enumerated, existing devices keep their public integer ID, new devices receive new IDs from RtApi::currentDeviceId_++ and are pushed to RtAudio::deviceList_ and RtApiAlsa::deviceIds_ simultaneously, and removed devices are removed from RtAudio::deviceList_ and RtApiAlsa::deviceIds_ simultaneously.
Devices are matched across probes by comparing RtAudio::deviceList_[].name and deviceNames[] (the user-facing name, for ALSA derived from sprintf( name, "%s (%s)", snd_ctl_card_info_get_name(ctlinfo), snd_pcm_info_get_id(pcminfo) )). I sure hope no two devices have the same name (in my testing, my different HW devices all had different names). I assume ALSA lets you open the same device (like default) in both input and output mode, and they're the same device in both cases.
I found at least four distinct sites in RtAudio and application usage which would cause $O(n^2)$ behavior in the number of devices present. Should this be removed, and does it cause noticeable slowdowns in practice?
1. The loop under // Fill or update the deviceList_ and also save a corresponding list of Ids. iterates over deviceNames, then over RtApi::deviceList_.
2. The loop under // Remove any devices left in the list that are no longer available. iterates over deviceList_, then deviceNames, and calls RtApi::deviceList_.erase() and RtApiAlsa::deviceIds_.erase() up to deviceList_.size() times.
3. Calling std::vector<unsigned int> RtAudio::getDeviceIds() followed by RtAudio::getDeviceInfo(unsigned int deviceId) on each ID returned: each of the N calls to RtApi::getDeviceInfo(unsigned int deviceId) performs a linear scan over RtAudio::deviceList_.
4. (from my previous comment) RtApiAlsa::probeDevices() is called once internally by RtAudio::RtAudio() -> RtApi::getDeviceNames(), and again when my application code calls RtAudio::getDeviceIds() -> RtApi::getDeviceIds(). This probes devices twice, which doesn't need to happen. Is there a better way for users to explicitly signal when to rescan or not?
Thanks for these observations and suggestions. I'm not sure when I'll have time to carefully consider them but I'm very happy to have the feedback. Currently, my approach was not to expect the user to specify a rescan but rather to do it automatically whenever a call is made that might need to involve a re-probing of devices (given hot-pluggable devices). If the APIs made it easier to know when new devices were added, things could be simplified but that is not currently the case.
The suggestion I'm most strongly for is to remove getDeviceCount(). That API is useless if you call it alongside getDeviceIds(), since the two function calls correspond to different device scans and may hold different data. And worse yet, it allows old previously-correct code to keep compiling and start doing the wrong thing (failing at runtime).
My other suggestions are less important. I have not observed two different devices with the same name so far (since RtApiAlsa's loop over cards and devices matches the final discovered list). And from what I hear, $O(n^2)$ in the number of devices isn't a major issue, since most users will only have up to a few dozen devices present at a time.
I looked into updating an existing RtAudio app to no-exceptions master, and ran into some trouble (https://github.com/BambooTracker/BambooTracker/pull/426#issuecomment-991486279). This is preliminary and I didn't try actually fixing the code, mostly skimming the RtAudio code and diffs to plan out what I'd have to change. Sorry if I'm misunderstanding something that's already documented.
Currently, when audio->openStream() fails, BambooTracker shows a dialog with contents taken from RtAudioError::getMessage(). After the merge, RtAudioError was removed, audio->openStream() only returns an enum code (no error message string), and RtApi::error() only reports the message by calling the std::function<...> RtApi::errorCallback_ callback. To my knowledge, you can't access the protected RtApi::errorText_ field without patching RtAudio to expose it through a getter or similar.
If you upgrade to no-exceptions and want to keep printing error messages, (I think) you would have to set a callback. The issue is that RtApi::error() (which calls your callback) gets called on both the UI and audio threads, and your callback needs to act differently based on which thread it's running on.
One idea is to edit AudioStreamRtAudio::initialize(), replace the start of the try block with audio->setErrorCallback(...);, replace the end of the block with audio->setErrorCallback({}); to reset it, and pray that the audio thread doesn't start and call RtApi::error() before we call setErrorCallback({}). I'm not sure that's the case.
If I understand correctly, RtApi::openStream() calls (eg.) RtApiAlsa::probeDeviceOpen(), which calls pthread_create (which, as far as I can tell, begins execution immediately rather than suspending the thread). In the case of ALSA, alsaCallbackHandler calls RtApiAlsa::callbackEvent(), which (in the case of STREAM_STOPPED) waits for AlsaHandle::runnable to become true (it's initially false). I think RtAudio defaults to STREAM_STOPPED. As a result, RtApiAlsa can't continue executing and call RtApi::error() before the stream is started.
I'm not sure if this will remain the case, and I didn't study every other backend to ensure they share this property (that if you call RtApi::setErrorCallback({}) when RtAudio::openStream() returns, then RtApi::error() cannot possibly be called on the newly spawned audio thread). I think RtApiPulse is the same way.
If (in theory) the audio thread starts up but immediately fails and calls RtApi::error(), there's an endless zoo of race conditions between the GUI thread calling RtApi::setErrorCallback({}) and resetting errorCallback_, the audio thread reading errorCallback_ to check whether to call it (this is a data race), and the audio thread calling errorCallback_, which tries to show a message box on the wrong thread (cue firework explosions).
One possible way out is to set a callback which checks the current thread ID and only opens a message box if running on the GUI thread.