hexops / mach

zig game engine & graphics toolkit
https://machengine.org

mach: Proposal: Implement an Audio Graph #546

Open desttinghim opened 1 year ago

desttinghim commented 1 year ago

Audio DSP is very natural to program using a signal-flow graph, which in the context of audio we can just call an audio graph. Many audio APIs use it as their basis - for example, miniaudio, WebAudio, Pure Data, and FMOD. An audio graph is a directed graph composed of unit generators (nodes) and signals (edges). It should be reconfigurable at run time to support different audio "scenes".

Implementation

This is a description of a basic implementation of an audio graph. It doesn't handle concurrency or ECS integration, though the basic concept should be possible to implement within the ECS.

There are three types of nodes: Inputs, Outputs, and Units. An Input generates output without taking any input from the graph. An Output consumes input without producing any graph output. A Unit must have one or more output channels and zero or more input channels.

The graph is stored as a list of these nodes, plus a list of the connections between them. Each connection is made of four numbers: the output node and channel, and the input node and channel.
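A minimal sketch of that storage in Zig - the names (`NodeKind`, `Connection`, `Graph`) are illustrative, not an existing API:

```zig
const std = @import("std");

const NodeKind = enum { input, output, unit };

const Node = struct {
    kind: NodeKind,
    in_channels: u8, // zero for Inputs
    out_channels: u8, // zero for Outputs
};

// One connection = four numbers: output node & channel -> input node & channel.
const Connection = struct {
    src_node: u16,
    src_channel: u8,
    dst_node: u16,
    dst_channel: u8,
};

const Graph = struct {
    nodes: std.ArrayList(Node),
    connections: std.ArrayList(Connection),
};
```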

To generate audio samples, a depth-first search is performed on the graph. Each node visited is pushed onto a scheduling list. Once the search completes, the list is reversed and the unit generators are run from the beginning of the list to the end. Input nodes are the first to run on each branch of the graph they sit on; Outputs run last. When a unit generator runs, it reads from each input, calculates a new value, and adds that value to each output sample. Note that an input is not necessarily an audio signal - it may also be a control parameter.
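A sketch of that scheduling pass, reusing the hypothetical `Connection` type above. Here a post-order DFS from each Output emits nodes directly in dependency order (Inputs first, Outputs last), which also stays correct when one node feeds several others, so no explicit reversal step is needed:

```zig
// Post-order DFS from each Output node; `order` ends up Inputs-first.
fn buildSchedule(
    allocator: std.mem.Allocator,
    node_count: usize,
    connections: []const Connection,
    output_nodes: []const u16,
) !std.ArrayList(u16) {
    var order = std.ArrayList(u16).init(allocator);
    const visited = try allocator.alloc(bool, node_count);
    defer allocator.free(visited);
    for (visited) |*v| v.* = false;
    for (output_nodes) |out| try visit(out, connections, visited, &order);
    return order;
}

fn visit(
    node: u16,
    connections: []const Connection,
    visited: []bool,
    order: *std.ArrayList(u16),
) std.mem.Allocator.Error!void {
    if (visited[node]) return;
    visited[node] = true;
    // Schedule everything feeding this node before the node itself.
    for (connections) |c| {
        if (c.dst_node == node) try visit(c.src_node, connections, visited, order);
    }
    try order.append(node);
}
```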

Nodes

This is not an exhaustive list of nodes, but should help to clarify how it will work.

Input
Output
Unit
slimsag commented 1 year ago

This looks like a great starting point, indeed! I think implementing this in github.com/machlibs/synth would be a great way to get started. Let me get a better feel for how this looks/works in practice, and then I can potentially think about how this intersects with ECS (if at all) later on.

desttinghim commented 1 year ago

I've prototyped the DSP graph in machlibs/synth and figured out some issues with the initial proposal. My plan now is to more closely replicate APIs like WebAudio and miniaudio - the design decisions behind them make more sense now that I've attempted to implement my own version.

What worked

What didn't work

Changes

Here's what I think should be changed.

AudioNodes and AudioParams

Instead of the Input, Output, and Unit types I discussed before, there will be AudioSources, AudioEffects, and AudioSinks. Additionally, there will be AudioParams that are used to control parameters. To define a new node, developers will embed an AudioSource, AudioEffect, or AudioSink inside a struct so that @fieldParentPtr can be used to retrieve the containing struct's data. For real-time parameters, developers add AudioParams to their node.
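A minimal sketch of that embedding pattern - `AudioEffect` and `GainNode` are hypothetical, and the `@fieldParentPtr` spelling varies by Zig version:

```zig
// Hypothetical interface type that node implementations embed.
const AudioEffect = struct {
    processFn: *const fn (*AudioEffect, []const f32, []f32) void,
};

const GainNode = struct {
    effect: AudioEffect = .{ .processFn = &process },
    gain: f32 = 1.0,

    fn process(effect: *AudioEffect, input: []const f32, output: []f32) void {
        // Recover the containing GainNode from its embedded field.
        // (Zig before 0.12 spells this @fieldParentPtr(GainNode, "effect", effect).)
        const self: *GainNode = @fieldParentPtr("effect", effect);
        for (input, output) |s, *d| d.* = s * self.gain;
    }
};
```

The graph can then store `*AudioEffect` pointers and call `processFn` without knowing the concrete node type.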

Channels

Channels only work with audio signals: 1 channel is monophonic, 2 channels is stereo, and so on. When AudioNodes are run, they will receive the same number of input channels and output channels. When connecting nodes with mismatched numbers of input/output channels, the audio will need to be mixed or copied. I haven't decided whether explicit splitter/mixer nodes should be used, or if this should be handled automatically. The former would be simpler to implement, so I'll try it out first.
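In the explicit-node variant, the adapters reduce to something like this sketch (function names are made up):

```zig
// Hypothetical explicit adapters for mismatched channel counts.
fn splitMonoToStereo(mono: []const f32, left: []f32, right: []f32) void {
    for (mono, left, right) |s, *l, *r| {
        l.* = s; // duplicate the mono signal into both channels
        r.* = s;
    }
}

fn mixStereoToMono(left: []const f32, right: []const f32, mono: []f32) void {
    for (left, right, mono) |l, r, *m| m.* = (l + r) * 0.5; // simple average downmix
}
```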

AudioGraph

The AudioGraph will still store connections, schedule the nodes, and store the audio buses. Developers will need to figure out their own strategy for allocating AudioNodes and AudioParams.
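Put together, the graph could own no node memory at all - a sketch reusing the hypothetical types above, not the actual machlibs/synth layout:

```zig
// The graph only borrows node pointers; the developer owns the node memory.
const AudioGraph = struct {
    nodes: std.ArrayList(*AudioEffect), // allocated and freed by the developer
    connections: std.ArrayList(Connection),
    schedule: std.ArrayList(u16), // rebuilt whenever connections change
    buses: std.ArrayList([]f32), // scratch buffers carrying signals between nodes
};
```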

meshula commented 1 year ago

I've been working on a WebAudio-compatible audio library for over a decade now (https://github.com/LabSound/LabSound) ~ it's widely used as a webaudio backend for node.js, a bunch of AR headsets, and other applications.

One of the things I've managed to do that I'm proud of is to boil the web interface down to a simple C interface that, surprisingly, encompasses the massive footprint of the "official" WebAudio interface ~ you can see that here: https://github.com/LabSound/LabSoundGraphToy/blob/main/src/LabSoundInterface.h ~ basically anything you can do with WebAudio can be achieved through the interface, but without massive OO bloat (minus 3D audio, which I haven't exposed yet, although LabSound has full HRTF-based spatialization).

LabSoundGraphToy used skypjack's entt for a long time, although most recently I've simplified it down to a simple pin map, which you can see at the top of https://github.com/LabSound/LabSoundGraphToy/blob/main/src/LabSoundInterface.cpp, because in the end I didn't get as much mileage out of ECS as I expected in graph management; the amount of code needed to encode a graph in an ECS was kind of huge compared to writing something specifically for graphs.

The full list of nodes I've got is here https://github.com/LabSound/LabSound/blob/main/include/LabSound/LabSound.h, if you look at the includes.

LabSound has backends for most platforms, and build variants for miniaudio and RtAudio.

Currently I'm working on a rewrite of the core signal processing graph (https://github.com/LabSound/LabSound/tree/ls2) to greatly improve performance and maintainability.

I'd be pretty motivated to see the core rewrite go even further to pure C, with strong zig bindings, if that sort of collab is of any interest.

desttinghim commented 1 year ago

Thanks @meshula, LabSound seems like a really cool project. I checked out LabSoundGraphToy and it's very much in line with the type of interface I would like to eventually provide. It's still a work in progress? I had to do some work to get it to compile on my Linux/NixOS machine, and I submitted a few PRs with the changes I made. There were some bugs, but overall I like the idea! Let me know if you'd like me to file bug reports.

The interface you have provided looks very succinct! It's awesome you managed to whittle it down that far. From what I've gathered so far, the pins are the parameters, and noodles is your term for the connections between nodes? The noodle term looks like it's for use in the internal API. Spatialization doesn't seem like it would be too difficult to add to the API.

It's interesting to hear that integration with an ECS took a lot of effort. It's not too surprising after looking at the pin map definition - it's hard to imagine a simpler structure.

I don't have any objections to collaborating on a pure C implementation of LabSound. :slightly_smiling_face: My personal opinion is that Zig would be a better language to implement it in, but obviously a C implementation would be easier to integrate into existing projects. The biggest problem with using C instead of Zig is compiling for WebAssembly. Zig can compile C to wasm, but Zig doesn't provide a libc for wasm, so C can be used only as long as it doesn't rely on libc - a condition most C libraries don't meet, which makes them unusable from WebAssembly with Zig.

meshula commented 1 year ago

Yes, please file issues for bugs that you find, I appreciate your giving it a whirl :)

Yes on pins and noodles. The pins are further typed according to whether they accept a constant value that doesn't change over time (like sin vs square for an oscillator), a parameter that changes over time (like osc frequency), or a bus (a signal output from another node). Parameters can take value curves, or busses.
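In enum form, that typing might look like the following - hypothetical names, not LabSound's actual types:

```zig
// Mirror of the pin typing described above.
const PinKind = enum {
    setting, // constant over time, e.g. sine vs. square for an oscillator
    param, // time-varying, e.g. oscillator frequency; accepts value curves or busses
    bus, // a signal output from another node
};
```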

The primary difficulty with the ECS was that it added an extra layer of indirection to everything, and a lot of complication, especially in deleting things - whether pins, nodes, or arcs - because all of the relationships behind those things were abstracted behind the entity interface. That meant either reverse lookups were needed for everything, kind of defeating the purpose, or you had to exhaustively check every system for remnant components. And if you add a new component, you then have to scour all the existing code to make sure the new component system is also being checked during deletions. It's the relationships between nodes, rather than the nodes themselves, that in my implementation made the ECS way too much bother. Too much bother, in my case, means mental overhead in keeping the model in my brain without being able to see it directly in data structures.

I didn't realize musl doesn't have a wasm flavor. That's most unfortunate! There are a small number of third-party dependencies in LabSound where lack of libc would be problematic; I'm using kissfft, libsamplerate, and libnyquist (to provide loading), to name the most challenging ones that definitely rely on cstd.

A pure C implementation would have merits in sharing, but a pure zig version would have knock-on zig-specific benefits.

For example, I can imagine implementing a DSL to specify an audio graph, where comptime is used to generate inline code for both the run-time audio graph and the user interface for inspectors. I can't point to an example off the top of my head where C would prevent that, but my intuition is that zig native would have deeper comptime flexibility. I might be wrong about that.
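Purely as a sketch of that intuition - a comptime-unrolled chain of stages, with made-up stage functions:

```zig
const std = @import("std");

// Each stage is a plain function; the chain is specified at comptime.
fn Chain(comptime stages: anytype) type {
    return struct {
        fn process(sample: f32) f32 {
            var s = sample;
            // Unrolled at compile time: every stage call can be inlined,
            // leaving no runtime dispatch between nodes.
            inline for (stages) |stage| s = stage(s);
            return s;
        }
    };
}

fn gainHalf(s: f32) f32 {
    return s * 0.5;
}

fn hardClip(s: f32) f32 {
    return std.math.clamp(s, -1.0, 1.0);
}

const MyChain = Chain(.{ gainHalf, hardClip });
// MyChain.process(x) compiles down to the two inlined stage calls.
```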

Another twist to think about is the backend. LabSound performs its own computation, and then renders buffers which are submitted to the backends as interleaved buffers. A web-facing specialization I don't have would be to instantiate the processing graph into wasm-based audio worklets. That would remove a great deal of indirection and copying from the system, and would likely offer the best runtime performance in a browser.

leroycep commented 1 year ago

> Another twist to think about is the backend. LabSound performs its own computation, and then renders buffers which are submitted to the backends as interleaved buffers. A web-facing specialization I don't have would be to instantiate the processing graph into wasm-based audio worklets. That would remove a great deal of indirection and copying from the system, and would likely offer the best runtime performance in a browser.

Unfortunately, AudioWorklets wouldn't save you much copying. AudioWorklets are handed an input and an output buffer to read from/write to, so each AudioWorklet will have to do at least one copy, depending on how many inputs/outputs there are.

Using the WebAudio nodes directly wouldn't give you much benefit performance-wise, except perhaps in cases where you can use the nodes provided by WebAudio itself. The only benefit of having an AudioWorklet per node would be integration between the JavaScript host and the WebAssembly module.

desttinghim commented 1 year ago

Oh, BTW, the zig code being discussed lives in the https://github.com/machlibs/synth repository. I realized I hadn't linked it anywhere yet.

> Yes on pins and noodles. The pins are further typed according to whether they accept a constant value that doesn't change over time (like sin vs square for an oscillator), a parameter that changes over time (like osc frequency), or a bus (a signal output from another node). Parameters can take value curves, or busses.

This makes sense, and is definitely a more sensible design than my current one. So pins replace AudioParameters in LabSound?

> The primary difficulty with the ECS was that it added an extra layer of indirection to everything, and a lot of complication, especially in deleting things - whether pins, nodes, or arcs - because all of the relationships behind those things were abstracted behind the entity interface. That meant either reverse lookups were needed for everything, kind of defeating the purpose, or you had to exhaustively check every system for remnant components. And if you add a new component, you then have to scour all the existing code to make sure the new component system is also being checked during deletions. It's the relationships between nodes, rather than the nodes themselves, that in my implementation made the ECS way too much bother. Too much bother, in my case, means mental overhead in keeping the model in my brain without being able to see it directly in data structures.

From what I understand, Flecs has a concept of relationships that may make this sort of code less tedious, but I haven't worked with Flecs enough to say. Flecs has inspired mach's ECS.

> I didn't realize musl doesn't have a wasm flavor. That's most unfortunate! There are a small number of third-party dependencies in LabSound where lack of libc would be problematic; I'm using kissfft, libsamplerate, and libnyquist (to provide loading), to name the most challenging ones that definitely rely on cstd.

Yeah, the lack of a libc for wasm makes reusing C code harder when working in Zig. There is a proposal to allow linking with the Emscripten SDK, but it hasn't gone beyond discussion at this point.

> A pure C implementation would have merits in sharing, but a pure zig version would have knock-on zig-specific benefits.
>
> For example, I can imagine implementing a DSL to specify an audio graph, where comptime is used to generate inline code for both the run-time audio graph and the user interface for inspectors. I can't point to an example off the top of my head where C would prevent that, but my intuition is that zig native would have deeper comptime flexibility. I might be wrong about that.

This is definitely a possibility! It's not the only benefit of using Zig though. One concrete and immediate benefit of Zig is the convention of passing allocators instead of using a global one. This makes it extremely easy to reuse code, even in memory-constrained environments, because the choice of allocator is made by the user of the API instead of the implementer.
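For instance, a node constructor written against `std.mem.Allocator` runs unchanged against a fixed, stack-backed buffer - `createDelayLine` here is a made-up example:

```zig
const std = @import("std");

// No global allocator: the caller decides where the memory comes from.
fn createDelayLine(allocator: std.mem.Allocator, frames: usize) ![]f32 {
    const buf = try allocator.alloc(f32, frames);
    @memset(buf, 0);
    return buf;
}

test "delay line from a fixed buffer, no heap involved" {
    var storage: [8192]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&storage);
    const delay = try createDelayLine(fba.allocator(), 512);
    try std.testing.expectEqual(@as(usize, 512), delay.len);
}
```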

meshula commented 1 year ago

re ~ pins as parameters

Pins are like entities in an ECS, but they carry type information. Under the hood they route to a conventional param or settings object, but the detailed interface of the param/settings is not exposed, because that interface is oriented at the engine implementation, not end users.

flecs has added relationships for sure! I had started migrating from entt to flecs actually, but I decided to move in the direction of fewer dependencies, just because I wasn't using enough of the ECS metaphor to justify the inclusion. Your mileage may vary ;)

In ls2, I'm borrowing the zig concept of allocators. I do a prepass to compute the buffers needed for processing, then arena-allocate them. Zig's benefitting my C coding.
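That pattern, sketched back in Zig terms - `allocateBuses` is illustrative, not ls2's actual code:

```zig
const std = @import("std");

// Two-pass idea: a prepass decides how many bus buffers the schedule needs,
// then all of them come out of one arena and are freed together.
fn allocateBuses(arena: *std.heap.ArenaAllocator, bus_count: usize, frames: usize) ![][]f32 {
    const a = arena.allocator();
    const buses = try a.alloc([]f32, bus_count);
    for (buses) |*bus| bus.* = try a.alloc(f32, frames);
    return buses; // released all at once by arena.deinit()
}
```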