mmontag opened 3 years ago
Hi, I just found this and it's really cool. What architecture do you prefer for this?
What I'd recommend is to run all the synths in a Web Worker, and to communicate through a wait-free ring buffer with a very simple `AudioWorkletProcessor` that can do interleaving/deinterleaving, sample-format conversion, and the like.
This way, it works like your regular media file player (one thread decodes the audio, one thread plays it back), almost immune to glitches (except if the device is completely overloaded, of course).
If this is a design that would work for you, I've written some material to help:
The only requirements for this to work are to serve the website with two headers:
- `Cross-Origin-Opener-Policy: same-origin`
- `Cross-Origin-Embedder-Policy: require-corp`
so that it's put in an isolated process in the web browser. This MDN link explains why this is unfortunately necessary.
Also, it would certainly be possible to keep the existing code, and to use an `AudioWorkletProcessor` when/if possible (recent browsers, correct headers set, etc.). But as you say, `AudioWorkletProcessor` is generally available now, and `SharedArrayBuffer` is as well.
Also, I'd like to provide some context about the meaning of "deprecation" in the context of the Web Platform: https://lists.w3.org/Archives/Public/public-audio/2023JanMar/0003.html (tl;dr it's not going to be removed, no rush).
Hi @padenot, thanks for sharing all of this! I appreciate your insights.
More than a year ago, I attempted to use Audio Worklets and I think my approach was wrong: https://github.com/mmontag/chip-player-js/commit/043894
As I recall, it felt like I was stacking up weird hacks, and wrote at the time:
> // After all this work, there were still audio glitches[...]
> // The only way to avoid it is to fill a ring buffer on a *worker* thread that is also readable from
> // AudioWorklet thread. And then getting into the world of shared array buffers which are still poorly
> // supported, compounding the browser issues.
Ah okay; ring buffer and shared array buffers are needed, but I still have many questions.
To pick one example:
The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. For example, some MDX music files use PDX audio sample files in the same folder. The MDX C library uses file I/O to read the PDX file. (I preload the PDX into the virtual file system with a network fetch.) How do we do Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API? In my worklet branch, I stubbed all the Emscripten filesystem code to use MEMFS instead of IDBFS. But MEMFS is no good because it does not persist across sessions.
If these questions reveal a misunderstanding on my part, please do share.
> The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. For example, some MDX music files use PDX audio sample files in the same folder. The MDX C library uses file I/O to read the PDX file. (I preload the PDX into the virtual file system with a network fetch.) How do we do Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API? In my worklet branch, I stubbed all the Emscripten filesystem code to use MEMFS instead of IDBFS. But MEMFS is no good because it does not persist across sessions.
Web Workers can use IndexedDB and make network requests normally, so this shouldn't be a problem. I think it's not a misunderstanding on your part; it's probably a lack of good documentation of the various moving parts here.
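As a rough sketch of what that worker-side loading could look like (all names here are hypothetical; the dependencies are injected so the routing logic runs outside a browser, but in the real worker they would be `fetch()`, Emscripten's `FS.writeFile()`, and the synth engine):

```javascript
// Hypothetical worker-side message handler: on a "play" request, preload
// the tune and any companion files (e.g. a PDX next to an MDX) into the
// virtual filesystem before the C code starts doing file I/O, then begin
// synthesis, which pushes samples into the ring buffer.
function createWorkerHandler({ fetchFile, writeToFS, startSynth }) {
  return async function onMessage(msg) {
    if (msg.type !== "play") return false;
    for (const path of [msg.path, ...(msg.companions || [])]) {
      const bytes = await fetchFile(path);   // network (or IndexedDB cache)
      writeToFS(path, bytes);                // e.g. Emscripten FS.writeFile
    }
    startSynth(msg.path);                    // begins filling the ring buffer
    return true;
  };
}
```

In the real worker this would be wired up as `self.onmessage = (e) => handler(e.data)`, and the main thread would `postMessage({ type: "play", path, companions })` after resuming the `AudioContext`.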
In this model, the sound generation happens in the worker; the UI is only concerned about rendering the UI, the visualization, user interaction of course, and orchestrating all of this (e.g., start loading a tune and start playback when an entry in the browser is clicked). The `AudioWorkletProcessor` is just going to play the audio samples generated by the worker.
If we describe a standard scenario of opening the web app and playing a tune, it would go like this (I tried to explain as many details as possible, maybe there are trivial things in there):
1. On load, the main thread creates a `RingBuffer`, able to contain audio samples (we can configure its duration to trade memory usage against robustness when the machine is overloaded). This `RingBuffer` has two ends: the producing end, where writing happens, and the consuming end, where reading happens.
2. The main thread creates an `AudioContext` and an `AudioWorkletProcessor`, and hands off the consuming end of the ring buffer -- it then suspends the `AudioContext` to save resources. When an `AudioContext` is suspended, the `process` method of an `AudioWorkletProcessor` is not called, and everything is more or less in an idle state.
3. The main thread creates a Web Worker, which has access to `IndexedDB` and `fetch`. The main thread hands off the producing end of the ring buffer.
4. When the user clicks a tune, the main thread does a `postMessage` to the Web Worker with the information to play the tune, and resumes the `AudioContext`.
5. The `AudioWorkletProcessor`'s `process` method starts being called. In this method, it checks if there are any audio samples in the consuming end of the ring buffer. If there are none, it returns `true`. This plays silence out, but by returning `true`, the method will be called again.
6. Meanwhile, the Web Worker loads the tune, starts generating audio, and writes the samples into the producing end of the ring buffer.
7. The `AudioWorkletProcessor` notices that there are samples to play out in the ring buffer, and plays them out by popping them from the ring buffer into its output buffer argument.
8. Even if the main thread is overloaded, the `process` calls on the `AudioWorklet` are still going to happen on time; they're not affected by the main thread load. Besides, the thread on which the `AudioWorkletProcessor` runs has the highest scheduling priority at the OS level, so it pre-empts everything to ensure smooth playback.

An alternative approach, which looks like what you've tried, is to do the sound generation within the `AudioWorkletProcessor` itself. This would be the preferred approach for a clean-sheet design, e.g. writing a new synthesizer, with complete control over how I/O is done, where we can preload everything ahead of time. This is because the `AudioWorkletProcessor`, by design, can only do real-time-safe operations, very much like in native code.
Here, because we're using a piece of code that already does everything (I/O, sound synthesis, etc., intermixed), we need to resort to running the code normally in a worker, and then playing the audio out -- but we can move everything off the main thread to make the app very robust against load. The same architecture is used when running e.g. emulators on the web, or other pieces of code where the separation between real-time digital signal processing code and everything else is not clear, maybe because back in the day it was all single-threaded in one big run loop.
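The ring-buffer handoff at the heart of this design can be sketched as follows. This is a minimal single-producer/single-consumer version over a `SharedArrayBuffer`; the class name and API are illustrative, not the actual library mentioned above:

```javascript
// A SharedArrayBuffer-backed SPSC ring buffer: the worker pushes samples,
// the AudioWorkletProcessor pops them in its process() callback. Indices
// are read/written with Atomics so the two threads never need a lock.
const INDEX_BYTES = 2 * Int32Array.BYTES_PER_ELEMENT;

function createRingBuffer(capacity) {
  // One extra slot distinguishes "full" from "empty".
  return new SharedArrayBuffer(
    INDEX_BYTES + (capacity + 1) * Float32Array.BYTES_PER_ELEMENT
  );
}

class RingBuffer {
  constructor(sab) {
    this.indices = new Int32Array(sab, 0, 2); // [readIndex, writeIndex]
    this.samples = new Float32Array(sab, INDEX_BYTES);
  }
  // Producing end (worker): returns how many samples were actually written.
  push(input) {
    const n = this.samples.length;
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const free = (read - write - 1 + n) % n;
    const toWrite = Math.min(free, input.length);
    for (let i = 0; i < toWrite; i++) {
      this.samples[(write + i) % n] = input[i];
    }
    Atomics.store(this.indices, 1, (write + toWrite) % n);
    return toWrite;
  }
  // Consuming end (audio thread): fills `output`, zeroing any shortfall so
  // an underrun plays silence instead of stale samples.
  pop(output) {
    const n = this.samples.length;
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const avail = (write - read + n) % n;
    const toRead = Math.min(avail, output.length);
    for (let i = 0; i < toRead; i++) {
      output[i] = this.samples[(read + i) % n];
    }
    output.fill(0, toRead);
    Atomics.store(this.indices, 0, (read + toRead) % n);
    return toRead;
  }
}
```

In the worklet, `process(inputs, outputs)` would simply call `ringBuffer.pop(outputs[0][0])` and return `true` unconditionally, matching the silence-on-empty behavior described in the steps above. A production version would also want to copy samples in chunks rather than one at a time.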
In short, three pieces:

- the main thread runs the UI and orchestrates everything
- a Web Worker does the sound generation (and any I/O)
- an `AudioWorkletProcessor` plays the audio out

@padenot I just wanted to say thanks again for the writeup, and I haven't forgotten about this.
It would be really nice to use Audio Worklets.
ScriptProcessorNode renders audio on the UI thread and glitches during scrolling, window resizing, etc. This is really not acceptable for a music player, and the ScriptProcessorNode deprecation warning has been showing up in the Chrome console for a long time now.
Might solve some of the glitch reports too.
It's widely supported: https://caniuse.com/mdn-api_audioworklet

See also: https://developers.google.com/web/updates/2018/06/audio-worklet-design-pattern