mmontag opened 3 years ago
Hi, I just found this and it's really cool. What architecture do you prefer for this?
What I'd recommend is to run all the synths in a Web Worker, and to communicate through a wait-free ring buffer with a very simple `AudioWorkletProcessor` that can do interleaving/deinterleaving, sample-format conversion, and the like.
This way, it works like your regular media file player (one thread decodes the audio, one thread plays it back), almost immune to glitches (except if the device is completely overloaded, of course).
If this is a design that would work for you, I've written some material to help:
The only requirements for this to work are to serve the website with two headers:
- `Cross-Origin-Opener-Policy: same-origin`
- `Cross-Origin-Embedder-Policy: require-corp`
so that it's put in an isolated process in the web browser. This MDN link explains why this is unfortunately necessary.
Also, it would certainly be possible to keep the existing code, and to use an `AudioWorkletProcessor` when/if possible (recent browsers, correct headers set, etc.). But as you say, `AudioWorkletProcessor` is generally available now, and `SharedArrayBuffer` is as well.
Also, I'd like to provide some context about the meaning of "deprecation" in the context of the Web Platform: https://lists.w3.org/Archives/Public/public-audio/2023JanMar/0003.html (tl;dr it's not going to be removed, no rush).
Hi @padenot, thanks for sharing all of this! I appreciate your insights.
More than a year ago, I attempted to use Audio Worklets and I think my approach was wrong: https://github.com/mmontag/chip-player-js/commit/043894
As I recall, it felt like I was stacking up weird hacks, and wrote at the time:
> // After all this work, there were still audio glitches[...]
> // The only way to avoid it is to fill a ring buffer on a *worker* thread that is also readable from
> // AudioWorklet thread. And then getting into the world of shared array buffers which are still poorly
> // supported, compounding the browser issues.
Ah okay; ring buffer and shared array buffers are needed, but I still have many questions.
To pick one example:
The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. For example, some MDX music files use PDX audio sample files in the same folder. The MDX C library uses file I/O to read the PDX file. (I preload the PDX into the virtual file system with a network fetch.) How do we do Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API? In my worklet branch, I stubbed all the Emscripten filesystem code to use MEMFS instead of IDBFS. But MEMFS is no good because it does not persist across sessions.
If these questions reveal a misunderstanding on my part, please do share.
> The Chip Player wasm binary relies on the Emscripten virtual file system, backed by IndexedDB. For example, some MDX music files use PDX audio sample files in the same folder. The MDX C library uses file I/O to read the PDX file. (I preload the PDX into the virtual file system with a network fetch.) How do we do Audio Worklets (or Web Workers) in this case, where the code writing audio samples also needs the IndexedDB API? In my worklet branch, I stubbed all the Emscripten filesystem code to use MEMFS instead of IDBFS. But MEMFS is no good because it does not persist across sessions.
Web Workers can use IndexedDB and make network requests normally, so this shouldn't be a problem. I think it's not a misunderstanding on your part; it's probably a lack of good documentation of the various moving parts here.
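As a rough sketch of what that worker-side loading could look like (all names here are hypothetical; the dependencies are injected so the routing logic runs outside a browser, but in the real worker they would be `fetch()`, Emscripten's `FS.writeFile()`, and the synth engine):

```javascript
// Hypothetical worker-side message handler: on a "play" request, preload
// the tune and any companion files (e.g. a PDX next to an MDX) into the
// virtual filesystem before the C code starts doing file I/O, then begin
// synthesis, which pushes samples into the ring buffer.
function createWorkerHandler({ fetchFile, writeToFS, startSynth }) {
  return async function onMessage(msg) {
    if (msg.type !== "play") return false;
    for (const path of [msg.path, ...(msg.companions || [])]) {
      const bytes = await fetchFile(path);   // network (or IndexedDB cache)
      writeToFS(path, bytes);                // e.g. Emscripten FS.writeFile
    }
    startSynth(msg.path);                    // begins filling the ring buffer
    return true;
  };
}
```

In the real worker this would be wired up as `self.onmessage = (e) => handler(e.data)`, and the main thread would `postMessage({ type: "play", path, companions })` after resuming the `AudioContext`.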
In this model, the sound generation happens in the worker; the UI is only concerned about rendering the UI, the visualization, user interaction of course, and orchestrating all of this (e.g., start loading a tune and start playback when an entry in the browser is clicked). The `AudioWorkletProcessor` is just going to play the audio samples generated by the worker.
If we describe a standard scenario of opening the web app and playing a tune, it would go like this (I tried to explain as many details as possible, maybe there are trivial things in there):
1. On load, the main thread creates a `RingBuffer`, able to contain audio samples (we can configure its duration to trade memory usage against robustness when the machine is overloaded). This `RingBuffer` has two ends: the producing end, where writing happens, and the consuming end, where reading happens.
2. The main thread creates an `AudioContext` and an `AudioWorkletProcessor`, and hands off the consuming end of the ring buffer -- it then suspends the `AudioContext` to save resources. When an `AudioContext` is suspended, the `process` method of an `AudioWorkletProcessor` is not called, and everything is more or less in an idle state.
3. The main thread creates a Web Worker, which has access to `IndexedDB` and `fetch`. The main thread hands off the producing end of the ring buffer.
4. When the user clicks a tune, the main thread does a `postMessage` to the Web Worker with the information to play the tune, and resumes the `AudioContext`.
5. The `AudioWorkletProcessor`'s `process` method starts being called. In this method, it checks if there are any audio samples in the consuming end of the ring buffer. If there are none, it returns `true`. This plays silence out, but by returning `true`, the method will be called again.
6. Meanwhile, the Web Worker loads the tune, starts generating audio, and writes the samples into the producing end of the ring buffer.
7. The `AudioWorkletProcessor` notices that there are samples to play out in the ring buffer, and plays them out by popping them from the ring buffer into its output buffer argument.
8. Even if the main thread is overloaded, the `process` calls on the `AudioWorklet` are still going to happen on time; they're not affected by the main thread load. Besides, the thread on which the `AudioWorkletProcessor` runs has the highest scheduling priority at the OS level, so it pre-empts everything to ensure smooth playback.

An alternative approach, which looks like what you've tried, is to do the sound generation within the `AudioWorkletProcessor` itself. This would be the preferred approach for a clean-sheet design, e.g. writing a new synthesizer, with complete control over how I/O is done, where we can preload everything ahead of time. This is because the `AudioWorkletProcessor`, by design, can only do real-time-safe operations, very much like in native code.
Here, because we're using a piece of code that already does everything (I/O, sound synthesis, etc., intermixed), we need to resort to running the code normally in a worker, and then playing the audio out -- but we can move everything off the main thread to make the app very robust against load. The same architecture is used when running e.g. emulators on the web, or other pieces of code where the separation between real-time digital signal processing code and everything else is not clear, maybe because back in the day it was all single-threaded in one big run loop.
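The ring-buffer handoff at the heart of this design can be sketched as follows. This is a minimal single-producer/single-consumer version over a `SharedArrayBuffer`; the class name and API are illustrative, not the actual library mentioned above:

```javascript
// A SharedArrayBuffer-backed SPSC ring buffer: the worker pushes samples,
// the AudioWorkletProcessor pops them in its process() callback. Indices
// are read/written with Atomics so the two threads never need a lock.
const INDEX_BYTES = 2 * Int32Array.BYTES_PER_ELEMENT;

function createRingBuffer(capacity) {
  // One extra slot distinguishes "full" from "empty".
  return new SharedArrayBuffer(
    INDEX_BYTES + (capacity + 1) * Float32Array.BYTES_PER_ELEMENT
  );
}

class RingBuffer {
  constructor(sab) {
    this.indices = new Int32Array(sab, 0, 2); // [readIndex, writeIndex]
    this.samples = new Float32Array(sab, INDEX_BYTES);
  }
  // Producing end (worker): returns how many samples were actually written.
  push(input) {
    const n = this.samples.length;
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const free = (read - write - 1 + n) % n;
    const toWrite = Math.min(free, input.length);
    for (let i = 0; i < toWrite; i++) {
      this.samples[(write + i) % n] = input[i];
    }
    Atomics.store(this.indices, 1, (write + toWrite) % n);
    return toWrite;
  }
  // Consuming end (audio thread): fills `output`, zeroing any shortfall so
  // an underrun plays silence instead of stale samples.
  pop(output) {
    const n = this.samples.length;
    const read = Atomics.load(this.indices, 0);
    const write = Atomics.load(this.indices, 1);
    const avail = (write - read + n) % n;
    const toRead = Math.min(avail, output.length);
    for (let i = 0; i < toRead; i++) {
      output[i] = this.samples[(read + i) % n];
    }
    output.fill(0, toRead);
    Atomics.store(this.indices, 0, (read + toRead) % n);
    return toRead;
  }
}
```

In the worklet, `process(inputs, outputs)` would simply call `ringBuffer.pop(outputs[0][0])` and return `true` unconditionally, matching the silence-on-empty behavior described in the steps above. A production version would also want to copy samples in chunks rather than one at a time.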
In short, three pieces:

- the main thread runs the UI and orchestrates everything
- a Web Worker does the sound generation (and any I/O)
- an `AudioWorkletProcessor` plays the audio out

@padenot I just wanted to say thanks again for the writeup, and I haven't forgotten about this.
It would be really nice to use Audio Worklets.
ScriptProcessorNode renders audio on the UI thread and glitches during scrolling, window resizing, etc. This is really not acceptable for a music player, and the ScriptProcessorNode deprecation warning has been showing up in the Chrome console for a long time now.
Might solve some of the glitch reports too.
It's widely supported: https://caniuse.com/mdn-api_audioworklet

See also: https://developers.google.com/web/updates/2018/06/audio-worklet-design-pattern