w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Dealing with sample rates for WebCodecs + WebAudio #378

Open chcunningham opened 2 years ago

chcunningham commented 2 years ago

I'm summarizing a discussion w/ @hoch and adding some new questions at the bottom.

The underlying question: What guidance should we give to WebCodecs + WebAudio users on setting AudioContext.sampleRate? The answer so far is: it depends.

Background:

So what should WebCodecs users do...

Simple use cases (e.g. playing occasional short sounds) may want to render audio via a single AudioBuffer -> AudioBufferSourceNode. In this case, take the default AudioContext sampleRate. If your AudioBuffer's rate doesn't match your AudioContext, WebAudio will silently resample the buffer for you. You could instead set the AudioContext sampleRate to match your buffer, but this is at best the same amount of conversion and could actually be much more conversion if you're using that AudioContext for other things that wouldn't otherwise require it.
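For illustration, a minimal sketch of this simple path might look like the following. The function name `playDecoded` and the small `...Like` interfaces are hypothetical; they only mirror the parts of `AudioData`, `AudioBuffer`, `AudioBufferSourceNode` and `AudioContext` that the sketch touches, so it does not depend on DOM typings. The point is that the AudioBuffer is created at the decoded rate and the context keeps its default rate.

```typescript
// Hypothetical structural types mirroring the real WebCodecs / WebAudio members used below.
interface DecodedAudio {
  numberOfChannels: number;
  numberOfFrames: number;
  sampleRate: number;
  copyTo(dest: Float32Array, opts: { planeIndex: number; format: "f32-planar" }): void;
}
interface BufferLike {
  copyToChannel(src: Float32Array, channel: number): void;
}
interface SourceLike {
  buffer: BufferLike | null;
  connect(dest: unknown): unknown;
  start(when?: number): void;
}
interface ContextLike {
  destination: unknown;
  createBuffer(channels: number, length: number, sampleRate: number): BufferLike;
  createBufferSource(): SourceLike;
}

function playDecoded(ctx: ContextLike, data: DecodedAudio): SourceLike {
  // The buffer keeps the decoded sample rate. If it differs from the
  // context's rate, WebAudio resamples during playback.
  const buffer = ctx.createBuffer(data.numberOfChannels, data.numberOfFrames, data.sampleRate);
  const plane = new Float32Array(data.numberOfFrames);
  for (let ch = 0; ch < data.numberOfChannels; ch++) {
    // Ask for planar float32 so each channel can be copied on its own.
    data.copyTo(plane, { planeIndex: ch, format: "f32-planar" });
    buffer.copyToChannel(plane, ch); // copies, so `plane` can be reused
  }
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
  return source;
}
```

In a browser this would be called as `playDecoded(new AudioContext(), audioData)`, where `audioData` is assumed to come from an `AudioDecoder` output callback.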

More involved cases may render via the SharedArrayBuffer (SAB) -> AudioWorklet pattern. AudioWorklet expects samples at the sample rate of the AudioContext (default or otherwise). In this case, users should either (1) construct the AudioContext with their desired sample rate, or (2) take care to do external sample rate conversion (SRC) before passing audio to WebAudio.
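As a rough illustration of option (2), here is a deliberately naive resampler sketch using linear interpolation. The function name `resampleLinear` is hypothetical, and linear interpolation is a stand-in chosen for brevity: it has no anti-aliasing filter, so a real application would likely want a better resampler.

```typescript
// Naive sample rate conversion by linear interpolation (illustration only).
// Operates on one channel of float samples.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate || input.length === 0) {
    return input.slice(); // nothing to convert, return a copy
  }
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```

Each channel would be converted to the AudioContext's rate this way before being written to the SAB. Option (1) avoids this step by passing the rate to the constructor, as in `new AudioContext({ sampleRate: 44100 })`.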

@hoch @padenot - can you help me explore that last case more? What are the drawbacks to (1)? Are users likely to hit complexity reconstructing AudioContext for changes to sample rate? Is external resampling (JS or WASM) fairly common already? Should an API for sample rate conversion be considered (some prior art in https://github.com/WebAudio/web-audio-api/issues/118 and https://github.com/WebAudio/web-audio-api/issues/2398)?

padenot commented 2 years ago

What are the drawbacks to (1)?

None that I know of. If audio isn't at the primary rate of the system, one of two things can happen:

  1. Either there is only one audio stream playing on the output device and the OS can open the hardware at this specific rate, skipping SRC
  2. Or (a lot more likely) the OS or the browser will perform sample-rate conversion

In any case, option 2 adds a few milliseconds of latency if the SRC is to be of acceptable quality, but this is also the case when the author performs the resampling, or when using an HTMLMediaElement. A resampler's API will have to include a way for authors to choose what is best for them.

Are users likely to hit complexity reconstructing AudioContext for changes to sample rate?

I don't think so, but it's more expensive in terms of system resources, because more real-time threads might be running (depending on the implementation). Depending on the application and the OS, it might be best for authors to do the resampling (when lots of audio streams of different rates are being played at once), or it might be best to create the AudioContext with the rate directly (when only one stream is being played back). If it's just to discover the rate, it's no problem: the AudioContext is likely to not even start processing audio, and can be immediately closed, immediately freeing resources (that might in fact not even be allocated if the implementation is lazy).
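The "discover the rate" case could be sketched as below. The name `probeDefaultSampleRate` and the `ClosableContext` interface are hypothetical; the interface only mirrors the `sampleRate` attribute and `close()` method of a real `AudioContext`, and the factory parameter is there so the sketch does not reference browser globals directly.

```typescript
// Hypothetical structural type: the two AudioContext members this sketch needs.
interface ClosableContext {
  readonly sampleRate: number;
  close(): Promise<void>;
}

function probeDefaultSampleRate(create: () => ClosableContext): number {
  const ctx = create();
  const rate = ctx.sampleRate;
  // close() returns a promise. Nothing here depends on its completion,
  // so the sketch ignores the result (and any rejection).
  ctx.close().catch(() => {});
  return rate;
}
```

In a browser this would be called as `probeDefaultSampleRate(() => new AudioContext())`. The result can then be compared with the decoded audio's rate to decide whether any conversion is needed.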

Is external resampling (JS or WASM) fairly common already? Should an API for sample rate conversion be considered (some prior art in WebAudio/web-audio-api#118 and WebAudio/web-audio-api#2398)?

It has existed for some time, and it's not particularly hard. The preferred approach, as is often the case for this kind of code, is to compile a good resampler to WASM. For example, this exists: https://github.com/geekuillaume/node-speex-resampler. It is built on the Speex resampler, which is used in a variety of software: modified for use in the Opus codec, used with a few patches in various parts of Firefox, but also lots of others.

A resampler API needs to be a bit complex to be useful, so it's best to demonstrate that a WASM approach is not fast enough, or that specifying the AudioContext's rate is not practical, before considering one.