kixelated opened 1 year ago
i recommend adopting the Flash Player "Tin-Can" timing model:
as an example, tcaudio.js + tcaudioprocessor.js in my RTWebSocket repo implement a Tin-Can-like interface* for WebAudio playback (with a volume control even), and tcmedia.js uses WebCodecs to decode video & audio and play them back synchronized (including if audio frames are dropped), with adjustable buffering and jitter adaptation, using tcaudio.
the only tricky part is actually caused by WebCodecs: at least the last time i checked in the Chrome WebCodecs implementation, even though the documentation says that the timestamp is supposed to be carried through unmodified between feeding in a coded frame to an audio decoder and getting the decoded raw samples out, it actually isn't; instead the timestamps on the decoded frames are just incremented by the duration of each decoded frame starting from the last decoder init/flush, so if any frames are missing, the timestamps of the decoded samples will be out of sync and wrong. this requires some janky heuristics to work around. you can't flush the decoder after every frame because (for stateful codecs like AAC) that causes an audio glitch.
* with some of your favorites like bufferTime, bufferLength, currentTime (for NetStream.time), and status events like NetStream.Buffer.Full, NetStream.Buffer.Empty, and NetStream.Buffer.Flush.
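one such heuristic (a sketch of mine, not code from the linked repo): since the decoder emits frames in the same order the coded frames were fed in, you can queue the input timestamps yourself and re-stamp each decoded frame FIFO-style, ignoring the synthesized timestamps Chrome puts on the output:

```javascript
// Sketch: Chrome's AudioDecoder may ignore the input timestamp and
// synthesize output timestamps by accumulating frame durations since the
// last init/flush. Output frames arrive in input order, so queue the input
// timestamps and re-stamp each decoded frame as it comes out.
class TimestampResync {
  constructor() {
    this.pending = []; // timestamps (µs) of coded frames awaiting decode
  }
  onEncodedChunk(chunk) {
    this.pending.push(chunk.timestamp);
  }
  onDecodedFrame(frame) {
    // Pull the oldest queued input timestamp; fall back to the decoder's
    // synthesized timestamp if the queue is somehow empty.
    const ts = this.pending.length ? this.pending.shift() : frame.timestamp;
    return { samples: frame, timestamp: ts };
  }
  flush() {
    this.pending.length = 0; // a decoder flush drops any in-flight frames
  }
}
```

this assumes a strictly 1:1, in-order mapping between coded chunks and decoded frames, which holds for typical AAC/Opus decoding but would need care around decoder errors or flushes.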
@zenomt How would you deal with clock drift without something like NetEQ? Playing audio at wall clock seems reasonable but it can't compensate for the fact that 20 ms on my machine are not 20 ms on your machine. You have to stretch and shrink audio using something like NetEQ. Or am I missing something?
my implementation handles jitter and clock drift the same way:
if you wanted to be extra fancy (my implementation doesn't do this), then to stave off an underrun, you could double some% of audio frames if the minimum buffered amount falls below a safety threshold.
for codecs like AAC and Opus, dropping or doubling audio frames isn't tremendously disruptive as long as the "some" percent is small (like less than 1%).
unfortunately, the Chrome implementation of the WebCodecs decoder doesn't carry through the timestamp on every coded audio frame, so dropping or doubling a frame will disrupt the timestamps on the output of the decoder. the workaround is to flush the decoder whenever you know you're dropping (or doubling) a coded frame, which will cause a little "pop" but reset/resync the timestamps on the decoded frames.
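the drop/double decision above could be sketched like this (my names and thresholds, not from the repo): compare the buffered duration against low/high water marks, and rate-limit adjustments so they stay under the target percentage:

```javascript
// Sketch of a drop/double policy for drift compensation: drop an
// occasional coded frame when the buffer runs long, double one when it
// runs short. A frame counter caps adjustments at a small fraction
// (default 1%) of frames so the audible artifacts stay rare.
function frameAdjuster({ lowWaterMs, highWaterMs, maxAdjustFraction = 0.01 }) {
  let framesSinceAdjust = 0;
  const minSpacing = Math.ceil(1 / maxAdjustFraction); // e.g. 100 frames

  // returns "play", "drop", or "double" for the next coded frame
  return function decide(bufferedMs) {
    framesSinceAdjust++;
    if (framesSinceAdjust < minSpacing) return "play"; // rate-limit adjustments
    if (bufferedMs > highWaterMs) { framesSinceAdjust = 0; return "drop"; }
    if (bufferedMs < lowWaterMs)  { framesSinceAdjust = 0; return "double"; }
    return "play";
  };
}
```

with the flush-on-drop workaround described above, "drop" and "double" would each also trigger a decoder flush to keep the output timestamps in sync.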
Thanks for going into details.
@chrisprobst (and Luke): you can see how the above works in my implementation here:
@zenomt I have trouble understanding how buffering solves the underrun issue effectively. With clock drift, every audio packet would play too fast, and because audio is inherently linked to video, essentially every audio packet would cause an underrun (or a short forced silence during A/V sync). I believe for smooth playout you really need something like NetEQ.
most computer audio hardware is reasonably accurate, and pretty stable (if it was unstable you'd hear that easily, and if the sample rate was way off the pitch would be noticeably wrong). i've observed sample rates around 48008/s for a nominal rate of 48000/s (about 1.7 parts in 10000). for AAC and a nominal sample rate of 48000/s, each AAC frame (1024 samples) is 21.3 ms long. at the quantum of "AAC frame", and assuming a "very live" setting with an instantaneous resume (rather than accumulating a longer buffer), you'd underrun one frame time every 125 seconds (2 minutes), and that underrun would only be 21.3 ms of silence. if you accumulated a longer buffer when rebuffering, the interval between underruns requiring a rebuffer would be proportionally longer. for a half-second buffer and the above drift, you'd need to rebuffer (for half a second) every 49 minutes.
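the arithmetic above can be checked back-of-the-envelope (my rounding comes out at ~128 s and ~50 min; the 125 s / 49 min figures presumably reflect a slightly different measured rate):

```javascript
// Worked check of the drift arithmetic: an 8 samples/s deficit between the
// hardware rate and the nominal stream rate, at AAC's 1024-sample frames.
const nominalRate = 48000; // samples/s the stream was encoded at
const actualRate  = 48008; // observed hardware rate (~1.7 parts in 10000 fast)
const aacFrame    = 1024;  // samples per AAC frame

const driftPerSec = actualRate - nominalRate;          // 8 samples/s deficit
const frameMs = (aacFrame / nominalRate) * 1000;       // ~21.3 ms per frame
const secsPerFrameUnderrun = aacFrame / driftPerSec;   // ~128 s, about 2 minutes

// with a half-second buffer, you rebuffer once drift has eaten 0.5 s of audio
const halfSecondSamples = nominalRate / 2;
const minutesPerRebuffer = halfSecondSamples / driftPerSec / 60; // ~50 minutes
```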
Hey @kixelated,
I'd love to enable audio in this library and implement a decent form of A/V sync to make it more usable. What's the first thing I should look at, or where should I start? Any pointers would be greatly appreciated. :)
I think the audio worklet needs a rewrite, but I don't know the precise problem. The current approach of using ordered Streams kinda sucks and results in blindly queuing data with no expectation of when it will (and should) be played.
I have been down a rabbit hole; turns out web audio is hard. First goal is to get any kind of audio out, even if it's noise, as all we get right now is silence. Looking at the code here, it seems we only send the video canvas and never forward the audio channels?
I also encounter this warning after I run the code locally and try watching a published stream:
The AudioContext was not allowed to start. It must be resumed (or created) after a user gesture on the page.
Apparently this could be fixed by making sure getAudioContext().resume() is called after a user gesture, similar to what you did with watch here, but I'm not sure where exactly that fits in the publisher.
gentle reminder that i posted links to running code using WebAudio, as well as a suggestion on a timing model for A/V sync, above in this Issue back in october 2023.
I disabled audio because I was getting tired of working on the player. It needs to be synchronized with video, which is actually kind of annoying thanks to WebAudio.