w3c / webrtc-encoded-transform

WebRTC Encoded Transform
https://w3c.github.io/webrtc-encoded-transform/

Expose RTCEncodedAudioFrame interface in AudioWorklets #226

Open tonyherre opened 4 months ago

tonyherre commented 4 months ago

Working with encoded frames from worklets, particularly RTCEncodedAudioFrames from AudioWorklets, would be very useful for apps, allowing them to choose the best execution environment for encoded media processing, beyond just Window and DedicatedWorker.

ReadableStream and WritableStream already have Exposed=Worklet, so transferring the streams of encoded frames would make sense and would allow more performant implementations than, e.g., requiring apps to copy data & metadata in a DedicatedWorker before going to the worklet / after returning from it.

I propose we add Worklet to the Exposed lists for RTCEncodedVideoFrame and RTCEncodedAudioFrame, and likely follow up with similar changes for the interfaces in the webcodecs spec.
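
As a hedged illustration of what the proposed exposure would enable (the processor name and message shape are made up; it assumes RTCEncodedAudioFrame gains Exposed=Worklet and that a ReadableStream of encoded frames is transferred in over the processor's port):

```js
// processor.js — runs in AudioWorkletGlobalScope. Assumes the exposure
// proposed here (RTCEncodedAudioFrame with Exposed=Worklet); ReadableStream
// itself already has Exposed=Worklet, so it can be transferred in.
class EncodedFrameSink extends AudioWorkletProcessor {
  constructor() {
    super();
    this.port.onmessage = ({ data }) => this.consume(data.readable);
  }
  async consume(readable) {
    const reader = readable.getReader();
    for (;;) {
      const { value: frame, done } = await reader.read();
      if (done) break;
      // `frame` is an RTCEncodedAudioFrame: frame.data holds the encoded
      // payload (ArrayBuffer), frame.getMetadata() carries the RTP metadata.
      // A real app would decode here and queue PCM for process().
    }
  }
  process(inputs, outputs) {
    // Fill outputs from whatever has been decoded so far (elided).
    return true;
  }
}
registerProcessor("encoded-frame-sink", EncodedFrameSink);
```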

CC @alvestrand @aboba @guidou @youennf

youennf commented 4 months ago

In typical WebRTC applications, there is a thread for audio capture/rendering and there are threads dedicated to networking and media handling. The former (which maps to AudioWorklet) usually runs at a higher priority than the latter (which map to DedicatedWorkers).

I am not sure that allowing encoding/networking to be done in an AudioWorklet is a good idea. For instance, WebCodecs constructs are not supported in worklets. WebRTC encoded transform streams are transferable, but RTCRtpScriptTransformer is not.

CC @padenot and @jan-ivar.

padenot commented 4 months ago

For video, just no.

Decoding audio on real-time threads has been seen in very specific scenarios, and it can be done safely if the decoder is real-time safe, etc., all the usual requirements. Encoding, I've never seen it, and I don't really know how useful it would be or what it would bring.

The Web Codecs API, however, won't work well (or at all) in an AudioWorklet, because the AudioWorklet is inherently and by necessity a synchronous environment, while the Web Codecs API is asynchronous. I had proposed a synchronous API for Web Codecs and explained why (in https://github.com/w3c/webcodecs/issues/19), but we haven't done it.
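
To make that mismatch concrete, here is a sketch under the counterfactual assumption that AudioDecoder were reachable from an AudioWorkletProcessor (it isn't today); it only illustrates why an asynchronous output callback doesn't line up with the synchronous process() contract:

```js
// HYPOTHETICAL: WebCodecs is not exposed in AudioWorkletGlobalScope today.
class DecodeInWorklet extends AudioWorkletProcessor {
  constructor() {
    super();
    this.pending = [];
    // WebCodecs is asynchronous by design: decoded AudioData arrives via a
    // callback at some later point, not as a return value of decode().
    this.decoder = new AudioDecoder({
      output: (audioData) => this.pending.push(audioData),
      error: (e) => console.error(e),
    });
    this.decoder.configure({ codec: "opus", sampleRate: 48000, numberOfChannels: 1 });
    // Encoded chunks would be fed in via this.port messages (elided).
  }
  process(inputs, outputs) {
    // process() must fill `outputs` before returning; it cannot await the
    // decoder's output callback. At best it plays data decoded during a
    // previous render quantum (added latency) and must emit silence when
    // nothing has arrived yet (underrun).
    const ready = this.pending.shift();
    if (!ready) outputs[0][0].fill(0);
    // else: copy `ready` (an AudioData) into outputs[0] — elided.
    return true;
  }
}
registerProcessor("decode-in-worklet", DecodeInWorklet);
```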

I side with @youennf on this. Communicating with an AudioWorkletProcessor is not hard and can easily be done extremely efficiently. Any claim of being able to do a "more performant" implementation needs to be backed by something.

Once we have apps for which the limiting factor is the packetization latency or something in that area, we can revisit.

tonyherre commented 4 months ago

WebRTC encoded transform streams are transferable but RTCRtpScriptTransformer is not.

The RTCRtpScriptTransformer.readable is transferable, so could be posted to a worklet within the current shape.

The former use case @youennf mentioned - decoding + rendering audio - is indeed the one I'm interested in getting onto a worklet. IIUC libwebrtc does its audio decoding on the real-time thread, just before rendering, so that concept isn't all that wild.

Any claim of being able to do a "more performant" implementation need to be backed by something.

Transferring the ReadableStream to the worklet would mean frames could be delivered directly there. Requiring JS work to be done elsewhere first would necessitate visiting another JS thread, scheduling an event there, etc., so roughly double the overhead plus the cost of allocating the intermediate objects to be re-transferred. I can see if I can get some more concrete numbers, but there's clearly additional work needed from the app + UA which could be skipped with this.
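
For concreteness, a rough sketch of that wiring under the current shape (audioContext, receiver, and the file names are assumptions; the worklet side is the EncodedFrameSink sketched earlier):

```js
// main.js — hand the worklet node's MessagePort to the transform worker so
// per-frame traffic can bypass this thread entirely.
const worker = new Worker("transform-worker.js");
await audioContext.audioWorklet.addModule("processor.js");
const node = new AudioWorkletNode(audioContext, "encoded-frame-sink");
worker.postMessage({ workletPort: node.port }, [node.port]);
receiver.transform = new RTCRtpScriptTransform(worker, {});

// transform-worker.js — forward the readable end straight to the worklet.
let workletPort;
onmessage = ({ data }) => { workletPort = data.workletPort; };
onrtctransform = ({ transformer }) => {
  // ReadableStream is transferable and Exposed=Worklet, so the encoded
  // frames can reach the worklet without an extra per-frame hop through here.
  workletPort.postMessage({ readable: transformer.readable },
                          [transformer.readable]);
};
```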

jan-ivar commented 4 months ago

The RTCRtpScriptTransformer.readable is transferable, so could be posted to a worklet within the current shape.

This produces a "readable side in another realm", where the original realm feeding that readable is the dedicated worker provided by the webpage, at least in current implementations of that surface.

requiring apps to copy data & metadata in a DedicatedWorker before going to the worklet / after returning from it.

Can't the app just transfer the frame.data ArrayBuffer instead? i.e. not a copy. It'd be interesting to see the numbers.
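
A hedged sketch of that alternative, reusing the workletPort handle from the earlier wiring: the worker stays in the loop per frame but transfers only the payload, so the encoded bytes are never copied.

```js
// transform-worker.js — hand each frame's payload to the worklet by
// transferring the underlying ArrayBuffer rather than copying it.
onrtctransform = async ({ transformer }) => {
  const reader = transformer.readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    const payload = frame.data;               // ArrayBuffer
    const metadata = frame.getMetadata();     // plain dictionary, structured-cloned
    // Transferring detaches `payload`; no copy of the encoded bytes is made.
    workletPort.postMessage({ payload, metadata }, [payload]);
    // Note: the frame is consumed here and not written back to
    // transformer.writable, i.e. the normal decode/render path is bypassed,
    // which matches the "decode + render in the worklet" scenario.
  }
};
```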

Transferring the readablestream to the worklet would mean frames could be delivered directly there.

This sounds like https://github.com/whatwg/streams/issues/1124.

jan-ivar commented 4 months ago

IIUC libwebrtc does its audio decoding in the realtime thread, just before rendering, so that concept isn't all that wild.

I wasn't aware. Is this a special-case for element.srcObject = new MediaStream([transceiver.receiver.track]);?

padenot commented 3 months ago

I wasn't aware. Is this a special-case for element.srcObject = new MediaStream([transceiver.receiver.track]);?

It's an implementation concern, script doesn't know about it.

Orphis commented 2 months ago

Would you be open to just reframing the issue as exposing RTCEncodedAudioFrame to an AudioWorklet context?

padenot commented 2 months ago

If we want to do decoding in real-time threads, in the AudioWorkletGlobalScope, here are some rough steps:

Orphis commented 2 months ago

I don't think this should necessarily be tied to WebCodecs. While it can be useful in some cases, it is not going to cover all use cases, such as experimental codec work that is inherently implemented in JS / WASM.

padenot commented 2 months ago

If you're not using Web Codecs, there's no benefit to exposing RTCEncodedAudioFrame. Just extract the data into a buffer and communicate it to the AudioWorkletGlobalScope. This can be done today using postMessage or SharedArrayBuffer.
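
A minimal sketch of the SharedArrayBuffer variant, assuming cross-origin isolation (so SharedArrayBuffer is available) and the workletPort handle from the earlier wiring; the single-slot "mailbox" layout and the MAX_FRAME bound are made up for brevity (a real app would use a ring buffer):

```js
// transform-worker.js — copy each encoded payload into shared memory that the
// AudioWorkletGlobalScope also maps.
const MAX_FRAME = 4096;                               // assumed payload bound
const sab = new SharedArrayBuffer(8 + MAX_FRAME);
const header = new Int32Array(sab, 0, 2);             // [sequence, byteLength]
const slot = new Uint8Array(sab, 8);
workletPort.postMessage({ sab });                     // worklet views the same memory

onrtctransform = async ({ transformer }) => {
  const reader = transformer.readable.getReader();
  const writer = transformer.writable.getWriter();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    const src = new Uint8Array(frame.data);
    slot.set(src.subarray(0, MAX_FRAME));             // the copy padenot refers to
    Atomics.store(header, 1, Math.min(src.length, MAX_FRAME));
    Atomics.add(header, 0, 1);                        // publish a new sequence number
    await writer.write(frame);                        // keep the normal pipeline running
  }
};

// In the worklet, process() would poll header[0] each render quantum and copy
// the slot out when the sequence number changes, without blocking the thread.
```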

youennf commented 2 months ago

Instead of transferring streams to worklets, the alternative would be to let the script transform take a worklet instead of a worker. That is probably the most straightforward approach, but it still needs discussing in terms of scenarios and pros/cons.
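
Purely for illustration — no such overload exists in the spec, and the constructor argument, option name, and the idea of delivering the rtctransform event into the AudioWorkletGlobalScope are all invented here — that alternative might look like:

```js
// main.js — HYPOTHETICAL: RTCRtpScriptTransform currently only accepts a Worker.
await audioContext.audioWorklet.addModule("rtp-transform-processor.js");
const node = new AudioWorkletNode(audioContext, "rtp-transform-processor");
receiver.transform = new RTCRtpScriptTransform(node, { kind: "audio" }); // invented overload

// rtp-transform-processor.js — hypothetically, the RTCRtpScriptTransformer
// (readable, writable, options) would then be surfaced inside the
// AudioWorkletGlobalScope instead of a DedicatedWorkerGlobalScope.
```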

Orphis commented 1 month ago

@aboba Can we add this to the agenda for next week's interim?

dontcallmedom-bot commented 1 month ago

This issue was discussed in the WebRTC Interim on 21 May 2024 (Issue 226: Expose RTCEncodedAudioFrame interface in Worklets)