alvestrand opened this issue 3 years ago
I am all in for exploring this, but what would be the benefits of using the WebCodecs codecs vs the built-in WebRTC ones?
As a first step, I think it could be fairly easy to expose the internal codecs used in WebRTC with the same interface as the WebCodecs ones on the RTCRtpSender and RTCRtpReceiver (that is, without being able to set them externally)
Advantage would be to allow WebRTC to utilize new WebCodecs features. As an example, per-frame QP and H.264/AVC with temporal scalability are supported in WebCodecs, but not in WebRTC. WebCodecs currently has support for VP8, VP9, H.264, HEVC (decoder only), AV1 and Opus. So there are only a few WebRTC codecs missing (e.g. G.711, G.722, etc.).
It seems to me we should explore this in webrtc-extensions, this seems orthogonal to webrtc-encoded transform.
Got clarification in the editors' meeting that the idea here would be to tell the sender not to encode, and leave it to the JS transform to use WebCodecs for this instead. Same for the receiver and decode.
I think this API would be relevant to the E2E encryption use case as well as other use cases (e.g. AR/VR).
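To make the shape of that idea concrete, here is a rough, purely illustrative sketch built on today's RTCRtpScriptTransform plumbing. The wiring (the `onrtctransform` event, the transformer's readable/writable pair) is today's encoded-transform API; the notion that the sender would skip its built-in encoder, hand the transform raw VideoFrames, and accept WebCodecs output back is the hypothetical part being discussed here.

```js
// worker.js — illustrative only. The RTCRtpScriptTransform plumbing is real;
// receiving raw VideoFrames (instead of RTCEncodedVideoFrames) and writing
// WebCodecs chunks back for packetization is the hypothetical proposal.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  const writer = writable.getWriter();

  const encoder = new VideoEncoder({
    // Assumption: an encoded chunk could be written back to the sender's
    // writable side for packetization.
    output: (chunk) => writer.write(chunk),
    error: (e) => console.error(e),
  });
  encoder.configure({
    codec: 'vp8',
    width: 1280,
    height: 720,
    bitrate: 1_000_000,      // target average bitrate (bps)
    latencyMode: 'realtime',
  });

  readable.pipeTo(new WritableStream({
    write(frame) {           // assumption: a raw VideoFrame, not an encoded frame
      encoder.encode(frame);
      frame.close();
    },
  }));
};
```

On the main thread this would be attached with `sender.transform = new RTCRtpScriptTransform(worker)`, which is the existing encoded-transform API.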
The trickiest part will be the interaction of WebCodecs encoder rate control and WebRTC congestion control. While WebCodecs has an average bitrate target (which it can undershoot or overshoot in the short-term), it also has the ability to support SVC and/or simulcast, which allows the sender rate to be adjusted quickly (e.g. by dropping or adding layers).
For this to be used effectively, the application needs to know when the potential sending rate drops (due to loss or increased delay) as well as when the potential sending rate increases to the point where it might be possible to add back a simulcast or SVC layer that was previously dropped. WebRTC's RTP transport is unique in supporting the latter scenario; transports such as RTCDataChannel or WebTransport do not support "probing" so as to allow faster rampup; they intrinsically optimize more for quality (e.g. a video upload scenario) than latency (video conferencing).
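As a rough sketch of the encoder side of that interaction: a WebCodecs VideoEncoder can be reconfigured when the available rate changes, but the congestion-feedback source is assumed here; nothing in the current APIs delivers the congestion controller's estimate to the application.

```js
// Sketch: adjust a WebCodecs encoder in response to an (assumed) congestion signal.
// `onAvailableBitrateChange` is a placeholder for whatever mechanism eventually
// exposes the congestion controller's estimate to the application.
function onAvailableBitrateChange(callback) { /* wired up by the application */ }

const encoder = new VideoEncoder({
  output: (chunk, metadata) => { /* hand off to packetization; app-defined */ },
  error: (e) => console.error(e),
});

let config = {
  codec: 'vp9',
  width: 1280,
  height: 720,
  bitrate: 1_500_000,        // average target; short-term over/undershoot is possible
  framerate: 30,
  latencyMode: 'realtime',
  scalabilityMode: 'L1T3',   // temporal layers give a fast way to shed rate
};
encoder.configure(config);

onAvailableBitrateChange((bps) => {
  // Re-configure with a new target when the estimated sending rate changes.
  config = { ...config, bitrate: Math.min(bps, 1_500_000) };
  encoder.configure(config);
});
```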
I think it would be interesting to know a use case for this. What benefits does a developer get by using the WebCodecs codec instead of the WebRTC built-in one?
Missing H.264/AVC with temporal scalability is an implementation issue of the browsers, but it is fully supported by the WebRTC API. On the other hand, we won't be able to support a WebCodecs codec (let's say AAC) that the WebRTC stack does not support, since the stack has to be able to do the RTP packetization, and as you say, the bitrate control of WebCodecs is much looser than what is required in WebRTC. So I fail to see any advantage to it.
@murillo128 Good point about clarifying the use cases. Some that come to mind are:
- Bring your own codec (BYOC). As you note, this would require control over packetization, as well as the ability to pass negotiated parameters to the new codec (via SDP or some other means). It would also probably require the ability to influence the content of RTP header extensions (e.g. client-mixer audio level).
- AR/VR. To enable the addition of substantial metadata along with the media, the application would need to be able to control packetization and also possibly to interact with congestion control (so as to allow it to take the metadata addition into account). This use case wouldn't necessarily require the ability to pass new parameters. As with the E2E encryption use case, if appropriate RTP header extensions are negotiated with the SFU, the SFU will not need to parse the modified payload, so it doesn't have to care whether metadata has been added to the payload.
> Bring your own codec (BYOC)

:elephant: in the room:
1/ encoded-transform preserves the payload type. It can not change it (arguably that might be interesting)
2/ The SDP won't reflect the additional payload type you would need to allocate and negotiate for this

I don't think encoded-transform is the right API for BYOC
Regarding BYOC, if it is for adding a codec not supported by the WebRTC implementation in the browser, I agree with @fippo. However, it could be possible to Bring Your Own Encoder/Decoder (BYOED) for a codec known by WebRTC, as the stack will handle the SDP negotiation and RTP (de)packetization.
Regarding AR/VR metadata, using webcodecs or the native webrtc codecs should not make any difference and we should be able to provide an API/solution that works in both cases (in fact insertable streams provides a hacky way of doing so).
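For reference, the "hacky way" with insertable streams is roughly the following, using Chrome's non-standard `createEncodedStreams()` (the peer connection must be created with `encodedInsertableStreams: true`). The metadata format and how the receiver strips it again are application-defined and only illustrative here.

```js
// Append application metadata to each encoded frame before it is packetized.
// Uses Chrome's legacy createEncodedStreams(); with RTCRtpScriptTransform the
// TransformStream below would live in a worker instead.
const { readable, writable } = sender.createEncodedStreams();

const appendMetadata = new TransformStream({
  transform(encodedFrame, controller) {
    const payload = new Uint8Array(encodedFrame.data);
    const metadata = new TextEncoder().encode('{"pose":[0,0,0]}'); // example payload

    const out = new Uint8Array(payload.byteLength + metadata.byteLength + 2);
    out.set(payload, 0);
    out.set(metadata, payload.byteLength);
    // Trailing 16-bit length so the receiver can strip the metadata again.
    new DataView(out.buffer).setUint16(out.byteLength - 2, metadata.byteLength);

    encodedFrame.data = out.buffer;
    controller.enqueue(encodedFrame);
  },
});

readable.pipeThrough(appendMetadata).pipeTo(writable);
```

As noted, this works the same way whether the payload came from WebCodecs or from the native WebRTC encoder, since the transform only sees the encoded frame.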
I wanted to push back on the OP that JS should be involved to circumvent a browser having implemented a codec at one door but not the other. Browsers should either have a good reason for this (making circumvention unappealing), or expose the same codecs where they are useful and work well. This seems like an issue internal to browsers.
This seems to leave BYOED as a use case?
Not sure I understand the argument for BYOED (as distinct from BYOC). It's difficult to outperform native encoders and decoders in WASM, and if it were to be demonstrated it would probably represent a performance bug in the native implementation.
It might be desirable to add features to an existing codec, or extend it. For example, to take the output of a native H.264/AVC encoder and produce an H.264/SVC bitstream for custom packetization. But this is closer to payload manipulation than BYOED or BYOC.
I remember Matthew Kaufmann being very proud of a tweaked H.264 encoder for Skype that "focused the bits in the right places" - enhancing encoding of faces and spending fewer bits on backgrounds. That's the kind of thing you can do in (special purpose) SW not (general purpose) HW or browser-embedded SW.
@alvestrand ^this, plus an ability to configure the codec parameters dynamically
Using WebCodecs with WebRTC is useful. We use H.264 by default, and there are some bad cases when using screen share with WebRTC's internal H.264.
Given the current state of WebCodecs, I find all the use cases above impossible to achieve with WebCodecs due to its limited configuration options (intra frame, bitrate, latency mode and scalability mode).
A different thing is to be able to add your own WASM encoder/decoder, but that would be a completely different story than integrating WebRTC with WebCodecs.
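For context, the configuration options referred to above are roughly the following (values are illustrative; codec-specific tuning beyond these knobs is not exposed):

```js
// Roughly the full set of encoder controls WebCodecs exposes today:
// per-frame key-frame requests, a target bitrate, a latency mode, and a
// scalability mode.
const encoder = new VideoEncoder({
  output: (chunk) => { /* hand off to packetizer; app-defined */ },
  error: (e) => console.error(e),
});
encoder.configure({
  codec: 'avc1.42001f',      // H.264 Baseline, level 3.1
  width: 1280,
  height: 720,
  bitrate: 2_000_000,        // target bitrate
  latencyMode: 'realtime',   // latency mode
  scalabilityMode: 'L1T2',   // scalability mode (temporal layers)
});

function encodeFrame(videoFrame, forceKeyFrame) {
  encoder.encode(videoFrame, { keyFrame: forceKeyFrame });  // per-frame intra request
  videoFrame.close();
}
```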
@murillo128 I opened Issue 478 relating to coding tool configuration options (mostly thinking about AV1, but it applies to other codecs as well). Can you chime in there (or open your own Issue)?
@alvestrand ML-enhanced encoding is definitely a hot topic today. However, WebCodecs API doesn't help with this and WASM performance is too slow.
WRT rate control, I've just proposed PR #207 which may solve "how to get the information". Not clear how this will translate into WebCodecs parameters, however.
One of the use cases for encoded streams is to connect a WebCodec to the RTCPeerConnection instead of using WebRTC's built-in codecs.
We should have APIs that allow this to happen in a reasonably comprehensible way; this includes ensuring that a WebCodecs encoder/decoder is configured with the same media types and parameters as those expected/produced by the RTCPeerConnection's track.
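A minimal sketch of what "configured with the same media types and parameters" could mean in practice, assuming the application reads the negotiated codec and track settings itself; `sender` is an existing RTCRtpSender after negotiation, and the helper name and bitrate value are illustrative.

```js
// Sketch: derive a WebCodecs VideoEncoderConfig from what the RTCPeerConnection
// negotiated and from the track being sent, so the external encoder matches
// what the sender would otherwise have produced.
function configFromSender(sender) {
  const codecs = sender.getParameters().codecs;   // codecs negotiated in the SDP
  const settings = sender.track.getSettings();

  // Simplified: assume VP8 was negotiated. A real mapping from RTP mimeType
  // (e.g. "video/VP8", "video/H264" + fmtp) to a WebCodecs codec string is needed here.
  if (!codecs.some((c) => c.mimeType === 'video/VP8')) {
    throw new Error('expected VP8 to be negotiated');
  }

  return {
    codec: 'vp8',
    width: settings.width,
    height: settings.height,
    framerate: settings.frameRate,
    bitrate: 1_000_000,       // illustrative; ideally driven by congestion control
    latencyMode: 'realtime',
  };
}

const handleChunk = (chunk, metadata) => { /* packetize and send; app-defined */ };
const encoder = new VideoEncoder({ output: handleChunk, error: console.error });
encoder.configure(configFromSender(sender));
```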