Open sprangerik opened 2 months ago
Thank you for proposing a session!
You may update the session description as needed and at any time before the meeting, but please keep in mind that tooling relies on issue formatting: follow the instructions and leave all headings and other formatting intact in particular. Bots and W3C meeting organizers may also update the description, to fix formatting issues or add links and other relevant information. Please do not revert these changes. Feel free to use comments to raise questions.
Do not expect formal approval; W3C meeting organizers endeavor to schedule all proposed sessions that are in scope for a breakout. Actual scheduling should take place shortly before the meeting.
Session description
WebCodecs provides a low-level API for encoding and decoding video, with control over settings on a per-frame basis. As a relatively young API, it currently lacks some more advanced features, such as temporal/spatial scalability, that are important for real-time use cases like video conferencing.
This session is intended to discuss a number of potential next steps, to find which features are highest priority, and what benefits or problems we face with each of those.
Some of the topics for discussion:
Explicit reference frame control
By allowing the user to specify which reference buffers to reference and which to update on a per-frame basis, it is possible to implement a number of important reference structures and coding features including temporal/spatial/quality layers, long-term references, low-latency 2-pass rate control, etc.
In short, any of the scalability modes listed in Scalable Video Coding (SVC) Extension for WebRTC, and many more, can be implemented with a small set of tools. If done right, this could even be done in a manner that is codec and implementation agnostic.
This way of modeling an encoder also presents some issues. The user needs to be able to determine how many reference buffers are available, how many can be referenced per frame, and which references are allowed or disallowed under various circumstances. How do we expose such data in a way that is user friendly, compatible with the current API, and avoids unnecessary fingerprinting surfaces?
There are also tradeoffs when it comes to integrating with existing encoder implementations, a small subset of which may not fit well into this model.
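As a rough illustration of the buffer model described above, the following TypeScript sketch computes which buffers each frame would reference and update for an L1T3-style pattern (temporal layers T0, T2, T1, T2) using two reference buffers. The `FrameRefConfig` type and the `l1t3Pattern` function are hypothetical names invented for this example; they are not part of WebCodecs or of any concrete proposal.

```typescript
// Hypothetical per-frame reference configuration; NOT part of the WebCodecs API.
interface FrameRefConfig {
  temporalLayer: number;       // 0 = base layer
  referenceBuffers: number[];  // buffers this frame may predict from
  updateBuffer: number | null; // buffer overwritten with this frame, if any
}

// Returns the configuration for frame `index` in a repeating 4-frame
// L1T3-style pattern (T0, T2, T1, T2) that uses two reference buffers.
function l1t3Pattern(index: number): FrameRefConfig {
  if (index === 0) {
    // Key frame: no references, stored in buffer 0.
    return { temporalLayer: 0, referenceBuffers: [], updateBuffer: 0 };
  }
  switch (index % 4) {
    case 0: // T0 references the previous T0 and replaces it in buffer 0.
      return { temporalLayer: 0, referenceBuffers: [0], updateBuffer: 0 };
    case 1: // T2 references T0 and is never stored.
      return { temporalLayer: 2, referenceBuffers: [0], updateBuffer: null };
    case 2: // T1 references T0 and is stored in buffer 1.
      return { temporalLayer: 1, referenceBuffers: [0], updateBuffer: 1 };
    default: // T2 references T1 and is never stored.
      return { temporalLayer: 2, referenceBuffers: [1], updateBuffer: null };
  }
}
```

In this sketch the T2 frames never update a buffer, so dropping them leaves a stream in which every remaining frame still has its references available. That is the property a temporal layer structure is meant to provide.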
Spatial/Quality Scalability
Spatial scalability can be achieved by changing the encode() call to take a sequence of encoding options, instead of a single option, per input frame. Each option would then represent a different layer and would include a desired encoded resolution. With reference frame scaling, a user may reference a buffer containing a different resolution.
Again, this comes with some challenges. Different codec types might have different bounds on the scaling factors, and even certain implementations have limitations in this regard, if it is supported at all. Some codecs allow reference frame scaling only within the same temporal unit, while others support any reference at any time. How do we handle encoders with special optimized modes such as "multi-res" or "S-mode aware" encoding?
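To make the idea of "a sequence of encoding options per input frame" more concrete, here is a minimal TypeScript sketch that builds one options entry per spatial layer for the first temporal unit of a stream, assuming a fixed 2:1 scaling factor between layers. The `LayerEncodeOptions` type, its fields, and the `spatialLayerOptions` function are hypothetical; the actual shape of such options is exactly what the session is meant to discuss.

```typescript
// Hypothetical per-layer encode options; NOT part of the WebCodecs API.
interface LayerEncodeOptions {
  spatialLayer: number;       // 0 = lowest resolution
  width: number;              // desired encoded width for this layer
  height: number;             // desired encoded height for this layer
  referenceBuffers: number[]; // buffers to predict from (may hold another resolution)
  updateBuffer: number;       // buffer that stores this layer's reconstruction
}

// Builds the option sequence for the first temporal unit, assuming each
// layer is half the width and height of the layer above it.
function spatialLayerOptions(
  fullWidth: number,
  fullHeight: number,
  layerCount: number,
): LayerEncodeOptions[] {
  const options: LayerEncodeOptions[] = [];
  for (let s = 0; s < layerCount; s++) {
    const shift = layerCount - 1 - s; // the base layer is downscaled the most
    options.push({
      spatialLayer: s,
      width: fullWidth >> shift,
      height: fullHeight >> shift,
      // The base layer is a key frame; each higher layer references the
      // layer below it in the same temporal unit (inter-layer prediction).
      referenceBuffers: s === 0 ? [] : [s - 1],
      updateBuffer: s,
    });
  }
  return options;
}
```

For a 1280x720 input and three layers, this yields 320x180, 640x360, and 1280x720 entries. A codec or implementation that restricts scaling factors would reject or adjust such a sequence, which is one of the open questions raised above.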
Rate Control
When dealing with layered encoding, rate control becomes much more involved. The easiest way is to support only CQP, leaving all of the rate control to the user. If CBR is desired, the encoder needs to understand the bitrate target and expected frame rate for each spatio-temporal layer. This means it suddenly needs to be SVC aware even if the user is doing all of the reference frame control.
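A small worked example may help show why CBR needs per-layer information. The TypeScript sketch below derives an average per-frame bit budget for each layer from that layer's bitrate target and frame rate. The `LayerTarget` type, the `perFrameBudgetBits` function, and the numbers used are illustrative assumptions only, not values from any specification or implementation.

```typescript
// Hypothetical per-layer rate target; NOT part of the WebCodecs API.
interface LayerTarget {
  bitrateBps: number;      // bits per second allotted to this layer alone
  framesPerSecond: number; // frames per second belonging to this layer alone
}

// Average number of bits available to each frame of each layer.
function perFrameBudgetBits(layers: LayerTarget[]): number[] {
  return layers.map((layer) => layer.bitrateBps / layer.framesPerSecond);
}

// Illustrative three-temporal-layer stream at 30 fps in total:
// T0 and T1 each carry 7.5 fps, T2 carries 15 fps.
const budgets = perFrameBudgetBits([
  { bitrateBps: 600_000, framesPerSecond: 7.5 }, // T0
  { bitrateBps: 150_000, framesPerSecond: 7.5 }, // T1
  { bitrateBps: 150_000, framesPerSecond: 15 },  // T2
]);
// budgets is [80000, 20000, 10000] bits per frame.
```

Without knowing which layer a frame belongs to, an encoder would have to assume a single budget for every frame, which is why a CBR mode tends to pull SVC awareness back into the encoder.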
Auxiliary
There are many other knobs that could potentially be added: speed/quality control, segmentation/ROI mapping, etc. What's on the community's wish list?
Other Sessions of Interest
Note that there will also be a first-step proposal discussed at the joint Media/WebRTC WG Meeting on the 26th.
Further, there is a proposed breakout session on RtpTransport, an API that allows users to send custom-encoded frames over the RTP channel of a PeerConnection and is intended to go hand-in-hand with WebCodecs.
Session goal
Find the highest priority features in the community, and which aspects need more consideration.
Additional session chairs (Optional)
@Djuffin
Who can attend
Anyone may attend (Default)
IRC channel (Optional)
evolved-webcodecs
Other sessions where we should avoid scheduling conflicts (Optional)
13
Instructions for meeting planners (Optional)
No response
Agenda for the meeting.
The agenda is to discuss the proposal to add reference frame control to WebCodecs, and gather feedback and comments on the path forward. The session consists of a few parts:
See also the WebCodecs spec and the GitHub issue for reference frame control.
Links to calendar
Meeting materials