Remove restriction on streams being limited to only one PC

palak8669 commented 10 months ago

Preview | Diff

dontcallmedom-bot commented 10 months ago

This issue was mentioned in WEBRTCWG-2023-09-12 (Page 48)

youennf commented 9 months ago

As discussed during yesterday's editor's call, here is some feedback for this issue.

This 1 pc restriction is similar to the restriction to preserve the ordering of chunks. It prevents potential footguns. It removes edge cases. It helps browser interoperability. It is consistent with the WebRTC media pipeline and encoded transform model. As it is currently, WebRTC encoded transform is nothing but an encoder post-processor or a decoder pre-processor. No encoder/decoder -> no processor.

For instance, if a website tries to write too much data, the UA will drop frames before encoder to enforce bandwidth allocation. The transform will not receive any new chunk until the sender gets back some allocated bandwidth slot. Another example is that the web app cannot send data through the sender transform of a recvonly transceiver. Ditto on receive side where track may be muted. These are good properties.

Removing this restriction would change the model of WebRTC encoded transform, and more generally of the WebRTC media pipeline. We should favour proposals that can implement the use cases by staying consistent with the existing model we agreed on.

AIUI, the use cases can be implemented by new APIs, located elsewhere than WebRTC encoded transform. Two examples come to mind:

A low level API that would allow building an SFU in a browser. This could be the RTCRtpTransport API presented at TPAC. My understanding from TPAC meeting is that there is interest in moving forward with this proposal.
A high level API that would allow forwarding received media (without reencoding whenever possible) via another peer connection, as mentioned by @jan-ivar. Say for instance sender.enableMediaForwarding({ ... }) or sender.replaceTrack(track, { forwarding: true, ... });. That API could be complemented by progressively adding knobs to control the exact forwarding behaviour, as we discover more needs from applications. This is inline with the current RTCPeerConnection model we all like.

Above proposals can lead to good interop and good user experience (minimum latency, minimum CPU overhead...).

alvestrand commented 9 months ago

On the other hand, those restrictions prevent solving the "late fanout" usecase, and other usecases that can't be solved under those restrictions using this API. I believe (based on implementation experience) that a frame-level API is easier to use and harder to make mistakes in than a packet-level API for many of those use cases, and that making such an API available to users is a benefit for the Web platform.

One possibility that may satisfy both our constituencies is to redefine the API; if people are willing to live with the restriction of "one input, one output", they can use the ScriptTransform API; if people desire to go outside those restrictions, they use a different means of instantiating the input and output streams of frames that are the essential parts of this API for those purposes.

youennf commented 9 months ago

if people desire to go outside those restrictions, they use a different means of instantiating the input and output streams of frames that are the essential parts of this API for those purposes.

I thought about this approach a while ago. It is cleaner in the sense that it clearly states to the UA that (focusing on sender side) the JS application is responsible to implement the source+encoder part (which late fanout is clearly about). As an example, it will help UA provide meaningful WebRTC stats. Setting a track on such a sender could throw...

Exposing such API requires us to refine the WebRTC media pipeline model as we would expose things that are fully internal right now. This might not be unrelated to the TPAC Media/WebRTC WG discussion meeting I missed.

Related to this approach, my first questions would be:

The frame level vs. packet level question. It would be great to see whether we can get consensus within the WG. For instance, if we have a packet level API, do we also need a frame level API? Do we need both?
@jan-ivar's question: should fanout be done in JS or done by UA with web app tuning it via knobs?

To refine the WebRTC media pipeline model, there are a number of questions that might be useful to tackle, for instance:

the bandwidth allocation model: WebRTC engine is providing a bitrate that the encoder needs to respect. What if it is undershooting or overshooting it?
the timing model: WebRTC engine is processing frames in a given time-based order. What if JS encoder is sending frames differently?
Get consensus on https://github.com/w3c/webrtc-encoded-transform/issues/109.
the feedback model: WebRTC engine is providing instructions on the encoder (I need a key frame) based on RTCP feedback. Should we expose this?
the error model: WebRTC engine can starve the encoder with input. What if JS is sending frames when UA does not expect it?
the overuse model: WebRTC engine is preventing excessive CPU use by decreasing frame rate and/or resolution. Should we expose this? Probably not but worth digging.
the threading model: WebRTC engine is probably currently doing this processing in background threads. Should we put the same restriction on the web?

Hope this helps moving forward.

guidou commented 8 months ago

I thought about this approach a while ago. It is cleaner in the sense that it clearly states to the UA that (focusing on sender side) the JS application is responsible to implement the source+encoder part (which late fanout is clearly about). As an example, it will help UA provide meaningful WebRTC stats. Setting a track on such a sender could throw...

Can this be addressed by passing an extra optional parameter to the RTCRtpScriptTransform constructor?

Exposing such API requires us to refine the WebRTC media pipeline model as we would expose things that are fully internal right now. This might not be unrelated to the TPAC Media/WebRTC WG discussion meeting I missed.

Harald's congestion control proposal is in line with this. https://github.com/w3c/webrtc-encoded-transform/pull/207 Also, depending on the use case this might or might not be an issue. There is a lot of value in optionally lifting the restriction to support valid use cases.

Related to this approach, my first questions would be:

The frame level vs. packet level question. It would be great to see whether we can get consensus within the WG. For instance, if we have a packet level API, do we also need a frame level API? Do we need both? @jan-ivar's question: should fanout be done in JS or done by UA with web app tuning it via knobs?

Some advantages of the frame-level approach:

The changes required to solve the problem (including glitch-free failover for multiple input PCs) at the frame level are minimal: add a setMetadata() method to RTCEncodedVideo/AudioFrame, and allow removing the 1PC restriction. We have experimented with this in Chrome (which implements the pre-standard version that doesn't have the 1PC restriction) and the results are very good in some real-world scenarios.
The programming model is a small extension to the well-understood model of encoded transform.
It makes it easier to support other transport protocols if necessary.
The concerns about exposing internal aspects of the WebRTC pipeline do not apply to all use cases and can be addressed with specific APIs for that purpose.

A packet-level API, depending on how it's made, can have the advantage of allowing forwarding to start without waiting for the whole frame, and is able to deal with packet loss directly, but has other disadvantages for this use case, such as:

It is a new model, and it is likely to be less straightforward to support glitch-free failover.
It is possible to use a frame-to-packetizer model for sending (as in [1]), in which case the forwarding advantage does not exist.
It is a major API intended to support multiple use cases, which would require plenty of time to design, build consensus, implement and deploy. It's not clear that in the end the solution for this use case will necessarily be better than the frame-based solution. It will for sure be less timely.
It is tied to RTP, which is OK in many cases, but some users might prefer a solution not tied to the transport protocol as they might consider migrating in the future to alternative protocols (e.g., Quic).

WDYT?

[1] https://docs.google.com/presentation/d/1vqlz0vbF1JFmKxVfTrhpBwomwu1aiSJz5p-Q0PXNDKA/edit#slide=id.g27b84bfbaa3_13_374)

alvestrand commented 7 months ago

Status: This PR has been languishing for 3 months now. The addition of a pure "sendonly" API is under consideration, but seems to require more time (#211), and the discussion so far has not shown a compelling difference from the interface that is effectively created if you don't attach a sending track to a transceiver.

The use case for forwarding has been accepted. No use case where the restriction serves a positive purpose (enabling an use case) has been described; the only argument put forward is that it gives the opportunity to fire an error when people try unexpected things.

We should consider again whether we should remove the restriction from the spec.

youennf commented 6 months ago

The addition of a pure "sendonly" API is under consideration

It seems this API got some 'room consensus' at last WG meeting. I'll be happy to help moving this forward.

No use case where the restriction serves a positive purpose (enabling an use case) has been described

A number of reasons for this restriction have been mentioned in this thread.

We should consider again whether we should remove the restriction from the spec.

I am not sure discussing this again will be as fruitful as making progress on the pure "send only" API.

guidou commented 2 hours ago

No longer pursuing. We will focus on the RTCRtpEncodedSource proposal.

w3c / webrtc-encoded-transform

Remove restriction on streams being limited to only one PC #201