w3c / webrtc-rtptransport

Repository for the RTPTransport specification of the WebRTC Working Group

RTCRtpSender.replaceSendStreams() and simulcast issues #64

Open Orphis opened 1 month ago

Orphis commented 1 month ago

Currently, the API document lists:

partial interface RTCRtpSender {
  Promise<sequence<RTCRtpSendStream>> replaceSendStreams();
}

How is this supposed to work with more advanced simulcast usage? With rollbacks and renegotiations that can change the simulcast envelope, the number of rids is going to vary, and since our RTCRtpSendStream objects are tied to a single mid/rid pair, this will be difficult to make work properly. What happens to a send stream if its rid has been negotiated away?
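
To illustrate, with the current shape you'd get something like this (assuming one stream per negotiated rid, which is what ties each RTCRtpSendStream to a mid/rid pair):

// Assuming rids ['lo', 'mid', 'hi'] were negotiated for this sender:
const sendStreams = await sender.replaceSendStreams();
// sendStreams.length === 3, one RTCRtpSendStream per mid/rid pair.
// If a later renegotiation removes 'hi', it's unclear what happens to
// sendStreams[2].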

We could add events to get notified when a renegotiation completes, so the application can fetch new RTCRtpSendStream objects, but I think that's going to be very clunky, especially if we want to transfer them to DedicatedWorkers.

An alternative would be to have a single RTCRtpSendStream object per sender (or mid), with each sendRtp() call taking the rid as a parameter (which could be required if multiple simulcast layers have been negotiated), possibly in the RTCRtpPacketInit dictionary.

The rid would still be validated against the list of negotiated values, so we shouldn't lose any capabilities.
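
A rough sketch of that alternative (the singular method name and the rid member of RTCRtpPacketInit are assumptions, not settled API):

// One send stream for the whole sender (or mid); method name assumed:
const sendStream = await sender.replaceSendStream();
// rid travels per packet, e.g. as an RTCRtpPacketInit member (name assumed):
sendStream.sendRtp({ ...basePacketInit, rid: 'hi' }); // basePacketInit built by the app
// An rid outside the negotiated list would make sendRtp() throw.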

pthatcher commented 1 month ago

You're right that there is a problem of "what happens if one uses SDP to add/remove RIDs after calling replaceSendStreams()?". I'd be tempted to say that you can call replaceSendStreams() again and get back a new set, because that would be pretty simple.
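
Concretely, that simple path could look like this (assuming replaceSendStreams() may just be called again after renegotiation):

await renegotiate(); // app-defined offer/answer exchange that changes the rids
const freshStreams = await sender.replaceSendStreams();
// freshStreams matches the new simulcast envelope; streams whose rid was
// negotiated away would presumably become unusable.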

As for complex scenarios: perhaps our goal should be to use RtpTransport.createRtpSendStream() instead of replaceSendStreams(), and not rely on SDP so much.
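
Something like this sketch, assuming a createRtpSendStream() method on the RTP transport (how the transport object is reached, and the options it takes, are not pinned down here):

const rtpTransport = getRtpTransport(pc); // hypothetical accessor
const sendStream = await rtpTransport.createRtpSendStream({ rid: 'q' }); // options assumed
// No SDP round trip is needed to add or remove a layer.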

Orphis commented 1 month ago

The problem with that approach is that users will probably transfer the send streams to a dedicated worker. What happens when you call it again then?

If we scope a send stream to a mid and do not support unsignaled media, it should be relatively easy to manage.

aboba commented 1 month ago

@Orphis @pthatcher I think there may be two different scenarios here:

  1. WebRTC encoder/decoder used for simulcast, with SDP to add/remove RIDs.
  2. WebCodecs encoder/decoder used (e.g. for scenarios not supported in WebRTC, such as mixed-codec simulcast, or spatial scalability with layer refresh, made possible by Erik's proposed encoder API).

In scenario 1, the browser will add/drop simulcast layers. In scenario 2, the application will drop/add layers based on the BWE and stats and, to avoid confusing the SFM, will need to properly set RTP header extensions like VLA (Video Layers Allocation) and DD (Dependency Descriptor).

Over time, scenario 2 might get more sophisticated, e.g. supporting spatial scalability or custom RTCP such as LRR (Layer Refresh Request).
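
A rough sketch of the scenario 2 control loop (the stats hook, the packetizer, and the header-extension field shape below are all assumptions for illustration):

// App-driven layer control: drop the 'hi' layer when the estimate is low,
// and stamp packets with the DD extension so the SFM can route correctly.
const LOW_BITRATE = 300_000; // app-chosen threshold in bps
const estimate = await getBandwidthEstimate(); // app-defined, e.g. derived from getStats()
const activeRids = estimate < LOW_BITRATE ? ['lo'] : ['lo', 'hi'];
for (const rid of activeRids) {
  const packet = packetizeNextFrame(rid); // app-defined packetizer over WebCodecs output
  packet.headerExtensions = [{ // field shape assumed
    uri: 'https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension',
    value: buildDependencyDescriptor(rid), // app-computed DD bytes
  }];
  sendStream.sendRtp(packet); // sendStream obtained earlier, e.g. via replaceSendStreams()
}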

Orphis commented 1 month ago

For scenario 2, you could have something like:

let t = pc.addTransceiver('video');
await negotiate(); // app-defined offer/answer exchange
let sendStream = await t.sender.replaceSendStream();
while (true) {
  // nextPacketFromWebCodecs(): app-defined packetizer over encoder output
  let packet = await nextPacketFromWebCodecs();
  sendStream.sendRtp(packet);
}

There, the media is signaled; WebCodecs produces frames which are packetized and then sent, without involving the traditional WebRTC media pipeline. The application layer is free to skip layers based on BWE or other signals (no H264-only users? Don't send H264 to the SFU; upgrade to AV1).

If you wanted to use a custom codec, then we have a spec to cover that and create custom ones (Harald's work). We could have something similar for RTP header extensions too, and then just about everything would be covered. Or is anything missing?