w3c / webrtc-encoded-transform

WebRTC Encoded Transform
https://w3c.github.io/webrtc-encoded-transform/

Add description of an API for controlling SDP codec negotiation #186

Open alvestrand opened 1 year ago

alvestrand commented 1 year ago

Fixes #172



dontcallmedom-bot commented 1 year ago

This issue had an associated resolution in WebRTC June 2023 meeting – 27 June 2023 (Encoded Transform Codec Negotiation):

RESOLUTION: Adopt PR #186 with details to be discussed in the PR

alvestrand commented 9 months ago

Modified according to discussions at TPAC, following the solution presented there for #199.

PTAL.

dontcallmedom commented 9 months ago

(the remaining CI error is pending merge of #208)

Philipel-WebRTC commented 9 months ago

I have read this proposal, and I can't quite understand how it is supposed to work. AFAICT the goal is to allow negotiation of custom codecs, but we still only allow standardized packetizers to be (ab)used.

This part of the spec:

It is up to the transform to ensure that all data and metadata conforms
to the format required by the packetizer; the sender may drop frames that
do not conform.

seems extremely complex. The user needs to know, at the lowest possible level, how the packetizers work, know exactly which bits they are allowed to touch for each codec, and AFAICT it would also be borderline impossible to debug if a packetizer decided to drop the frame for you.

Would something like this work instead?

customCodec = {
   // Type is an enum of {"VP8", "VP9", "H264", "custom"}
   type: "custom",
   // postfixName only has an effect when the type is "custom". Setting the postfixName
   // to "VP8" would make the codec show up as "custom-VP8" in the SDP.
   postfixName: "VP8",
   clockRate: 90000
};

When "custom" is selected as the type, the packetization format is implied to be raw.

Now when a sender and receiver need to negotiate a codec they can do string matching on the codec names, and assuming they are aware of how the bits are being transformed, they know everything about the codec and the format.
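That string-matching step could look like this sketch (codecName and findSharedCodec are hypothetical helpers, and the codec objects follow the customCodec shape above; none of this is a shipped API):

```javascript
// Hypothetical helper: derive the name that would appear in the SDP,
// following the customCodec shape sketched above.
function codecName(codec) {
  return codec.type === "custom" ? `custom-${codec.postfixName}` : codec.type;
}

// Pick the first local codec whose negotiated name the remote side also lists.
function findSharedCodec(localCodecs, remoteNames) {
  return localCodecs.find((c) => remoteNames.includes(codecName(c))) ?? null;
}

const local = [{ type: "custom", postfixName: "VP8", clockRate: 90000 }];
const shared = findSharedCodec(local, ["custom-VP8", "VP9"]);
// shared is the custom-VP8 entry, or null when nothing matches
```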

alvestrand commented 9 months ago

to @Philipel-WebRTC - I don't understand your proposal. In particular:

There is no standard at the moment for a raw packetization format. So we can't refer to that format from a standard. A goal of standardization is interoperability - that a depacketizer in one browser should be able to depacketize content that is packetized by another browser. So referring to an unspecified format is not a win here.

I thought about using a DOMString for "setPacketizer" rather than a codec description, and allowing implementations to put non-specified values in there. But I didn't find any particular advantage in doing so.

Philipel-WebRTC commented 9 months ago

It could be that I fundamentally misunderstand the goal of this proposal, but AFAICT the goal is to enable custom codecs to be negotiated via SDP, or am I wrong?

As soon as we negotiate a non-standardized codec name we don't have any standardized packetizer to rely on. I get that the proposed API would allow the sender and receiver to map the payload type of a custom codec to some other known (de)packetizer, but to me that seems very limiting and complex.

The way it is proposed right now, would we expect anyone to ever use a packetizer other than VP9 (the only one that separates descriptor and bitstream data cleanly in the payload) for a custom codec? Again, this to me looks a bit more like the abuse side of things than a general solution (assuming I understand the goal of this PR).

alvestrand commented 9 months ago

At least one external WebRTC user has made this almost work by using the H.264 packetizer - he was advised that he would have an easier time if he appended the custom data as SEI (don't know the proper term for those). So no, VP9 is not the only option, but yes, VP9 is probably the best standard one at the moment.

It's always possible for implementations to define their own packetizers, but that's not useful for interoperability.

alvestrand commented 9 months ago

Actually, registering new codecs in the vnd. tree with IANA is a possibility that would allow a well defined way of using nonstandard packetizers.

alvestrand commented 9 months ago

@aboba you have a "request changes" marker. Can you re-review?

youennf commented 9 months ago

Following on yesterday's editor meeting, here are some thoughts related to this proposal. I would tend to reduce the scope of the API to what is needed now, without precluding any future extension.

AIUI, we seem to get consensus on adding support to:

  1. Enable media agnostic RTP payload formats like SFrame.
  2. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

For 1, we assume the UA implements the RTP payload format. setCodecPreferences can be used if the UA does not negotiate it by default.

For 2, we could use payload types. Another approach, potentially more convenient for web developers, is to reuse the name of the payload format directly. This is similar to manipulating mimeType instead of payloadType.

partial interface RTCEncodedAudioFrame {
    attribute DOMString rtpPayloadFormat;
};
partial interface RTCEncodedVideoFrame {
    attribute DOMString rtpPayloadFormat;
};

On sender side, the application would set it to sframe, to make use of https://www.ietf.org/archive/id/draft-thatcher-avtcore-rtp-sframe-00.html. On receiver side, the application detects the use of sframe by checking rtpPayloadFormat.
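Receiver-side detection under this proposal might look like the following sketch (rtpPayloadFormat is the attribute proposed above, not a shipped API, and the frame objects here are plain stand-ins):

```javascript
// Sketch only: rtpPayloadFormat is the proposed attribute, not shipped anywhere.
function usesSFrame(frame) {
  return frame.rtpPayloadFormat === "sframe";
}

// Stand-in frames, since no UA exposes this attribute yet.
const sframeFrame = { rtpPayloadFormat: "sframe" };
const plainFrame = { rtpPayloadFormat: "" };
// usesSFrame(sframeFrame) → true; usesSFrame(plainFrame) → false
```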

We could also use that API internally to further define the interaction of SFrameTransform with the RTP layers.

alvestrand commented 9 months ago

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

The proposal in https://github.com/w3c/webrtc-encoded-transform/pull/186#issuecomment-1761004339 will not allow us to affect SDP negotiation for script transforms; SDP negotiation is only affected by things that happen before offer/answer time - there are no frames before offer/answer.

jan-ivar commented 9 months ago
  1. Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

For 1, we assume the UA implements the RTP payload format. setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

  1. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;
};

What's the advantage or use case for choosing this per frame?

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

const {sender} = pc.addTransceiver(track);
sender.transform = new SFrameTransform({role: "encrypt"});
// negotiate
const {codecs} = sender.getParameters();
if (!codecs.find(findSFrame)) sender.transform = null;
alvestrand commented 9 months ago
  1. Enable media agnostic RTP payload formats like SFrame.

Do you mean any other types here, or just SFrame? Do you have an example?

You heard the SCIP (NATO) format being mentioned on the call. That's one. A payload format to replace Google's nonstandard a=packetization-mode:raw would be another.

For 1, we assume the UA implements the RTP payload format. setCodecPreferences can be used if the UA does not negotiate it by default.

What would that look like? Would it multiply the payload types in RTCRtpSender.getCapabilities("video")?

It would look like the API described in https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/setCodecPreferences

  1. Allow the application to select one of these RTP payload formats as part of a RTCRtpSender transform.

Do you mean a RTCRtpScriptTransform here? What would that look like?

From the explainer:

    metadata = frame.metadata();
    metadata.pt = options.payloadType;
    frame.setMetadata(metadata);
partial interface RTCEncodedVideoFrame {
   attribute DOMString rtpPayloadFormat;
};

What's the advantage or use case for choosing this per frame?

I believe Youenn described this case in detail on the call; I added this FAQ question specifically to address his use case.

From the explainer:

  1. Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

    A: Use multiple payload types. Each will be assigned a payload type. Mark each frame with the payload type they need to be packetized as.
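A per-frame tagging step in a transform could be sketched like this (the metadata()/setMetadata() surface is from the explainer; the payload-type values 96/97, the routing flag, and the stand-in frame are made up for illustration):

```javascript
// Hypothetical payload types assigned to two packetizers during negotiation.
const PT_DEFAULT = 96;
const PT_ALT = 97;

// Tag each frame with the payload type of the packetizer it should go through.
function tagFrame(frame, wantsAltPacketizer) {
  const metadata = frame.metadata();          // explainer's metadata surface
  metadata.pt = wantsAltPacketizer ? PT_ALT : PT_DEFAULT;
  frame.setMetadata(metadata);
  return frame;
}

// Stand-in frame object; real frames come from the encoded-transform streams.
function makeFrame() {
  let meta = {};
  return {
    metadata: () => ({ ...meta }),
    setMetadata: (m) => { meta = m; },
    get pt() { return meta.pt; },
  };
}

const f = tagFrame(makeFrame(), true);
// f.pt is 97
```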

Apps can already specify SFrame ahead of negotiation, so doesn't the UA have all it needs to negotiate sframe?

I particularly mentioned on the call (and have said in every presentation that I've given on it) that I want an interface that is able to support the existing, deployed, JavaScript-based SFrame format, which is not the same as the SFrame transform that is supposed to use the (still not final) IETF SFrame format draft, or any other transform.

youennf commented 9 months ago

What's the advantage or use case for choosing this per frame?

Fair question.

I believe Youenn described this case in detail on the call; I added this FAQ question specifically to address his use case.

From the explainer:

  1. Q: My application wants to send frames with multiple packetizers. How do I accomplish that?

The use case I know of is enabling/disabling SFrame dynamically. Otherwise, we just want to stick to whatever packetization is associated with the encoder media content, and we can already change the media content via setParameters.

This change can be done either at the frame level or at the transform level (similarly to setParameters, really). If at the transform level, you need to call sender.transform = newTransform to change whether SFrame packetization is used. This is slightly less flexible, but might be more convenient and less error-prone for web developers, and could allow some optimisations on the UA side. You lose some flexibility, except if you plan to switch packetization at a very specific frame; I am unsure whether we have a strong use case here.

It is interesting to look at receiver side though, in case a UA implements the SFrame packetization format and processing happens in a script transform. The web application might want to know:

I would tend to expose the same API on the receiver and sender side, which is why I would tend to go with the frame level.

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect. Script transforms have several potential use cases:

This "plug a new codec" API would probably need to let the web application handle both the encoding/decoding (the UA does not know the content format) and the associated packetization (since the UA does not know the format, it does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this "plug a new codec" work, I would tend to start with a "plug your own encoder" API that could be used without packetization/SDP handling, for instance to let web apps fine-tune a WebCodecs encoder setup (set QP per frame/macro-block, for instance). We could then address new formats on top of this API to handle SDP negotiation/RTP packetization. It could also be used for the metadata-to-media-content use case, since web applications may often carry a VideoFrame and associated metadata together.

alvestrand commented 9 months ago

Solving the SDP negotiation problem for built-in Sframe is not sufficient. We have to solve it for script transforms.

That is probably where we have a disconnect. Script transforms have several potential use cases:

  • Implement SFrame or a variant of SFrame. Once we have the packetization format in the UA, I do not see a need for a new generic mechanism to negotiate it. I am ok with SFrame packetization format spec to expose an extension point in the SDP, and we would expose this in our API (say setCodecPreferences for instance).

Specific interfaces that can only be used for the SFrame transform should go into the SFrame transform specification, not here. setCodecPreferences expresses only the receiver's preferences about what the sender should send; it does not enforce either what the sender sends or what the receiver will have to handle.

setDepacketizer(PT, depacketizer) is an API that addresses the use case on the receiver side.
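The shape implied there might be sketched as follows (setDepacketizer is only named in passing above; the registry class, the payload type 97, and the depacketizer callback are all hypothetical stand-ins, not proposed IDL):

```javascript
// Hypothetical stand-in for a receiver-side depacketizer registry keyed by
// payload type, illustrating the setDepacketizer(PT, depacketizer) idea.
class FakeReceiver {
  constructor() {
    this.depacketizers = new Map();
  }
  setDepacketizer(payloadType, depacketizer) {
    this.depacketizers.set(payloadType, depacketizer);
  }
  depacketize(packet) {
    const d = this.depacketizers.get(packet.payloadType);
    if (!d) throw new Error(`no depacketizer for PT ${packet.payloadType}`);
    return d(packet.payload);
  }
}

const receiver = new FakeReceiver();
receiver.setDepacketizer(97, (payload) => ({ kind: "custom", payload }));
const frame = receiver.depacketize({ payloadType: 97, payload: "bytes" });
// frame.kind is "custom"
```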

  • Add metadata to media content. In this case, I believe we want encoded content to stick to the same format (e.g. H264 would stay H264 and metadata would be put in SEI). We do not need to change the packetization format and I do not see the need to negotiate anything in the SDP. RTP header extensions might be a more meaningful extension point if need be.

We don't agree here. There is no generic metadata mechanism, and a number of codecs lack functionality for adding metadata.

  • Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms.

I don't agree here. I think we want to allow for experimentation of this kind, we should actively ensure that we make it possible to try out new codecs without changing the browser.

This "plug a new codec" API would probably need to let the web application handle both the encoding/decoding (the UA does not know the content format) and the associated packetization (since the UA does not know the format, it does not know the associated packetization format), in addition to SDP negotiation. Relying on the SFrame packetization for new codecs is probably not something we want, as per IETF feedback.

For this "plug a new codec" work, I would tend to start with a "plug your own encoder" API that could be used without packetization/SDP handling, for instance to let web apps fine-tune a WebCodecs encoder setup (set QP per frame/macro-block, for instance).

Doing this without SDP handling would make the current situation, where people send content that does not conform to the SDP describing what they claim to be sending, strictly worse. I would strongly oppose adding such an API to the PeerConnection family of APIs without solving the SDP issue first.

We could then address new formats on top of this API to handle SDP negotiation/RTP packetization. It could also be used for the metadata-to-media-content use case, since web applications may often carry a VideoFrame and associated metadata together.

If we have to handle SDP negotiation and RTP packetization for this use case, and we know that we have other use cases we need to handle it for, why not accept the current proposal into the WD?

jan-ivar commented 9 months ago
  • Plug a new codec. It seems best to solve this with a separate API, I do not think we want to enshrine this kind of support in encoded transforms. For instance, a video encoder takes a VideoFrame as input and EncodedVideoChunk as output. A script transform takes an EncodedVideoChunk both for input and output.

I agree this suggests an API mismatch. I see evidence of this in the Lyra use case as well, which had to:

  1. Use a special "L16" null-codec in Chrome to bypass encoding and pass raw audio directly to the transform
  2. SDP-munge ptime back to 20, to counter Chrome's drop to 10 ms from the large amount of (uncompressed) input
  3. Reverse the 16-bit byte order of the input, which is already in network order, back to platform order for lyra-js

It's an impressive feat, but I imagine these problems would be amplified with video. It might need its own API.

Also, these two should be usable together: it makes sense to use an SFrame transform on a plugged-in codec simply with sender.transform = sframeTransform.

This makes sense to me as well. E.g. spitballing:

sender.encoder = new RTCRtpScriptEncoder(worker, options);
sender.transform = new SFrameTransform({role: "encrypt"});

?

alvestrand commented 9 months ago

My preferential API for chained transforms is readable.pipeThrough(transform1).pipeThrough(transform2).pipeTo(writable).

It is possible to write chain links that permit this syntax when using RTCRtpScriptTransformer.
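As a minimal sketch of that chaining shape using plain WHATWG streams (the transforms here just manipulate numbers and stand in for real encoded-frame transforms; in a worker the readable/writable would come from the RTCRtpScriptTransformer, and global TransformStream/ReadableStream/WritableStream are assumed, e.g. Node 18+ or any browser):

```javascript
// Two stand-in transforms; a real chain would encrypt, add metadata, etc.
const double = new TransformStream({
  transform(chunk, controller) { controller.enqueue(chunk * 2); },
});
const addOne = new TransformStream({
  transform(chunk, controller) { controller.enqueue(chunk + 1); },
});

// Stubs for the transformer's readable/writable endpoints.
const readable = new ReadableStream({
  start(controller) {
    [1, 2, 3].forEach((n) => controller.enqueue(n));
    controller.close();
  },
});
const results = [];
const writable = new WritableStream({
  write(chunk) { results.push(chunk); },
});

await readable.pipeThrough(double).pipeThrough(addOne).pipeTo(writable);
// results is [3, 5, 7]
```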

youennf commented 9 months ago

If we have to handle SDP negotiation and RTP packetization for this use case, and we know that we have other use cases we need to handle it for, why not accept the current proposal into the WD?

Web applications are not really blocked by SDP negotiation since they can do all sorts of adaptation at the JS level. I would tend to leave this particular API to the end, when we have the other pieces ready.

The RTP packetization part of the proposal seems more problematic to me.

AIUI, the guideline from the IETF is that every codec has its own packetization, except for a very few packetization formats, the only one in scope for UAs at the moment being SFrame.

The proposal here seems to open the door to things like: encode as VP8, packetise as H264. This seems to go against the IETF guidelines. I am ok with allowing JS to select packetisers other than the default one, for some very specific packetisers, SFrame being the only one really. Hence the idea to expose rtpPayloadFormat and to limit this change to those packetisers, at least at first.

For the new codec use case, following the IETF guidelines, plugging in a JS codec would mean plugging in a JS packetiser/depacketiser. But the proposal does not go there, so it does not fully address the new codec use case.

For video+metadata, if the packetiser can handle it, I see no missing support. If the packetiser cannot handle it due to the added metadata, then this video+metadata is effectively a new codec, and it would need its own packetiser.

Doing this without SDP handling would make the current situation, where people send content that does not conform to the SDP describing what they claim to be sending, strictly worse

Agreed in general. The rough solution I have in mind is to expose a codec object to JS to surface the control API without allowing JS to actually push arbitrary data. I haven't done the full exercise of how this initial API could be easily extended to fully support new codecs. My hope is that this would be one step forward to modelling the new codec JS API. We would still need to handle the packetization and SDP issues of course.

alvestrand commented 9 months ago

Trying to piece together Jan-Ivar's suggested changes one at a time:

Jan-Ivar said:

I disagree with this conclusion. I think transforms become entangled with a sender or receiver on assignment and aren't really reusable or agnostic https://github.com/w3c/webrtc-encoded-transform/issues/209.

I think we can leverage this to simplify the API. Pardon my ignorance if I miss something, but given a codec:

const codec = { mimeType: "video/vp8-encrypted", clockRate: 90000, fmtp: "encapsulated-codec=vp8", };

I don't think we can reuse the "fmtp" parameter - that's stuff that has external meaning. So we'd have to go back to my previous proposal of adding a "packetizer" attribute (or some other parameter with other meaning), which was agreed to be dropped in Seville.

But I'm not clear about what semantics you would want to attach to this extra parameter.

jan-ivar commented 9 months ago

The semantics, since this is an encoded transform, is the codec of the encoding being transformed.

If the goal is adding SDP mimeType negotiation, e.g. "video/vp8-encrypted" (an e2ee example), or let's say "video/h264-metadata" (an added-metadata example), it seems the UA can either infer packetization from the name, or rely on some structured sub-property. Those use cases don't seem to require custom packetization, so I'm trying to keep it out of the API for now.

I got that fmtp line from your example. What's its external meaning? I got zero hits from google.

jan-ivar commented 9 months ago

How is this achieved today with SDP munging?

alvestrand commented 8 months ago

No, I haven't tried to do SDP negotiation by SDP munging, so I don't know if it's possible (or desirable, for that matter). Our current approaches to e2ee using JavaScript rely on the fact that packetizers are tolerant of data that doesn't conform to the codec specification, so that (e.g.) encrypting the payload and then packetizing using a VP8 or VP9 packetizer gets something useful at the other end.

If this API were available, it would certainly be feasible to register "vendor" media types with IANA in order to support proper negotiation of custom packetization rules. That would sidestep the issue of using a defined packetization rule for content that is troublesome for it, but wouldn't encourage interoperability (unless standardized).

alvestrand commented 8 months ago

Discussion at editors:

We can move the capability method into the transceiver - the transceiver has to exist before creating offer/answer anyway, in order to generate codecs at all. This also makes it possible to move it all the way to the transform, if desired.

This also opens up for linking capability with packetizer - they're at the same level.

fippo commented 8 months ago

We can move the capability method into the transceiver - the transceiver has to exist before creating offer/answer anyway

Developers are already confused by setCodecPreferences being per-transceiver (which also gets created by SRD) so this at least is a consistent mistake.

jan-ivar commented 8 months ago

I'm glad my feedback led to a reduction in methods, but this still adds a codec API on transceiver instead of on the transform as I proposed, so it doesn't yet satisfy my objections to this API in https://github.com/w3c/webrtc-encoded-transform/pull/186#discussion_r1361233157.

We shouldn't repeat the mistake of putting codecs on transceivers, as sender and receiver codec-needs may differ.

E.g. an app might encrypt on sending in one direction but not the other, i.e. have transceiver.sender.transform but no transceiver.receiver.transform, or use a different codec/transform the other direction. We don't want to get locked into negotiating both sending and reception of both.

What codec the O/A needs to negotiate depends directly on the transform added, making it an inherent property of that transform. Separate methods the application needs to keep in sync seem like a mistake.

Rebasing my advice, instead of:

transceiver.addCodec(codec, codec.packetizationMode);
transceiver.sender.transform = new RTCRtpScriptTransform(worker, {});

...I think we should do:

transceiver.sender.transform = new RTCRtpScriptTransform(worker, {codecs: [codec]});

The UA then negotiates what's needed from that, like it does for addTransceiver and createDataChannel.

The convenience method pc.addCodecCapability seems undesirable and redundant, letting apps forget to add a transform.

We can have a sub-discussion about where packetizationMode goes, but I see taking two codec arguments back to back already caused confusion above, so I maintain my earlier advice of declaring the packetizing mode inside the made-up codec.

henbos commented 8 months ago

We shouldn't repeat the mistake of putting codecs on transceivers, as sender and receiver codec-needs may differ.

If the problem is not forcing bi-directionality, then it sounds like there are three options?

  1. Putting it in the transform.
  2. Putting it on the sender and receiver.
  3. Adding a direction argument for the transceiver method.

Codec preferences are already decided on the transceiver, and we now have sender.setParameters() with the ability to specify a codec. So codecs and negotiation are already firmly within the transceiver/sender/receiver realm, regardless of where to place the "add codec" method/argument. The transceiver needs to know because of setCodecPreferences and the sender needs to know because of setParameters.

Pros/cons of each approach? @jan-ivar @alvestrand

jan-ivar commented 8 months ago

Codec preferences are decided separately for sender and receiver. The botched setCodecPreferences obfuscates that.

  1. Putting it on the sender and receiver.

Con: invites people to set codec and transform separately, which has no use case and invites complexity for no gain.

  1. Adding a direction argument for the transceiver method.

Con: bad for the same reason we don't put it on pc and pass in transceiver and direction.

The transceiver needs to know because of setCodecPreferences and the sender needs to know because of setParameters.

A transceiver can see properties on its sender and receiver, so I see no technical "need" here.

fippo commented 8 months ago

Codec preferences are decided separately for sender and receiver. The botched setCodecPreferences obfuscates that.

SDP negotiates codecs per m-line and does not support per-codec directionality.

I would agree with a statement saying that baking SDP into the API via transceivers botched it, though.

jan-ivar commented 8 months ago

SDP negotiates codecs per m-line and does not support per-codec directionality.

Ah, thanks for clarifying! https://github.com/w3c/webrtc-pc/issues/2888 confused me a bit. So end-points may have directional codec limitations, but m-lines cannot express them?

Still, I can use transceiver.sender.setParameters to send e.g. h264 while my transceiver.receiver receives vp8, right?

I would agree to a statement saying baking SDP into the API as transceivers botched it though.

πŸ‘

In this case, the SDP result is an effect of applying a transform:

alvestrand commented 8 months ago

Codec preferences are decided separately for sender and receiver. The botched setCodecPreferences obfuscates that.

The fact that you may have misunderstood setCodecPreferences() is no excuse for calling it "botched". setCodecPreferences controls ONLY the receiver's stated preferences, controlling ONLY the list of codecs in the m= line of the media section. Placing it on the transceiver is consistent with having functionality that only controls the SDP negotiation on the transceiver, not the sender or receiver.

  1. Putting it on the sender and receiver.

Con: invites people to set codec and transform separately, which has no use case and invites complexity for no gain.

  1. Adding a direction argument for the transceiver method.

Con: bad for the same reason we don't put it on pc and pass in transceiver and direction.

This is about SDP control, not sender or receiver control. See bullet 3 in "final steps to create an answer" for language where we already confront the same issue. https://w3c.github.io/webrtc-pc/#dfn-final-steps-to-create-an-answer

I have some sympathy for adding a "direction" argument to the PC-level helper, so that the codecs can be suitably handled in the "sendonly/sendrecv" filtering in "final steps to create an answer", but the transceiver-level functionality doesn't need it.

The transceiver needs to know because of setCodecPreferences and the sender needs to know because of setParameters.

A transceiver can see properties on its sender and receiver, so I see no technical "need" here.

Separation of concerns dictates that we keep parameters that are chiefly concerned with the SDP on the transceiver level. That's why "setCodecPreferences" is on transceiver level.

henbos commented 8 months ago

So codecs are necessarily per m= section since directionality is not something that is negotiated in the SDP.

We can decide if we want to add any "hand-holding" logic to prevent foot-gunning (I don't think we necessarily do), but nothing changes the fact that a) codec negotiation is per transceiver and in a direction-agnostic way, and b) it is perfectly valid to create the transformer at a later stage than negotiation.

jan-ivar commented 8 months ago

Thanks for explaining the details.

Separation of concerns dictates that we keep parameters that are chiefly concerned with the SDP on the transceiver level. That's why "setCodecPreferences" is on transceiver level.

I disagree there's any need to separate transceiver from sender/receiver to somehow orient the API towards SDP needs. These are 1-1 objects, so we should organize things logically based on function, not SDP.

"Early 1.0" versions of the spec (and indeed early implementations of the API for many years) worked fine without transceivers. As I recall, we introduced transceivers chiefly two reasons: 1) direction changes, and 2) stop.

setCodecPreferences may also have been a reason, but the reason it is at the transceiver level probably has more to do with the need to put send and receive codecs into a single order. It was probably just simpler to put this on the transceiver. The direction attribute remains an unfortunate API wart IMHO.

jan-ivar commented 8 months ago

Many things, like addTrack and createDataChannel affect what goes into SDP, so we should not limit ourselves in API design to put anything related to SDP on the transceiver.

alvestrand commented 8 months ago

It's not a design mistake that neither addTrack() nor createDataChannel() are interfaces on Sender and Receiver.

alvestrand commented 8 months ago

setCodecPreferences was put on transceiver exactly because it's an SDP-modifying function with no effect on transmission. If transceiver hadn't existed, it would have been on receiver; it deals only with preferences for receiving.

fippo commented 8 months ago

This discussion of fundamentals is orthogonal to this PR and the API it describes, no? Can we move it elsewhere?

"Early 1.0" versions of the spec (and indeed early implementations of the API for many years) worked fine without transceivers. As I recall, we introduced transceivers chiefly two reasons: 1) direction changes, and 2) stop.

No. Transceivers were introduced because it turned out that with unified plan and the many m-lines it was no longer possible to treat the SDP as an opaque blob. Transceivers bake SDP semantics into the API (which means we can never get rid of SDP) and do a very good job of providing a model for the SDP (I am serious! I am not happy, but they do what they were supposed to do). A side effect is that you need to understand SDP semantics in order to understand the API. The direction attribute, including its values, is taken literally from https://www.rfc-editor.org/rfc/rfc4566#page-26, and stop describes the concept of rejecting an m-line from https://www.rfc-editor.org/rfc/rfc3264#section-6.

The header extensions API is on the transceiver as well but header extensions can actually negotiate a direction in the SDP (albeit not universally implemented).

Harald:

[setCodecPreferences] deals only with preferences for receiving

Thank you for pointing that out, the difference is subtle but the sample is now fixed to show the right semantics!

henbos commented 8 months ago

Regardless of the history of how we got here, SDP rules are what they are, and transceivers are the control knobs for m= sections. We should be consistent with the SDP rules first and foremost.

pthatcher commented 8 months ago

As the person who proposed RtpTransceiver in the first place (and offered the alternative name "SdpMline"), I've always viewed the RtpTransceiver as the thing that is there just to deal with SDP. If you aren't dealing with SDP, put the API on the RtpSender or RtpReceiver. That way, in some wonderful future where one can construct RtpSender and RtpReceiver objects without SDP, you don't need RtpTransceivers to exist.

dontcallmedom-bot commented 7 months ago

This issue was discussed in WebRTC November 2023 meeting – 21 November 2023 (SDP Issue 186: New API for SDP negotiation 🎞︎)

jan-ivar commented 6 months ago

Many things, like addTrack and createDataChannel affect what goes into SDP, so we should not limit ourselves in API design to put anything related to SDP on the transceiver.

It's not a design mistake that neither addTrack() nor createDataChannel() are interfaces on Sender and Receiver.

My point is pc.addTrack() creates a sender, which can be assigned a transform or not. Only if it has a transform is there a need to specify input and output codecs. All else being equal, this suggests putting it on transform. If the user agent has what it needs to negotiate this then there's no need to further impact the API shape, is there?

As a mental experiment, imagine addTransceiver having an option to specify transform at time of creation of the sender:

const transceiver = pc.addTransceiver("video", {transform});

This matches how sendEncodings can be specified at time of creation of the sender, which requires negotiation.

Negotiation predates transceiver, so transceiver is not special, nor the single housing point for APIs that affect SDP.

alvestrand commented 6 months ago

actually const transceiver = pc.addTransceiver("video", {transform: transform, sendTrack: track}) looks like a nice interface to me.

I don't understand the meaning or relevance of "predates" - negotiation is SDP, which was RFCed in 1999. When we found that our sender/receiver model (which dates from around 2012) was growing warts because it interfaced with the SDP negotiation model, we invented Transceiver to limit the wartiness.

jan-ivar commented 6 months ago

actually const transceiver = pc.addTransceiver("video", {transform: transform, sendTrack: track}) looks like a nice interface to me.

Great! But isn't that just

const transceiver = pc.addTransceiver(track, {transform: transform});

?

Would adding this help? It would probably need to be sendTransform and receiveTransform then.

A foo = new Foo({bar}) initializer for foo.bar = bar seems redundant except for readonly attributes. E.g. there likely wouldn't be any semantic difference between:

const transceiver = pc.addTransceiver(track, {sendTransform, receiveTransform});

...and:

const transceiver = pc.addTransceiver(track);
transceiver.sender.transform = sendTransform;
transceiver.receiver.transform = receiveTransform;

alvestrand commented 6 months ago

My point was really that if we want to add more attributes to addTransceiver(), we should not add more arguments. Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.
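To make that concrete, here are the three existing `RTCRtpTransceiverInit` members, with the proposed `transform` member added on top (hypothetical; not in any spec text yet):

```javascript
// The three existing RTCRtpTransceiverInit members, plus the proposed one:
const init = {
  direction: "sendonly",                    // existing
  streams: [],                              // existing: MediaStreams for the remote side
  sendEncodings: [{ maxBitrate: 500_000 }], // existing
  transform: null,                          // proposed addition (hypothetical)
};

// Browser-only usage:
// const transceiver = pc.addTransceiver(track, init);
console.log(Object.keys(init).length); // 4 members
```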

jan-ivar commented 6 months ago

Actually addTransceiver() already takes an RTCRtpTransceiverInit with 3 existing members - adding a transform here would be consistent with the existing interface.

I've filed https://github.com/w3c/webrtc-encoded-transform/issues/221. My point here was merely to highlight that a transceiver is primarily the sender and receiver.

jan-ivar commented 6 months ago

... That way, in some wonderful future where one can construct RtpSender and RtpReceiver objects without SDP, you don't need RtpTransceivers to exist.

I agree, so we should stop adding new methods to RtpTransceiver if we can avoid it.

I favor SDP-agnostic options on new RTCRtpScriptTransform over the more SDP-specific transceiver.addCodec.

Is it SDP-specific for JS to declare the input/output codec of the transform it performs? It seems useful to the worker regardless of SDP. E.g. based on the transform-based part of what @alvestrand and I have been working on:

    transceiver.sender.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/vp8"}],
      outputCodecs: [{mimeType: "video/custom-encrypted"}]
    });
    transceiver.receiver.transform = new RTCRtpScriptTransform(worker, {
      inputCodecs: [{mimeType: "video/custom-encrypted"}],
      outputCodecs: [{mimeType: "video/vp8"}]
    });

This nicely communicates the task to the (reused) worker (removing the need to invent a side property in my fiddle).
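For illustration, a sketch of how a reused worker might consume those options. The `acceptsCodec` helper and the frame `mimeType` metadata field are assumptions; `onrtctransform` and `transformer.options` are the existing encoded-transform worker API.

```javascript
// Hypothetical helper: does the declared inputCodecs list accept this codec?
function acceptsCodec(inputCodecs, mimeType) {
  const wanted = mimeType.toLowerCase();
  return inputCodecs.some(c => c.mimeType.toLowerCase() === wanted);
}

// Browser-only worker sketch (assumes the proposed inputCodecs/outputCodecs
// options are surfaced on transformer.options):
// onrtctransform = ({transformer}) => {
//   const {inputCodecs} = transformer.options;
//   transformer.readable.pipeThrough(new TransformStream({
//     transform(frame, controller) {
//       // transform frame.data here; drop frames in a format we never declared
//       if (acceptsCodec(inputCodecs, frame.getMetadata().mimeType ?? "video/vp8")) {
//         controller.enqueue(frame);
//       }
//     },
//   })).pipeTo(transformer.writable);
// };
```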

alvestrand commented 6 months ago

After writing up some considerations for how codecs are described in https://github.com/w3c/webrtc-pc/issues/2925 I am starting to feel that the transform proposal fits better - but it would be described as modifying the codec list when a transform is added.

Comments?

dontcallmedom-bot commented 4 months ago

This issue was mentioned in WEBRTCWG-2024-02-20 (Page 14)

alvestrand commented 4 months ago

@jan-ivar this is incomplete, but does it so far conform with what you think we've agreed on?

alvestrand commented 3 months ago

I think this is ready for an editors review now - I believe the spec changes are in a state where they are good to merge, and reflect the API used in the explainer.

Please take a look.

dontcallmedom-bot commented 2 months ago

This issue had an associated resolution in WebRTC April 23 2024 meeting – 23 April 2024 (Custom Codecs):

RESOLUTION: Consensus on #186, discussion to continue on #202