
WebRTC Encoded Transform
https://w3c.github.io/webrtc-encoded-transform/

How to handle transforms largely changing frame size #50

Open youennf opened 3 years ago

youennf commented 3 years ago

Transforms may be able to introduce large changes in frame size (decrease or increase). It seems interesting to understand how to handle these cases.

I can see different variations of these cases:

  1. The metadata size is not really negotiable by the JS transform; the size change is more or less fixed. The user agent can handle this case: it can detect the size of the metadata and adjust the encoder bitrate according to the average size of the transformed data.

  2. The transform may decide to decrease the metadata size. The transform may add more or less metadata based on available bandwidth. The transform could be made aware of the target bitrate from the network side, and maybe of the encoder target bitrate as well. It would then compute how much space it can add to the frame.

  3. The transform might want to trade media quality to include more metadata. In that case, the transform can decide whether to reduce the encoder bitrate, the size of the metadata, or both. It seems useful to notify the JS whenever a change of the encoder bitrate is planned, and potentially to allow the JS to override the default behavior.

Case 1 requires no new API. Case 2 could be implemented with getter APIs. Case 3 can be implemented in various ways: a ReadableStream/WritableStream pair, a transform, events, maybe even frame-dedicated fields. All of these variants should provide roughly the same functionality.

I feel like a single object that the JS could use for all of this when processing a frame might be the most convenient from a web developer's perspective.
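As a concrete illustration of case 1, a transform might append a fixed-size metadata trailer to every encoded frame. The sketch below is illustrative only: the helper name and the 16-byte trailer are assumptions, not anything proposed in this thread.

```javascript
// Pure helper so the size logic can be exercised outside a worker:
// returns a new buffer = frame bytes followed by metadata bytes.
function appendMetadata(frameBytes, metadataBytes) {
  const out = new Uint8Array(frameBytes.length + metadataBytes.length);
  out.set(frameBytes, 0);
  out.set(metadataBytes, frameBytes.length);
  return out;
}

// In an RTCRtpScriptTransform worker this would be wired up roughly as:
//
// onrtctransform = (event) => {
//   const { readable, writable } = event.transformer;
//   readable.pipeThrough(new TransformStream({
//     transform(frame, controller) {
//       const metadata = new Uint8Array(16); // hypothetical fixed trailer
//       frame.data = appendMetadata(new Uint8Array(frame.data), metadata).buffer;
//       controller.enqueue(frame);
//     }
//   })).pipeTo(writable);
// };
```

Because the added size is constant per frame, this is the variant the user agent could account for on its own by observing the average transformed frame size.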

youennf commented 3 years ago

One good example here might be RED (https://tools.ietf.org/html/rfc2198). If I understand it correctly, it is integrated as a codec, similarly to Opus, with the transform being applied to the output of the RED encoder.

The RED encoder uses an underlying encoder (say Opus) to compress data. It then generates content by taking this compressed data and appending previously encoded data to it until a byte limit is reached. Ideally, the content is serialised as one packet.

If the transform increases the size of the content by too much, things might actually break.
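The byte-budget behavior described above can be modeled with a small sketch. This deliberately ignores the actual RED header layout from RFC 2198 and only captures the size accounting; the function name and return shape are made up for illustration.

```javascript
// Simplified model of the RED byte-budget decision: starting from the
// primary (newest) payload, keep adding older redundant payloads until
// the next one would exceed the limit. Header bytes are ignored.
function selectRedundancy(primarySize, olderSizes, byteLimit) {
  let total = primarySize;
  let included = 0;
  for (const size of olderSizes) {
    if (total + size > byteLimit) break;
    total += size;
    included++;
  }
  return { totalBytes: total, redundantBlocks: included };
}
```

A transform that grows each payload shrinks the number of redundant blocks that fit, or pushes the packet past the limit entirely, which is the breakage described above.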

fippo commented 3 years ago

Correct, the audio RED encoder wraps the Opus one, stores a packet (or two, three, four, ...) and then essentially concatenates them.

Since audio typically isn't split into multiple packets, this could create a problem with insertable streams, but I think we have a lot of margin here. The audio encoder does know the transport overhead, which is currently calculated from the RTP header size and RTP header extensions (see here for details)

alvestrand commented 3 years ago

The generic problem is transforms where the input size is very different from the output size. "Support RED as a transform" may be a separate topic - it would require that the transform have access to the desired packet size from the packetizer. It also gets into the whole issue of how an app that inserts a transform can perform the correct SDP actions (preferably without having to munge everything in SDP).

alvestrand commented 3 years ago

If we switch to using EncodedVideoChunk, which is immutable, the problem becomes a different problem.

fippo commented 3 years ago

I actually ran into this again with an audio/RED encoder. Since audio packets don't get packetized, they are size-limited (1200 bytes minus transport overhead). Oversized frames ended up being dropped silently deep down in libwebrtc. Should 1/ enqueue throw for some definition of "too large audioframe", or 2/ the original frame specify a maximum size?
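One way a transform could defend against this silent drop today is to check the post-transform size itself before growing the frame. Both constants below are assumptions taken from this thread's discussion, not spec values:

```javascript
// Sketch of a transform-side guard: refuse to grow a frame past an
// assumed budget. The 1200-byte packet figure comes from this thread;
// the overhead estimate is a rough guess, not a spec constant.
const ASSUMED_MAX_PACKET = 1200;
const ASSUMED_TRANSPORT_OVERHEAD = 80; // RTP header + extensions + SRTP, rough guess

function fitsBudget(frameSize, addedBytes) {
  return frameSize + addedBytes <= ASSUMED_MAX_PACKET - ASSUMED_TRANSPORT_OVERHEAD;
}

// In the worker transform, the metadata would be skipped (or trimmed)
// when the frame would no longer fit:
// if (fitsBudget(frame.data.byteLength, metadata.length)) { /* append */ }
```

This only papers over the problem, of course; the question in this thread is whether the platform should surface the limit instead.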

youennf commented 1 year ago

@alvestrand, @aboba, I was initially thinking we could add an API for the web page to expose the target bitrate overhead of the transform. Rethinking this issue, though, in the case where transforms add a lot of overhead, I now wonder whether we need to provide any API at all, or whether this could be treated as an unexpected consequence of transforms that implementations need to deal with.

1/ enqueue throw for some definition of "too large audioframe"

If a write fails, it usually means the WritableStream will be errored, which we do not want. I guess an error event on RTCRtpScriptTransformer is a possibility if we want the web app to get that info quickly, or a stat otherwise.

2/ the original frame specify a maximum size?

That is one possibility, another possibility is to put it at RTCRtpScriptTransformer level:

partial interface RTCRtpScriptTransformer {
    readonly attribute unsigned long maxAudioFrameSize;
};

with possibly an event in case the value changes. One question though is how to compute this value (MTU - estimated RTP packetization overhead?)
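The "MTU minus estimated RTP packetization overhead" computation could look like the sketch below, though every number in it is an estimate (and, as noted further down, even the MTU assumption is shaky). None of this is a proposed API; it is just the arithmetic made explicit.

```javascript
// Sketch of the suggested computation. Every default is an estimate;
// real overhead varies with negotiated header extensions and the SRTP
// cipher in use.
function estimateMaxAudioFrameSize({
  mtu = 1200,            // common (but unverified) path-MTU assumption
  rtpHeaderBytes = 12,   // fixed RTP header
  extensionBytes = 24,   // negotiated header extensions, rough guess
  srtpAuthBytes = 10,    // SRTP auth tag for common ciphers
} = {}) {
  return mtu - rtpHeaderBytes - extensionBytes - srtpAuthBytes;
}
```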

fippo commented 1 year ago

One question though is how to compute this value (MTU - estimated RTP packetization overhead?)

The question is what the MTU even is (and we have no PMTU). For video a common assumption in libWebRTC is 1200 bytes, but as https://bugs.chromium.org/p/chromium/issues/detail?id=1470254 has shown, stereo Opus at 510 kbps can result in an RTP payload of 1275 bytes, which exceeded the (arbitrary) limit of 1000. It is unclear whether even that works on all networks without adding more bytes.

youennf commented 1 year ago

Right: given that it might be hard for the UA to give a good value, and that the web app could probe this info on its own, it does not seem worth exposing such a value.

alvestrand commented 1 year ago

The comment about the audio frame size limitation came up in a different context today. I think the proper venue for that is AVTCORE in the IETF - if we want to add a packetizer to an audio codec (and I think there are cases where we need it), that's the place it should be defined.

Moving the arbitrary limit from 1000 to 1200 is probably pretty safe.