w3c / webrtc-encoded-transform

WebRTC Encoded Transform
https://w3c.github.io/webrtc-encoded-transform/
Other
124 stars 27 forks source link

Need for modifying metadata #135

Open alvestrand opened 2 years ago

alvestrand commented 2 years ago

In some applications there is the need to modify the metadata, but the spec only allows a GetMetadata operation. This is inconvenient if the transform operation means that the metadata really should change.

A PutMetadata function would make it simple.

nisse-work commented 2 years ago

Another attribute that would be very nice to have is some kind of id that can be matched to unencoded frames processed upstream using the insertable streams api. It seems one reasonable candidate for an id (if we don't want to add a new one) is VideoFrame.timestamp (https://developer.mozilla.org/en-US/docs/Web/API/VideoFrame). It would be nice if this value was propagated through the encoder and attached to the RTCEncodedVideoFrame or RTCEncodedVideoFrameMetadata.

Example use: Updating csrc list (part of RTCEncodedVideoFrameMetadata) to match any mixing/compositing done when processing this particular frame upstream to the encoding.

nisse-work commented 2 years ago

Hmm, maybe I was confused; I thought we only had RTP timestamp in RTCEncodedVideoFrame, but it also has this timestamp field: https://w3c.github.io/webrtc-encoded-transform/#dom-rtcencodedvideoframe-timestamp

It's poorly documented, but if it's specified to equal the VideoFrame.timestamp of the corresponding unencoded frame, it's should be good enough, I think.

fippo commented 2 years ago

have you seen https://chromium-review.googlesource.com/c/chromium/src/+/3654171 which clarified things a bit? (it is a mess...)

nisse-work commented 2 years ago

Confusing indeed. Seems this was discussed previously on https://github.com/w3c/webrtc-encoded-transform/pull/116.

Exposing capture time should be reasonably easy on the send side (prototype cl to do that for video: https://webrtc-review.googlesource.com/c/src/+/265403).

I don't really understand the issues in getting that kind of timestamp on the receive side. If it can't easily be set to useful value on receive side, making it optional would be an improvement for use cases I have in mind.

fippo commented 2 years ago

we can't change the definition of timestamp now :-( :+1: for exposing an optional absoluteCaptureTimestamp (present if the abs-capture-time header extension is used?) on RTCEncodedVideoFrameMetadata though (it looks like some code surrounding your CL already does something similar?)

nisse-work commented 2 years ago

So what if we add two new attributes to RTCEncodedVideoFrameMetadata:

Does that make sense? We could then consider deprecating the current timestamp attribute which isn't in the "metadata".

fippo commented 2 years ago

Less clear on the receive side, could perhaps be conditionally populated depending on presence of absolute capture time extension

I'd go for captureTime -- similar to how it works in requestVideoFrameCallback - @drkron can explain the details. Downside of this is that it is best effort and requires SR or rtrr for clock offset estimation so it won't be available on every frame.

nisse-work commented 2 years ago

I created a pull request, see https://github.com/w3c/webrtc-encoded-transform/pull/137. It seems I may have to create some w3c account, to get it on track for the process?

dontcallmedom commented 2 years ago

indeed @nisse-work although in practice the high-level bit will be your employment situation and its impact on IPR. Please get in touch at dom@w3.org if you want more details.

fippo commented 2 years ago

or talk to @alvestrand ;-)

nisse-work commented 2 years ago

Back to setMetadata. I can make a pull request to add that, but I need some guidance about how it should work.

  1. Newbie question: The type dictionary RTCEncodedVideoFrameMetadata listen in the spec, is that a plain javascript dictionary to the application, so that, e.g., it could be constructed as {"width" : 100} ? It seems clear that this should be the argument type for setMetadata, for consistency with the getMetadata method.
  2. Return value: The method needs a way to indicate success or failure (might fail, e.g, if application tries to set unknown keys, or invalid values such as width < 0). Should this simply be a bool (true for success), or something more complex? Or should it be an asynchronous operation with promise/callback?
  3. Should caller be required to pass a dictionary with all defined attributes present (typical pattern would then be getMetadata, modify one or two attributes, setMetadata)? Or is it fine to pass a partially populated dictionary, and then leave all attributes not mentioned unchanged?
  4. Most of the attributes seem rather inappropriate for the application to change. E.g., changing the spatial index or dependency list will likely break decoding at the remote end. Setting csrcs (the use case I have in mind) is the only one that looks reasonably safe. Maybe it's better to define a specific method to change cscrs only? I'm thinking of the case where a webrtc encoder is used; in case the application has it's own encoder (implemented in js or webasm), then the situation is different and the application has to know, and populate, all the attributes of the encoded frame.
alvestrand commented 2 years ago
  1. Yes, WebIDL dictionaries are constructible from JS dictionaries. The construction process ignores unknown keys, so unknown keys "can't happen".
  2. My suspicion is that setMetadata could be synchronous, but I'm mosty in the "when in doubt, return a promise" camp. I'd like to hear from @guidou whether he could think of a reason for making it async. Failure in JS is mostly signaled by throwing an exception or rejecting a promise, so the function should return "undefined".
  3. I'd suggest that the setMetadata fully replaces the current metadata. So get/modify/set would be the normal pattern. Sending in a dictionary with only the changes makes it hard to describe that you want to delete an attribute.
  4. If we move a frame between PeerConnections, there may be good reasons for changing the dependency descriptors - for instance, if we need to renumber frames, then DD indexes will need to change too. But if the result doesn't make sense, SetMetadata should throw an error.
guidou commented 2 years ago

Implementing setMetadata requires going deep into the webrtc stack and deal with a number of objects living in different threads. Might be doable synchronously (as getMetadata is), but it would be prudent to try to sketch an implementation before committing to a sync interface.

youennf commented 2 years ago

To sketch this API, we need to identify many use cases, especially if we go with setMetadata. Going with a dedicated csrc/ssrc mutator might be easier to sell if the specific use case is convincing.

As of sync vs. async, if we have to hop to threads, we usually go async. But I think we should first look at the actual model. It seems to me the natural model is that the metadata is only used after the frame is enqueued: at the time of enqueueing the frame, we clone the metadata and this gets used down the pipe. This seems to call for a sync API. The question of bad metadata is something to take into account. We should also learn from WebCodec decoders and metadata approach.

nisse-work commented 2 years ago

Implementing setMetadata requires going deep into the webrtc stack and deal with a number of objects living in different threads. Might be doable synchronously (as getMetadata is), but it would be prudent to try to sketch an implementation before committing to a sync interface.

I had a quick look at the webrtc implementation of getMetadata, and it appears to return a reference to internal state, and it appeared to make the assumption that it's immutable (and then I suppose chromium makes a copy when packaging it for js access). Making it mutable and properly synchronized looked a bit tricky at first look.

So maybe we shouldn't require synchronous operation.

nisse-work commented 2 years ago
  1. I'd suggest that the setMetadata fully replaces the current metadata. So get/modify/set would be the normal pattern. Sending in a dictionary with only the changes makes it hard to describe that you want to delete an attribute.

The current attributes look non-optional to me, so that they can be changed but not deleted. But I guess it's not unlikely that optional attributes will be added later, so good to take that use case into account.

alvestrand commented 2 years ago

Synchronous = SetMetadata() returns immediately, having affected the metadata Asynchronous = SetMetadata() (probably) posts a task that performs the modification, and returns a promise that will resolve when the modification is done. I don't think exposing the attributes as individually mutable makes sense, and having one mutator for each piece of modifiable metadata seems an invitation to interface cruft.

I also imagine that the only time SetMetadata makes sense is after a frame has been created or received, and before enqueueing it to its destination; modifying it at any other time would be a positive invitation to a race condition.

webcodecs (https://w3c.github.io/webcodecs/#encodedvideochunk-interface) seems to have ignored frame-attached metadata. Instead, it chose to emit EncodedVideoChunkMetadata on its callback interface, alongside the video frame itself: https://w3c.github.io/webcodecs/#encoded-video-chunk-metadata

One of the pieces of their metadata is the "VideoDecoderConfig".

nisse-work commented 2 years ago

Regarding sync or async: I think it is fine to have success/failure indication be asynchronous. But it would be nice is setMetadata followed by enqueuing of the frame guarantees that the modification takes effect before the frame goes out on the wire, without the javascript having to explicitly wait for setMetadata completion.

alvestrand commented 2 years ago

Would be nice if we could regard an encodedFrame as a pure data object, which would make it easy to say that operations that modify it are synchronoous.

fippo commented 2 years ago

I came up with another use-case. In the context of a RED encoder I'd like to modify the payload type. Currently it requires me to play a shell game with SDP and switching between opus and red payload types but iff I could modify the payload type that would not be necessary.

tonyherre commented 2 years ago

I agree that a sync setMetadata call here makes most sense, as one is only mutating local state which is later passed down into the peerconnection and handed off across threads, as Youenn described above. I put together a proof of concept implementation of such an sync API in Chrome & WebRTC, just to make sure we're not missing anything and it seems to work fine. Given this, are there any remaining concerns around making the method synchronous?

The RED example does seem like a good concrete instance of a class of usecases where the encoded transform drastically changes the payload such that the existing metadata no longer matches it. I've not been able to find any good way for apps to handle this without such a setMetadata() method.

alvestrand commented 1 year ago

In the context of the use case of "take an incoming frame and send it over another PC", we've found the need for setting:

jan-ivar commented 7 months ago

I worry this direction will interfere with https://github.com/w3c/webrtc-encoded-transform/issues/141.

EncodedVideoChunk and EncodedAudioChunk are immutable by design, simplifying serialization and avoids TOCTOU issues.

Instead of making metadata of an existing RTCEncodedVideoFrame mutable, EncodedVideoChunk has a constructor that makes it easy and efficient to copy/transfer-and-modify. Perhaps we could follow a similar pattern here? E.g.

const newFrame = new RTCEncodedVideoFrame({
  type: frame.type,
  data: frame.data,
  metadata: frame.getMetadata(),
  transfer: [frame.data]
});
guidou commented 7 months ago

I worry this direction will interfere with #141.

EncodedVideoChunk and EncodedAudioChunk are immutable by design, simplifying serialization and avoids TOCTOU issues.

Instead of making metadata of an existing RTCEncodedVideoFrame mutable, EncodedVideoChunk has a constructor that makes it easy and efficient to copy/transfer-and-modify. Perhaps we could follow a similar pattern here? E.g.

const newFrame = new RTCEncodedVideoFrame({
  type: frame.type,
  data: frame.data,
  metadata: frame.getMetadata(),
  transfer: [frame.data]
});

We agreed on the constructor approach in the December meeting and sent a PR. Can you take a look? https://github.com/w3c/webrtc-encoded-transform/pull/223