Open youennf opened 1 year ago
1 is probably the easiest approach compared to 2 and is more natural than 3 and 4.
The current situation with generic metadata in WebCodecs VideoFrame is that there is support, but no adequate technical solution proposed. I'm interested in any proposal that:
VideoFrame
are carried by the underlying video resource.new VideoFrame(existingVideoFrame, {updatedMetadata: ..., visibleRect: ...})
).
new VideoFrame(oldFrame, {metadata: {...oldFrame.metadata, foo: 123}}).
Absent such a proposal, we are still recommending (3) or (4), passing the metadata out-of-band.
I don't think there is strong support for handling face metadata specially, but doing so would be the shortest path to in-band metadata.
- Can be serialized to bytes. (I assume this excludes Symbol)
Agreed we need support to clone/postMessage metadata. I was thinking we could use structure cloning (https://html.spec.whatwg.org/multipage/structured-data.html#safe-passing-of-structured-data), which is what is being used when postMessaging a value, say to workers.
For instance, we could add steps in the constructor to structure clone the metadata input parameter and the result would be stored in a VideoFrame object slot. The metadata accessor should either provide a copy of the metadata or the metadata object itself (maybe we should freeze it?).
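A minimal sketch of that idea, using a stand-in class since VideoFrame is not constructible outside a browser. The class name FrameSketch, its private field, and the choice to return a fresh copy from the accessor are assumptions for illustration, not proposed spec text:

```javascript
// Hypothetical stand-in for VideoFrame; only the metadata handling is modeled.
class FrameSketch {
  // Plays the role of an internal slot holding the cloned metadata.
  #metadata;

  constructor(init = {}) {
    // Structured-clone the input so later mutations by the caller do not
    // affect the frame. Throws a DataCloneError for values such as functions.
    this.#metadata = structuredClone(init.metadata ?? {});
  }

  // Option A from the comment above: hand out a fresh copy on every access.
  get metadata() {
    return structuredClone(this.#metadata);
  }
}

const input = { foo: 123, nested: { bar: [1, 2, 3] } };
const frame = new FrameSketch({ metadata: input });

input.foo = 456;             // mutate the caller's object after construction
const seen = frame.metadata; // still holds the values from construction time
seen.nested.bar.push(4);     // mutating the returned copy does not touch the slot
```

Returning the stored object itself would avoid the copy on each access, but Object.freeze is shallow, so nested objects would need to be frozen recursively to get the same isolation.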
- Supports namespacing in some form.
Good point. I am fine with either going with UA-defined metadata initially or adding support for web-app-specific metadata. In any case, both kinds of data should probably follow the same principles (the data being structured-cloneable, say).
In terms of spec editing, WebCodecs could define a WebCodecMetadata dictionary, either without any member or containing something like an any userDefinedMetadata member.
The WebRTC spec would then define a partial WebCodecMetadata dictionary listing the face detection dictionary members.
- The basic rule of 'drop everything' may be good enough
+1
@sandersdan, how does this look to you? Is it precise enough to think about writing a PR?
I was thinking we could use structure cloning
Structured clone by itself doesn't work because it assumes there can be side data (such as ports) in addition to the raw bytes. The for storage variant might work, but I'm not familiar enough to say for sure.
It might actually make sense to just drop down to JSON here. I don't think metadata should need to be self-referential, for example.
In terms of spec editing, WebCodecs could define a WebCodecMetadata dictionary, either without any member or containing something like an any userDefinedMetadata member.
Yes, this is about the best I was able to come up with as well, and I think it meets the requirements. I like that unlike a partial for VideoFrame, a partial for VideoFrameMetadata would be straightforward to splat.
{metadata: { user: { ... } } } is a bit cumbersome, but the only alternative I have is { metadata: ..., userMetadata: ... } which just trades for complexity instead. One surprise could be that { metadata: { myMetadata: 123 } } would simply be dropped by the IDL binding, but good documentation can overcome that.
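To make the "dropped by the IDL binding" surprise concrete, here is a rough sketch of how WebIDL dictionary conversion behaves: only declared members are read from the input object. The function toMetadataDictionary and the single declared member user are assumptions for illustration; the real member list would come from the specs.

```javascript
// Hypothetical list of declared dictionary members.
const KNOWN_MEMBERS = ["user"];

// Mimics WebIDL dictionary conversion: only declared members are read,
// anything else on the input object is silently ignored.
function toMetadataDictionary(input) {
  const result = {};
  for (const key of KNOWN_MEMBERS) {
    if (input[key] !== undefined) {
      result[key] = input[key];
    }
  }
  return result;
}

// An undeclared top-level member disappears without an error.
const dropped = toMetadataDictionary({ myMetadata: 123 });

// The same value nested under the declared member survives.
const kept = toMetadataDictionary({ user: { myMetadata: 123 } });
```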
Is it precise enough to think about writing a PR?
I think the serialization part needs work before becoming a PR, but it could be at least proposed in the existing bug.
Edit: The existing bug is https://github.com/w3c/webcodecs/issues/189. There is a separate bug for EncodedChunk metadata, https://github.com/w3c/webcodecs/issues/245, but that also adds the complexity of possibly having to copy metadata from frames to chunks or the reverse.
It might actually make sense to just drop down to JSON here
I could see metadata be an array buffer, in which case JSON is not great.
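A small sketch of why JSON is a poor fit for binary metadata, compared with structured cloning. The member names label and payload are made up for the example:

```javascript
const binaryMetadata = {
  label: "face",
  payload: new Uint8Array([1, 2, 3]).buffer, // an ArrayBuffer
};

// JSON round trip: an ArrayBuffer has no enumerable properties,
// so it is serialized as an empty object and the bytes are lost.
const viaJson = JSON.parse(JSON.stringify(binaryMetadata));

// Structured clone: the ArrayBuffer is copied along with its contents.
const viaClone = structuredClone(binaryMetadata);

const jsonLength = viaJson.payload.byteLength;            // undefined
const cloneBytes = [...new Uint8Array(viaClone.payload)]; // [1, 2, 3]
```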
I think the serialization part needs work before becoming a PR, but it could be at least proposed in the existing bug.
I think https://html.spec.whatwg.org/multipage/structured-data.html#structuredserialize is what we want. This is roughly what structuredClone is using under the hood (we do not want any transfer parameters since we want to ensure we can clone frames). forStorage=false is good here.
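As a rough illustration of the "no transfer parameters" point, the sketch below uses structuredClone as a stand-in for the serialize/deserialize steps. The payload member is hypothetical:

```javascript
const original = { payload: new Uint8Array([1, 2, 3]).buffer };

// No transfer list: the buffer is copied and the source stays intact,
// so the same metadata can be cloned again for another frame.
const copy1 = structuredClone(original);
const copy2 = structuredClone(original);
const lengthAfterCopies = original.payload.byteLength; // 3

// With a transfer list the source buffer is detached, which is the
// behaviour to avoid if frames must remain cloneable.
const moved = structuredClone(original, { transfer: [original.payload] });
const lengthAfterTransfer = original.payload.byteLength; // 0
```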
that also adds the complexity of possibly having to copy metadata from frames to chunks or the reverse.
I do not think we need to expose this to web pages, at least initially. It should be reasonably simple for the web app to set metadata from a VideoFrame to its corresponding chunk. This is something we might want in WebRTC (metadata from track to encoded transform), but the WebRTC spec could handle this metadata passthrough on its own.
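One way a web app could do this today is a side table keyed by timestamp. This is only a sketch: it assumes the encoder emits a chunk carrying the same timestamp as the frame it was produced from, and the helper names rememberMetadata and takeMetadata, as well as the faces member, are invented for the example. Plain objects stand in for VideoFrame and EncodedVideoChunk.

```javascript
// Side table keyed by frame timestamp.
const pendingMetadata = new Map();

// Call before handing the frame to the encoder.
function rememberMetadata(frame, metadata) {
  pendingMetadata.set(frame.timestamp, metadata);
}

// Call from the encoder output callback.
// Returns undefined if nothing was stored for this timestamp.
function takeMetadata(chunk) {
  const metadata = pendingMetadata.get(chunk.timestamp);
  pendingMetadata.delete(chunk.timestamp);
  return metadata;
}

// Stand-ins for a frame and the chunk the encoder produced from it.
const frameStandIn = { timestamp: 33333 };
const chunkStandIn = { timestamp: 33333 };

rememberMetadata(frameStandIn, { faces: 2 });
const recovered = takeMetadata(chunkStandIn);
```

Deleting the entry on retrieval keeps the table from growing, at the cost of dropping metadata if the same chunk is looked up twice.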
Following on https://github.com/w3c/mediacapture-extensions/pull/69 and media capture transform, face detection metadata could be made available to mediastreamtrack transforms. There are a few possibilities we could envision. The following come to mind: