w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

[WebCodecs VideoFrame metadata registry] Introduce VideoFrame metadata #559

Closed youennf closed 1 year ago

youennf commented 2 years ago

@sandersdan FYI.

sandersdan commented 2 years ago

We just discussed this among the Chrome media team, and it turns out we're okay leaving out the user metadata. This would allow us to sidestep most of the questions about serialization.

The reasoning here is that if we're not storing metadata also on chunks, and we're not copying metadata between chunks and frames, then users will have to map their own metadata to frames separately anyway (presumably by timestamp). Therefore the user metadata is just a convenience rather than a critical feature in the short term.
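For illustration, mapping by timestamp could look roughly like the sketch below. It assumes the timestamp given to a chunk is the one reported on the decoded frame; `AppMetadata` and `TimestampMetadataStore` are hypothetical application-side names, not part of WebCodecs.

```typescript
// Hypothetical application-defined metadata; not part of WebCodecs.
interface AppMetadata {
  label: string;
}

// Side table keyed by timestamp (microseconds). The assumption is that the
// timestamp is the value shared by an encoded chunk and its decoded frame.
class TimestampMetadataStore {
  private readonly entries = new Map<number, AppMetadata>();

  // Called when the chunk is handed to the decoder.
  remember(timestamp: number, metadata: AppMetadata): void {
    this.entries.set(timestamp, metadata);
  }

  // Called when the decoder outputs a frame. The entry is removed so the
  // table does not grow without bound.
  take(timestamp: number): AppMetadata | undefined {
    const metadata = this.entries.get(timestamp);
    this.entries.delete(timestamp);
    return metadata;
  }
}

const store = new TimestampMetadataStore();
store.remember(33333, { label: "frame with face" });
const matched = store.take(33333); // the entry stored above
const missing = store.take(33333); // undefined, the entry was already taken
```

In a real pipeline, `remember` would sit next to the decode call and `take` inside the decoder's output callback.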

youennf commented 2 years ago

it turns out we're okay leaving out the user metadata.

FWIW, one use case I can see is the AR/VR use case where metadata is retrieved by a worker through WebRTC encoded transform, then passed through the decoder to MediaStreamTrack VideoFrames. In that case there might be two contexts, so it seems nice to attach application metadata to VideoFrames directly.

That said, I am more than happy to leave this to another day/another PR.

youennf commented 2 years ago

I removed user metadata from this PR. I kept dictionary/method for now. There is a slight benefit for interface/attribute as we would return the same object every time, which might be handy if the metadata becomes a potentially big object.
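A rough sketch of the difference between the two shapes, using hypothetical classes (`FrameWithMethod`, `FrameWithAttribute`) that only model the object-identity behaviour and are not the proposed API:

```typescript
// Hypothetical metadata shape used only for this sketch.
interface SketchMetadata {
  captureTime?: number;
}

// Dictionary + method: every call hands out a fresh copy.
class FrameWithMethod {
  private readonly stored: SketchMetadata;
  constructor(initial: SketchMetadata) {
    this.stored = { ...initial };
  }
  metadata(): SketchMetadata {
    return { ...this.stored };
  }
}

// Interface + attribute: the getter returns the same object every time.
class FrameWithAttribute {
  private readonly stored: Readonly<SketchMetadata>;
  constructor(initial: SketchMetadata) {
    this.stored = Object.freeze({ ...initial });
  }
  get metadata(): Readonly<SketchMetadata> {
    return this.stored;
  }
}

const viaMethod = new FrameWithMethod({ captureTime: 1234 });
const viaAttribute = new FrameWithAttribute({ captureTime: 1234 });

// false: two calls produce two distinct copies
const methodReturnsSameObject = viaMethod.metadata() === viaMethod.metadata();
// true: both reads see the one stored object
const attributeReturnsSameObject = viaAttribute.metadata === viaAttribute.metadata;
```

The copy cost in the method form grows with the size of the metadata, which is the case where the attribute form would be handy.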

youennf commented 2 years ago

One potential downside of using a VideoFrameMetadata interface is that we would probably need a VideoFrameMetadataInit dictionary anyway. Any time a spec extended VideoFrameMetadata with a new readonly attribute, it would also have to extend VideoFrameMetadataInit.

youennf commented 2 years ago

As a side note, VideoFrameMetadata name (even though dictionary names are not that important) conflicts with https://wicg.github.io/video-rvfc/#dictdef-videoframemetadata. @tguilbert-google FYI.

tguilbert-google commented 2 years ago

I'd be fine with renaming the rVFC VideoFrameMetadata to something else, to avoid collision. I think that VideoFrameMetadata makes more sense here.

Albeit verbose, rVFC VideoFrameMetadata could be renamed to VideoFrameCallbackMetadata?

youennf commented 2 years ago

rVFC VideoFrameMetadata could be renamed to VideoFrameCallbackMetadata?

Sounds good, dictionary names are not exposed to JS so verbosity is not a big issue anyway.

dalecurtis commented 1 year ago

Thanks, lgtm. @aboba @padenot any objections?

youennf commented 1 year ago

What's the first user here?

First user might be https://github.com/w3c/mediacapture-extensions/pull/48. rVFC-exposed metadata might also be a good candidate (say you are exposing VideoFrames coming from a peer connection track).

Monkey patching can be annoying.

With partial dictionaries, I do not expect a lot of monkey patching.
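As a loose analogy only: TypeScript interface merging behaves much like Web IDL `partial dictionary`, in that each extension adds members without editing the base declaration. `SketchFrameMetadata` and both member names below are hypothetical.

```typescript
// Base dictionary as a registry document might declare it: no members yet.
interface SketchFrameMetadata {}

// A hypothetical extension spec adds its member without touching the base
// declaration, similar to `partial dictionary` in Web IDL.
interface SketchFrameMetadata {
  humanFaceCount?: number; // hypothetical member name
}

// A second hypothetical extension spec adds another member independently.
interface SketchFrameMetadata {
  captureTimeMs?: number; // hypothetical member name
}

// The merged type accepts members from both extensions.
const merged: SketchFrameMetadata = { humanFaceCount: 2, captureTimeMs: 1500 };
const memberNames = Object.keys(merged).sort();
```

Neither extension has to patch the other or the base definition, which is the point about monkey patching.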

dalecurtis commented 1 year ago

Editor's call:

padenot commented 1 year ago

This is an example of adding a registry document to the repo: https://github.com/w3c/webcodecs/pull/319, but I can also do it.

dalecurtis commented 1 year ago

Since it'd be an entire new registry document: https://github.com/w3c/webcodecs/commit/a270f0f3adafcea640cc8b2834eadce4f671a54e is the commit where we added the codec registry for reference.

youennf commented 1 year ago

I think there is some magic needed somewhere (@tidoust maybe?) to get a proper echidna name. Hopefully, this would also make the [[webcodecs-video-frame-metadata-registry]] link work in this PR, and let WebCodecs add the metadata registry to its document reference list (normative, since it defines VideoFrameMetadata?).

youennf commented 1 year ago

I'd like to see a registration policy closer to "MEDIA WG consensus required".

Maybe it is not clear, but I would expect this to be the current policy. The current wording says that the PR is merged based on MEDIA WG consensus. The PR to the registry would update the VideoFrameMetadata WebIDL definition to include the metadata entry or entries, and it would include a link to the document describing each entry in more detail. If more than that is needed, would you be able to provide some suggestions on how to update this PR?

By the way, this raises the question whether this document is a normative reference or not to the main spec. I would tend to think so given it defines the VideoFrameMetadata WebIDL.

Also, it seems important to me that the PR pass the IPR check.

We can add such check, I was thinking this was implied by the fact this document is a Media WG document. Maybe this is not the case given the document is a registry?

tidoust commented 1 year ago

Also, it seems important to me that the PR pass the IPR check.

We can add such check, I was thinking this was implied by the fact this document is a Media WG document. Maybe this is not the case given the document is a registry?

As described in the W3C Process Document, registry documents are like notes and are not subject to the W3C Patent Policy:

"A registry report or registry section is purely documentational, is not subject to the W3C Patent Policy, and must not contain any requirements on implementations. For the purposes of the Patent Policy [PATENT-POLICY], any registry section in a Recommendation track document is not a normative portion of that specification."

youennf commented 1 year ago

I see, I am not sure we should go with a pure registry document like codecs then. My understanding was that VideoFrameMetadata WebIDL would be normative.

tidoust commented 1 year ago

I see, I am not sure we should go with a pure registry document like codecs then. My understanding was that VideoFrameMetadata WebIDL would be normative.

Perhaps what you need is the same approach as for the codec registry: the registry itself is a table that maps an identifier to some spec that normatively defines the IDL.

Codec registrations are published on the Note track mostly because they touch on codecs and it wasn't clear that we would get appropriate IPR commitments. Ideally, they would rather be published as normative documents on the Recommendation track. If the situation is simpler for VideoFrameMetadata, the working group can publish the specs that define the WebIDL on the Recommendation track.

dalecurtis commented 1 year ago

@youennf How come you want metadata to be normative?

My expectation is that UAs can pick and choose from the metadata they want to support. I think requiring media WG consensus for VFM entries strikes a good balance between allowing everything and only allowing use cases where all UAs agree on the underlying feature (E.g., face detection).

Like codecs I can foresee some metadata having patent implications (e.g., specific HDR metadata types), so I think the metadata registry must be non-normative.

aboba commented 1 year ago

Agree with @dalecurtis that the metadata registry should be non-normative. However, if the registry is "specification required", it can provide a table that includes both the IDL for the metadata as well as a link to the specification that normatively defines the IDL.

This is how IANA tables work; the tables themselves are not normative, it is the specifications that the tables link to that provide the normative language.
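To make the table idea concrete, a minimal sketch of a "specification required" registry as data. The row shape, member names, and example.org URLs are placeholders invented for illustration, not actual registry content.

```typescript
// One row of the table: an identifier plus pointers, with no normative
// text of its own.
interface RegistryRow {
  member: string;        // dictionary member name
  idlType: string;       // informative copy of the IDL type
  specification: string; // link to the document that normatively defines it
}

// Hypothetical rows; the names and URLs are placeholders.
const sketchRegistry: RegistryRow[] = [
  {
    member: "exampleFaces",
    idlType: "sequence<ExampleFace>",
    specification: "https://example.org/face-metadata",
  },
  {
    member: "exampleCaptureTime",
    idlType: "DOMHighResTimeStamp",
    specification: "https://example.org/capture-time",
  },
];

// Looking up an entry only redirects the reader to the defining spec.
function lookupSpecification(
  rows: readonly RegistryRow[],
  member: string,
): string | undefined {
  return rows.find((row) => row.member === member)?.specification;
}

const foundSpec = lookupSpecification(sketchRegistry, "exampleFaces");
const unknownSpec = lookupSpecification(sketchRegistry, "notRegistered");
```

All requirements live in the linked documents; the table itself carries none.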

Some questions:

  1. Will the specification always have been produced in the W3C? For example, could the ITU-T, IETF or AoMedia make a request for a registry entry based on one of their specifications?
  2. Could the MEDIA WG delegate authority for a top-level entry to another entity? For example, could authority for a dictionary of AR/VR metadata be delegated to another W3C WG? Or does the MEDIA WG have to approve each registry addition?

youennf commented 1 year ago

@youennf How come you want metadata to be normative?

There is some normative language that needs to exist. For instance, if a UA exposes a field X within the metadata dictionary, then the field value must have a value of the type given by the registry/metadata document.

My expectation is that UAs can pick and choose from the metadata they want to support.

Each metadata entry is, in effect, an optional API. But if a UA implements it, it needs to follow the registry and the registry metadata documents, so they become normative. VideoFrameMetadata will be designed this way: all its members will be optional.
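A small sketch of that rule: members may be absent, but a member that is present has to match its registered type. The member names and the `registeredTypes` table are hypothetical, and JavaScript `typeof` stands in for real Web IDL type checking.

```typescript
type RegisteredType = "number" | "string" | "boolean";

// Hypothetical registry content: member name -> expected JavaScript type.
const registeredTypes = new Map<string, RegisteredType>([
  ["exampleCaptureTime", "number"],
  ["exampleLabel", "string"],
]);

// Returns the names of exposed members whose value has the wrong type.
// Absent members are never a problem because every member is optional.
function typeViolations(metadata: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const member of Object.keys(metadata)) {
    const expected = registeredTypes.get(member);
    if (expected === undefined) continue; // not covered by this sketch
    if (typeof metadata[member] !== expected) problems.push(member);
  }
  return problems;
}

const nothingExposed = typeViolations({});
const wellTyped = typeViolations({ exampleCaptureTime: 1500 });
const wrongType = typeViolations({ exampleCaptureTime: "soon" });
```

Here `nothingExposed` and `wellTyped` come back empty, while `wrongType` lists `exampleCaptureTime`.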

if the registry is "specification required", it can provide a table that includes both the IDL for the metadata as well as a link to the specification that normatively defines the IDL.

That works for me.

  1. Will the specification always have been produced in the W3C? For example, could the ITU-T, IETF or AoMedia make a request for a registry entry based on one of their specifications?

It makes sense that the registry can link (directly or indirectly) to other documents. The normal way should be that the registry links to a Media WG metadata document which links to ITU-T, IETF, AOM documents.

2. Could the MEDIA WG delegate authority for a top-level entry to another entity?

This makes sense to me as well. The registry will link to a document that may define two members, one of which is a simple type and the other a dictionary whose definition is owned by another WG.

youennf commented 1 year ago

@dalecurtis, @aboba, maybe the registry type is the right way. Somehow, the codec registry is normative in the sense that if 'vp8' is used, then the UA has to follow the VP8 spec.

dalecurtis commented 1 year ago

I've always assumed implementing whatever thing is listed in a registry was optional, but once a UA decides to implement, the text within the registry entry is normative. Is that not true? @tidoust @chrisn

tidoust commented 1 year ago

I've always assumed implementing whatever thing is listed in a registry was optional, but once a UA decides to implement, the text within the registry entry is normative. Is that not true? @tidoust @chrisn

I would define normative statements in W3C specs as conformance requirements on implementations. Such requirements are subject to the W3C Patent Policy. As raised in a previous comment, the W3C Process is explicit that a registry "is not subject to the W3C Patent Policy and MUST NOT contain any requirements on implementations".

Registries are more like lookup tables to redirect people to the right spec. As mentioned by @aboba, IANA protocol registries come to mind as typical examples: most IANA registries are tables where each entry maps an identifier to a name and/or description and to a normative RFC that defines requirements on implementations. Registry entries themselves do not contain normative content.

No document has been published under the Registry track so far (the track was added last year) so it's hard to reflect on experience within W3C. Having Web IDL content definitions directly in a registry document seems wrong to me since Web IDL content is normative in essence. I would rather expect registry entries to link to one or more specs that define the Web IDL and that are published on the Recommendation track.

youennf commented 1 year ago

WebRTC-stats might be a good example we could follow:

  1. WebRTC-stats defines WebIDL APIs
  2. WebRTC-stats does not define implementation conformance. See in particular https://w3c.github.io/webrtc-stats/#conformance: "This specification does not define what objects a conforming implementation should generate."

By default, metadata would be optional to expose/support. If, in the future, we want to document that some metadata are mandatory to support, the requirements could be added either in this metadata document or in other documents (WebCodecs main spec or others). For instance, webrtc-pc (or maybe mediacapture-transform) could make it mandatory to expose any WebRTC-specific metadata related to remote tracks.

dalecurtis commented 1 year ago

I see. Thanks @tidoust -- maybe a registry of spec links is the best we can do with the registry model then. The WebRTC stats model also seems reasonable, but I have no experience with it so I don't know if it's well received. I'll defer to @aboba and @padenot on that.

I don't think we'd ever add mandatory metadata to VideoFrameMetadata. Mandatory fields should probably be on the VideoFrame object itself (e.g., timestamp, duration, etc). @sandersdan and I were just discussing this for something like orientation.

chrisn commented 1 year ago

Having mandatory fields on VideoFrame itself makes sense to me, not everything needs to be done via registry entries. The registry is helpful when implementation is optional. In this case, I'd suggest adding language to the registry to prevent naming collisions between mandatory fields and fields defined via the registry.
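One way to picture the collision rule is the sketch below. The attribute names are a subset of the VideoFrame interface; `canRegister` and the registered names are hypothetical.

```typescript
// Attribute names from the VideoFrame interface (illustrative subset).
const frameAttributeNames: ReadonlySet<string> = new Set<string>([
  "format",
  "codedWidth",
  "codedHeight",
  "displayWidth",
  "displayHeight",
  "duration",
  "timestamp",
  "colorSpace",
]);

// A new registry member is acceptable only if its name collides neither
// with a VideoFrame attribute nor with an existing registry member.
function canRegister(
  memberName: string,
  alreadyRegistered: ReadonlySet<string>,
): boolean {
  return !frameAttributeNames.has(memberName) && !alreadyRegistered.has(memberName);
}

const collidesWithFrame = canRegister("duration", new Set<string>());
const freshName = canRegister("exampleFaces", new Set<string>());
const duplicateEntry = canRegister("exampleFaces", new Set<string>(["exampleFaces"]));
```

Only `freshName` passes; the other two are rejected for colliding with a VideoFrame attribute and with an existing entry respectively.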

Summarising where I think we've got to:

aboba commented 1 year ago

If we think about the registry entries as just a member of a table that points to a specification, then it is the specifications that define how registry entries evolve.

For example, a specification that created registry entries might be superseded by a new specification which could include new registry entries, could clarify the meaning of existing entries, or might not use one or more registry entries defined in the former specification.

In the IETF model, RFCs never disappear once published, so if the new specification does not mention a registry entry created in a previous specification, the entry continues to link to the old specification, which might be deprecated by the new specification. In this case, the registry entries are not deleted or deprecated; it is the specification that the entries link to that changes status.

Within the IETF model, a new specification can update the allocation policy of a registry. If a registry entry is used in existing implementations, typically new specifications will recognize that usage and will not reallocate the entry for another use so as to avoid interoperability problems.

This suggests that if a metadata spec is obsoleted, the successor specification should dictate what happens to the registry entries. But what if there is no successor specification? I'd be concerned about situations in which the link that registry entries pointed to disappears, or is updated to point to a new specification that no longer defines the registry entry.

It seems like a successor specification should be able to clarify the meaning of a registry entry, or deprecate it. However, there are some things to consider if the new specification makes the old one inaccessible. If this leaves a registry entry pointing to a specification that can no longer be retrieved, this could be confusing. Is the W3C publication process archival, allowing obsoleted specifications to be retrieved, potentially going back into the distant past? If a specification update would leave a registry entry with a broken link, deletion might make sense.

youennf commented 1 year ago

I don't think we'd ever add mandatory metadata to VideoFrameMetadata.

There are two meanings of mandatory metadata:

  1. Mandatory metadata in the sense of a required dictionary member. I agree we do not want that.
  2. Mandatory metadata in the sense that it MUST be supported. This would happen if a UA supports the faceDetection MediaStreamTrack constraint. In any case, the actual face detection metadata field remains optional.

Anyway, can somebody summarise the actual editorial steps required to merge this PR? We can always make improvements to the framework itself. Landing this PR would allow us to make progress on some of the metadata in parallel to those improvements.

chrisn commented 1 year ago

Thanks @youennf, this looks good. We can discuss procedural requirements in a new issue if you prefer, and merge this one. Once we have those defined, we can run a CfC in the Media WG to actually publish this registry.

youennf commented 1 year ago

We can discuss procedural requirements in a new issue if you prefer, and merge this one.

Right, let's do that in a follow-up. @aboba, can you review the latest PR?

Once we have those defined, we can run a CfC in the Media WG to actually publish this registry.

Should we have a registry entry to run the CfC or is it fine to keep it empty?

chrisn commented 1 year ago

Should we have a registry entry to run the CfC or is it fine to keep it empty?

I think it's fine to be empty.

I'll raise another PR for the remaining procedural bits. We can probably also drop this item from the Media WG meeting agenda for tomorrow.

chrisn commented 1 year ago

@padenot, @aboba, is this OK to merge, and we can raise a separate PR for any remaining procedural requirements?

padenot commented 1 year ago

Yes.

youennf commented 1 year ago

@aboba, can you review the PR again and approve it if you think this is fine?

dalecurtis commented 1 year ago

Still lgtm; thanks @youennf -- though there is some PR build error:

FATAL ERROR: Couldn't find 'webcodecs-video-frame-metadata-registry' in bibliography data.

Bump on this one before submitting though.

youennf commented 1 year ago

@tidoust, can you shed some light on the issue @dalecurtis is mentioning above?

tidoust commented 1 year ago

@tidoust, can you shed some light on the issue @dalecurtis is mentioning above?

I pushed an update to add the entry to the local biblio. The entry can be removed when the registry document comes into existence and gets added to the official biblio databases.

dalecurtis commented 1 year ago

Great, looks like we're all good here. Thanks @youennf!