w3c / webcodecs

WebCodecs is a flexible web API for encoding and decoding audio and video.
https://w3c.github.io/webcodecs/

Emit metadata (SPS,VUI,SEI,...) during decoding #198

Open chcunningham opened 3 years ago

chcunningham commented 3 years ago

Splitting off an idea from @ytakio in https://github.com/w3c/webcodecs/issues/94#issuecomment-828054996

I think WebCodecs needs a method to get Metadata (including feature check method, as informative in spec?)

Can you clarify what's meant by "including feature check method, as informative in spec"? Do you mean you'd like to query for what types of metadata could potentially be emitted?

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI...) to web apps.

Can you elaborate on how it would be useful to apps?

ytakio commented 3 years ago

My explanation was not good... I'm so sorry for my poor English 😅

Can you clarify what's meant by "including feature check method, as informative in spec"? Do you mean you'd like to query for what types of metadata could potentially be emitted?

I meant that a web app may want to know, before decoding, whether WebCodecs is capable of detecting and parsing each kind of metadata and passing it along. (Here, parsing means generating a metadata object from the bitstream.)

That parsing capability need not be mandatory. (That is what I meant by saying it may be "informative" in the W3C spec.)

But I think it is hard to support metadata parsing for every codec stream. So...

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI...) to web apps.

Can you elaborate on how it would be useful to apps?

Some codec bitstreams carry metadata like the following.

Almost all of the above are for post-processing, and I think web apps may want to handle them themselves. The metadata blocks of most codecs are designed to be easy to parse, but I think finding the boundary of each block is a little tough for a JavaScript app.
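To illustrate the boundary-finding step mentioned above, here is a minimal sketch that splits a bytestream into NAL units. It assumes an H.264 stream in Annex B format (units separated by `00 00 01` or `00 00 00 01` start codes); the names `NalUnit` and `splitAnnexB` are hypothetical and are not part of WebCodecs.

```typescript
// Hypothetical helper types and names; not part of the WebCodecs API.
interface NalUnit {
  type: number;     // H.264 nal_unit_type: low 5 bits of the first header byte
  data: Uint8Array; // NAL unit bytes without the start code (emulation prevention bytes kept)
}

// Split an H.264 Annex B bytestream into NAL units by scanning for start codes.
function splitAnnexB(bytes: Uint8Array): NalUnit[] {
  const starts: Array<{ code: number; payload: number }> = [];
  let i = 0;
  while (i + 2 < bytes.length) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      starts.push({ code: i, payload: i + 3 });
      i += 3;
    } else {
      i += 1;
    }
  }

  const units: NalUnit[] = [];
  for (let k = 0; k < starts.length; k++) {
    const begin = starts[k].payload;
    let end = k + 1 < starts.length ? starts[k + 1].code : bytes.length;
    // Trailing zero bytes belong to a 4-byte start code or to padding,
    // because a NAL unit never ends with 0x00.
    while (end > begin && bytes[end - 1] === 0) end--;
    if (end > begin) {
      units.push({ type: bytes[begin] & 0x1f, data: bytes.subarray(begin, end) });
    }
  }
  return units;
}
```

In H.264, `type` 6 is SEI, 7 is SPS, and 8 is PPS, so an app could filter the result by type before parsing. Streams that use length-prefixed NAL units (the "avc" format) would need a different splitter.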

So I think it would be useful even if WebCodecs just passed the byte array of a metadata block whenever it finds one in the bitstream. (I imagine a register-type notifier with labels such as "sps" or "sei", like .on("sei") 🤔)
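A rough sketch of what such a register-type notifier could look like from the app's side. Everything here is hypothetical: `MetadataNotifier`, its labels, and the listener signature are illustrative assumptions, not an existing or proposed WebCodecs interface.

```typescript
// Hypothetical labels and listener shape, for illustration only.
type MetadataLabel = "sps" | "pps" | "sei";
type MetadataListener = (block: Uint8Array, timestamp: number) => void;

class MetadataNotifier {
  private listeners = new Map<MetadataLabel, MetadataListener[]>();

  // Register a listener for one kind of metadata block, e.g. notifier.on("sei", ...).
  on(label: MetadataLabel, listener: MetadataListener): this {
    const list = this.listeners.get(label) ?? [];
    list.push(listener);
    this.listeners.set(label, list);
    return this;
  }

  // Called by whatever component finds a metadata block in the bitstream.
  emit(label: MetadataLabel, block: Uint8Array, timestamp: number): void {
    for (const listener of this.listeners.get(label) ?? []) {
      listener(block, timestamp);
    }
  }
}
```

The raw bytes are handed over unparsed, which matches the idea that the app does the parsing itself.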

I'd appreciate it if you would confirm.

chcunningham commented 3 years ago

My explanation was not good... I'm so sorry for my poor English 😅

Your English is good. I wish I spoke Japanese!

It may be very useful even if WebCodecs just passes the start point of each metadata bytestream (like SPS, VUI, SEI...) to web apps.

Thanks, I follow the proposal now.

Would it generally make sense for such metadata to accompany a frame in the output callback? (With semantics being: this metadata describes the current and subsequent frames.)

chcunningham commented 3 years ago

Triage note: marking 'extension', as the proposal would likely be implemented with additional callbacks or arguments.

ytakio commented 3 years ago

Thank you for encouraging me ;)

Would it generally make sense for such metadata to accompany a frame in the output callback?

That seems sufficient, I think. A frame may have multiple metadata blocks (for example, an SPS whose VUI includes aspect ratio information, plus SEI NALs). It may be good to have an array of metadata byte buffers (BufferSource?) on VideoFrame.

On the other hand, if VideoDecoderConfig.description includes the SPS NAL, a web app can parse it by itself (e.g. when an initialization segment has been received).
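As a sketch of the app-side parsing described above: for H.264 in the length-prefixed "avc" format, the description is an AVCDecoderConfigurationRecord, and the SPS NAL units can be pulled out of it as follows. The function name `extractSpsFromAvcC` is hypothetical, and the sketch assumes the description has already been copied into a `Uint8Array`.

```typescript
// Hypothetical helper; assumes `description` holds an AVCDecoderConfigurationRecord.
// Layout used here: byte 0 = configurationVersion, byte 5 low 5 bits = SPS count,
// then each SPS as a 16-bit big-endian length followed by the NAL unit bytes.
function extractSpsFromAvcC(description: Uint8Array): Uint8Array[] {
  if (description.length < 6 || description[0] !== 1) {
    throw new Error("not a version 1 AVCDecoderConfigurationRecord");
  }
  const spsCount = description[5] & 0x1f;
  const spsList: Uint8Array[] = [];
  let offset = 6;
  for (let n = 0; n < spsCount; n++) {
    if (offset + 2 > description.length) throw new Error("truncated record");
    const length = (description[offset] << 8) | description[offset + 1];
    offset += 2;
    if (offset + length > description.length) throw new Error("truncated SPS");
    spsList.push(description.subarray(offset, offset + length));
    offset += length;
  }
  return spsList;
}
```

Each returned array is a complete SPS NAL unit (header byte included), ready for an app-level SPS/VUI parser.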

With semantics being: this metadata describes the current and subsequent frames?

Your understanding is correct, I think :) (In detail, some metadata is assigned to a specific sequence ID for the whole presentation, but the web app can be in charge of handling that.)

cvanwinkle commented 2 years ago

Regarding reading SEI information, one use case for SEI information is to read pre-existing CEA-608/CEA-708 closed captions. A tool could then modify them, retime them, and then re-export or convert to TTML or something. I had previously worked on a (desktop) tool that needed to read raw closed captions from media files which had to do this for other file formats. In that scenario, emitting the data as part of the frame callback could work, but if there's a way to just retrieve the SEI information without having to do the work of decoding the actual frames that would be even better. The reason for that is the video frames may only be requested on-demand (i.e. starting playback mid-way through on a video file in a video editing application) but the entirety of the SEI information may be good to know up-front for the captions scenario above or perhaps others (but not a deal breaker). In other words, be able to scan the entire file for SEI data without also doing the work to decode each frame.

darkvertex commented 2 years ago

This would have been useful for me. I have a use case where I worked around the fact WebRTC didn't let me do frame-accurate synced A+V+data by embedding small JSON metadata into SEI subtype 5 (aka "unregistered user data".)
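For reference, a minimal sketch of how an app could read such a payload itself once it has the SEI NAL unit bytes. It assumes an H.264 SEI NAL unit (one-byte NAL header, type 6) without the start code; `parseSeiMessages` and `removeEmulationPrevention` are hypothetical helper names.

```typescript
// Hypothetical helpers for reading H.264 SEI messages in app code.
interface SeiMessage {
  payloadType: number; // 5 = user_data_unregistered
  payload: Uint8Array;
}

// Drop emulation prevention bytes: 00 00 03 becomes 00 00.
function removeEmulationPrevention(nal: Uint8Array): Uint8Array {
  const out: number[] = [];
  let zeros = 0;
  for (let i = 0; i < nal.length; i++) {
    const byte = nal[i];
    if (zeros >= 2 && byte === 3) { zeros = 0; continue; }
    out.push(byte);
    zeros = byte === 0 ? zeros + 1 : 0;
  }
  return new Uint8Array(out);
}

// SEI type and size fields are sums of 0xFF bytes plus one final byte.
function readFfCoded(bytes: Uint8Array, offset: number): { value: number; next: number } | null {
  let value = 0;
  while (offset < bytes.length && bytes[offset] === 0xff) { value += 255; offset++; }
  if (offset >= bytes.length) return null;
  return { value: value + bytes[offset], next: offset + 1 };
}

function parseSeiMessages(nal: Uint8Array): SeiMessage[] {
  const rbsp = removeEmulationPrevention(nal);
  const messages: SeiMessage[] = [];
  let offset = 1; // skip the one-byte H.264 NAL header
  // Stop before the final byte, which holds the RBSP trailing bits.
  while (offset + 1 < rbsp.length) {
    const type = readFfCoded(rbsp, offset);
    if (!type) break;
    const size = readFfCoded(rbsp, type.next);
    if (!size || size.next + size.value > rbsp.length) break; // truncated input
    messages.push({ payloadType: type.value, payload: rbsp.subarray(size.next, size.next + size.value) });
    offset = size.next + size.value;
  }
  return messages;
}
```

For a type 5 message, the first 16 payload bytes are a UUID and the remainder is the user data, so embedded JSON could be recovered with something like `new TextDecoder().decode(message.payload.subarray(16))` followed by `JSON.parse`.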

If the WebCodecs API had a callback or some other way to consume the SEIs, I could have made a web app to debug it instead of inconvenient, separate standalone software.

leonardoFu commented 2 years ago

I have several use cases related to this feature. We use SEI information to render effects on video, and also to calculate the end-to-end delay from the broadcaster to the web client. I am pushing a proposal that allows web apps to get SEI information from the video element. If we use WebCodecs, we can have the SEI with frame-level accuracy, which is really helpful in video editing scenarios.

sandersdan commented 1 year ago

The ability to attach metadata to a frame has recently been discussed in https://github.com/w3c/webcodecs/issues/189, and it seems likely that progress will be made there. This would solve one of the blockers here, a convenient way to expose the metadata.

What we are missing is primarily certainty that all future WebCodecs applications are able and willing to extract this metadata from the bytestream. Since it's possible to implement the extraction in JS, there needs to be a compelling reason to have the WebCodecs implementation do it.

Would it help significantly to support passing through user-provided metadata from EncodedVideoChunk to VideoFrame? I anticipate Chrome would do so purely by matching timestamps, so it may not be any more powerful than what JS can do.
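A minimal sketch of the JS-side equivalent, matching purely by timestamp. `TimestampMetadataStore` is a hypothetical helper; the only assumption about WebCodecs is that an output frame carries the same `timestamp` as the chunk it was decoded from.

```typescript
// Hypothetical app-side helper: associates user metadata with decoded frames
// by timestamp, without any support from the decoder itself.
class TimestampMetadataStore<T> {
  private pending = new Map<number, T>();

  // Call just before decoder.decode(chunk), using chunk.timestamp as the key.
  attach(timestamp: number, metadata: T): void {
    this.pending.set(timestamp, metadata);
  }

  // Call from the decoder's output callback, using frame.timestamp as the key.
  // Returns undefined if nothing was attached for that timestamp.
  take(timestamp: number): T | undefined {
    const metadata = this.pending.get(timestamp);
    this.pending.delete(timestamp);
    return metadata;
  }

  // Number of entries still waiting for their frame.
  get size(): number {
    return this.pending.size;
  }
}
```

A keyed lookup is used rather than a FIFO queue because frames may be output in a different order than chunks were submitted. Entries for chunks that never produce a frame (for example after a reset) would need to be cleared by the app.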