w3c / mediacapture-transform

MediaStreamTrack Insertable Media Processing using Streams
https://w3c.github.io/mediacapture-transform/

Expectations/requirements for VideoFrame and AudioData timestamps #80

Open chcunningham opened 2 years ago

chcunningham commented 2 years ago

Is it valid to append multiple VideoFrames or AudioData objects with the same timestamp (e.g. timestamp = 0) to a MediaStreamTrack? If so, what is the behavior? Does the spec describe this?
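To make the ambiguity concrete, here is a hypothetical helper (not from any spec) that classifies consecutive frame timestamps; nothing currently says whether a sink should drop, replace, or render both frames when it sees a `duplicate`:

```javascript
// Hypothetical helper: classify consecutive frame timestamps so a sink
// (or a test) can see which cases the spec leaves undefined.
// Timestamps are in microseconds, as on VideoFrame.timestamp.
function classifyTimestamps(timestamps) {
  const events = [];
  for (let i = 1; i < timestamps.length; i++) {
    const prev = timestamps[i - 1];
    const cur = timestamps[i];
    if (cur > prev) events.push('advance');
    else if (cur === prev) events.push('duplicate'); // e.g. two frames with timestamp 0
    else events.push('regression'); // timestamp went backwards
  }
  return events;
}

// Example: appending two frames with timestamp 0, then one at 33333.
// classifyTimestamps([0, 0, 33333]) → ['duplicate', 'advance']
```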

aboba commented 2 years ago

The mediacapture-transform specification does not currently describe how timestamp is processed.

Related: #96

Potential future issue when spatial scalability is supported:

With spatial scalability, you can have multiple encodedChunks with the same timestamp (e.g. base layer as well as spatial enhancement layers). Does this result in the decoder producing multiple VideoFrames with the same timestamp? Or does the decoder wait until encodedChunk.timestamp advances before providing a single VideoFrame combining all the layers provided?

Currently, we do not configure the operating point in the WebCodecs decoder, so the decoder doesn't know the desired operating point or the layers that the operating point depends on. At any given timestamp, the decoder may therefore be provided with just a base-layer encodedChunk, or with the base layer plus some spatial enhancement layers. It can only know what it has to work with once the timestamp of the encodedChunks advances (which adds delay), or if it is configured with the operating point (in which case it can start decoding once it has been provided with all the layers that the operating point depends on).
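The wait-until-the-timestamp-advances behavior described above can be sketched as a small buffering policy. This is not WebCodecs API, just a model: `chunk` objects only need a `timestamp` field, and the one-timestamp delay shows up as the group being released only when a later timestamp arrives.

```javascript
// Sketch: buffer encoded chunks that share a timestamp and release the
// group only once the timestamp advances, modeling the extra delay when
// the decoder does not know the operating point.
class LayerCollector {
  constructor() {
    this.pending = [];   // chunks for the current timestamp
    this.current = null; // timestamp of the pending group
  }
  // Returns a completed group of same-timestamp chunks, or null.
  push(chunk) {
    if (this.current === null || chunk.timestamp === this.current) {
      this.current = chunk.timestamp;
      this.pending.push(chunk);
      return null; // can't know yet whether more layers are coming
    }
    const group = this.pending; // timestamp advanced: previous group is done
    this.pending = [chunk];
    this.current = chunk.timestamp;
    return group;
  }
}
```

If the decoder were configured with the operating point instead, the collector could release a group as soon as all the layers that operating point depends on had arrived, avoiding the delay.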

chcunningham commented 2 years ago

With spatial scalability, you can have multiple encodedChunks with the same timestamp (e.g. base layer as well as spatial enhancement layers). Does this result in the decoder producing multiple VideoFrames with the same timestamp? Or does the decoder wait until encodedChunk.timestamp advances before providing a single VideoFrame combining all the layers provided?

In this case the decoder would produce multiple VideoFrames with the same timestamp, but authors would be expected to discard most of these, passing only the frame at their desired resolution to the MSTG.
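The discard-and-select pattern described here might look like the following sketch. The frame objects are stand-ins with a `codedWidth` and a `close()` method (as on VideoFrame); the selection criterion (matching a desired width) is an assumption for illustration:

```javascript
// Sketch: when the decoder emits several VideoFrames sharing a timestamp
// (one per spatial layer), keep only the frame at the desired resolution
// and close the rest so their memory is released promptly.
function selectLayer(frames, desiredWidth) {
  let selected = null;
  for (const frame of frames) {
    if (!selected && frame.codedWidth === desiredWidth) {
      selected = frame; // this one would be written to the generator
    } else {
      frame.close();    // discard unwanted layers
    }
  }
  return selected;
}
```

Closing the unused frames matters in practice, since VideoFrames hold media resources that are not reclaimed by garbage collection alone.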

dontcallmedom-bot commented 2 months ago

This issue was discussed in Joint Media/WebRTC WG meeting at TPAC – 26 September 2024 (Expectations/Requirements for VideoFrame and AudioData timestamps)

dontcallmedom-bot commented 6 days ago

This issue had an associated resolution in WebRTC November 19 2024 meeting – 19 November 2024 (Issue #80: Expectations/Requirements for VideoFrame and AudioData timestamps):

RESOLUTION: Add an extensibility consideration to mediacapture-main to make sure sinks define their behavior on frame timestamps, and file issues on sink specs accordingly