w3c / mediacapture-transform

MediaStreamTrack Insertable Media Processing using Streams
https://w3c.github.io/mediacapture-transform/

Expectations/requirements for VideoFrame and AudioData timestamps #80

Open chcunningham opened 2 years ago

chcunningham commented 2 years ago

Is it valid to append multiple VideoFrames or AudioData objects with the same timestamp (e.g. timestamp = 0) to a MediaStreamTrack? If so, what is the behavior? Does the spec describe this?
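
The thread does not settle what the spec expects here. One defensive option an application could use (an assumption on our part, not something the spec prescribes) is to drop any frame whose timestamp does not advance before it is written to the track. In this TypeScript sketch, `TimestampedMedia` is a minimal stand-in for `VideoFrame`/`AudioData` (both expose `timestamp` in microseconds and a `close()` method), and `MonotonicTimestampGuard` is a hypothetical helper name.

```typescript
// Minimal stand-in for the members of VideoFrame / AudioData used here.
interface TimestampedMedia {
  readonly timestamp: number; // microseconds
  close(): void;
}

// Hypothetical guard: forwards media only if its timestamp is strictly greater
// than the last forwarded one; anything else is closed and dropped.
class MonotonicTimestampGuard<T extends TimestampedMedia> {
  private lastTimestamp: number | null = null;
  private readonly forward: (media: T) => void;

  constructor(forward: (media: T) => void) {
    this.forward = forward;
  }

  // Returns true if the media was forwarded, false if it was dropped.
  push(media: T): boolean {
    if (this.lastTimestamp !== null && media.timestamp <= this.lastTimestamp) {
      media.close(); // release the resources held by the dropped frame
      return false;
    }
    this.lastTimestamp = media.timestamp;
    this.forward(media);
    return true;
  }
}

// Demo with fake media objects: the second frame repeats timestamp 0.
const forwarded: number[] = [];
let closedCount = 0;
const makeMedia = (timestamp: number): TimestampedMedia => ({
  timestamp,
  close: () => {
    closedCount++;
  },
});
const guard = new MonotonicTimestampGuard<TimestampedMedia>((m) => {
  forwarded.push(m.timestamp);
});
for (const t of [0, 0, 33333, 66666]) {
  guard.push(makeMedia(t));
}
```

In a browser, `forward` would typically write to the writer obtained from a MediaStreamTrackGenerator's `writable` stream. Whether dropping duplicates (rather than, say, passing them through) is the right policy depends on the answer to the question above.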

aboba commented 2 years ago

With spatial scalability, you can have multiple encodedChunks with the same timestamp (e.g. base layer as well as spatial enhancement layers). Does this result in the decoder producing multiple VideoFrames with the same timestamp? Or does the decoder wait until encodedChunk.timestamp advances before providing a single VideoFrame combining all the layers provided?

Currently, we do not configure the operating point in the WebCodecs decoder, so the decoder doesn't know the desired operating point or which layers that operating point depends on. At any given timestamp, the decoder could be given just the base layer encodedChunk, or the base layer plus some spatial enhancement layers. It can only know what it has to work with once the encodedChunk timestamp advances (which adds delay), or if it is configured with the operating point (in which case it can start decoding as soon as it has received all the layers that the operating point depends on).
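
The trade-off described above can be sketched as a small grouping step in front of a decoder. This is an illustration under stated assumptions, not how any WebCodecs implementation works: `LayerChunk` is a hypothetical stand-in for an encoded chunk, its `spatialLayerId` field is assumed (WebCodecs' `EncodedVideoChunk` has no such field), and `requiredLayers` models the operating point as a hypothetical list of spatial layer IDs.

```typescript
// Hypothetical chunk shape; `spatialLayerId` is an assumption for illustration.
interface LayerChunk {
  readonly timestamp: number; // shared by all layers of one frame
  readonly spatialLayerId: number; // 0 = base layer
}

// Groups chunks that share a timestamp and emits each group as one unit.
// Without an operating point, a group can only be emitted once a newer
// timestamp arrives (the added delay). With one, it is emitted as soon as
// every required layer is present.
class SpatialLayerAssembler {
  private pending: LayerChunk[] = [];
  private pendingTimestamp: number | null = null;
  private readonly emit: (group: LayerChunk[]) => void;
  private readonly requiredLayers: Set<number> | null;

  constructor(emit: (group: LayerChunk[]) => void, requiredLayers: number[] | null = null) {
    this.emit = emit;
    this.requiredLayers = requiredLayers === null ? null : new Set(requiredLayers);
  }

  push(chunk: LayerChunk): void {
    const required = this.requiredLayers;
    // With an operating point, layers it does not depend on are skipped.
    if (required !== null && !required.has(chunk.spatialLayerId)) {
      return;
    }
    // A newer timestamp proves the previous group is complete.
    if (this.pendingTimestamp !== null && chunk.timestamp !== this.pendingTimestamp) {
      this.flush();
    }
    this.pending.push(chunk);
    this.pendingTimestamp = chunk.timestamp;
    // With an operating point, release as soon as all required layers are here.
    if (required !== null) {
      const present = new Set(this.pending.map((c) => c.spatialLayerId));
      if (Array.from(required).every((id) => present.has(id))) {
        this.flush();
      }
    }
  }

  // Emits whatever is pending (e.g. at end of stream).
  flush(): void {
    if (this.pending.length > 0) {
      this.emit(this.pending);
    }
    this.pending = [];
    this.pendingTimestamp = null;
  }
}

// No operating point: the group for timestamp 0 waits for timestamp 33333.
const delayed: number[][] = [];
const noOp = new SpatialLayerAssembler((g) => {
  delayed.push(g.map((c) => c.spatialLayerId));
});
noOp.push({ timestamp: 0, spatialLayerId: 0 });
noOp.push({ timestamp: 0, spatialLayerId: 1 });
// Nothing emitted yet: more layers for timestamp 0 might still arrive.
noOp.push({ timestamp: 33333, spatialLayerId: 0 });

// Operating point = layers {0, 1}: the group is emitted without waiting.
const early: number[][] = [];
const withOp = new SpatialLayerAssembler((g) => {
  early.push(g.map((c) => c.spatialLayerId));
}, [0, 1]);
withOp.push({ timestamp: 0, spatialLayerId: 0 });
withOp.push({ timestamp: 0, spatialLayerId: 1 });
```

After these calls both `delayed` and `early` hold one group with layers 0 and 1, but only the configured assembler produced it without first seeing the next timestamp.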

chcunningham commented 2 years ago

> With spatial scalability, you can have multiple encodedChunks with the same timestamp (e.g. base layer as well as spatial enhancement layers). Does this result in the decoder producing multiple VideoFrames with the same timestamp? Or does the decoder wait until encodedChunk.timestamp advances before providing a single VideoFrame combining all the layers provided?

In this case the decoder would produce multiple VideoFrames with the same timestamp, but authors would be expected to discard many of these, passing only the frames at their desired resolution to the MediaStreamTrackGenerator (MSTG).
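
A minimal sketch of that author-side filtering, assuming the desired resolution can be identified by coded width alone: `DecodedFrameLike` is a stand-in for the `VideoFrame` members used, and `makeResolutionFilter` is a hypothetical helper that returns a callback shaped like a `VideoDecoder` `output` callback. Frames that are not forwarded are closed, since a discarded `VideoFrame` still holds memory until `close()` is called.

```typescript
// Minimal stand-in for the VideoFrame members used here.
interface DecodedFrameLike {
  readonly timestamp: number;
  readonly codedWidth: number;
  readonly codedHeight: number;
  close(): void;
}

// Hypothetical helper: forwards frames whose coded width matches
// `targetWidth` and closes every other frame.
function makeResolutionFilter<T extends DecodedFrameLike>(
  targetWidth: number,
  forward: (frame: T) => void,
): (frame: T) => void {
  return (frame: T): void => {
    if (frame.codedWidth === targetWidth) {
      forward(frame);
    } else {
      frame.close(); // discarded frames must still be closed
    }
  };
}

// Demo: base layer (640x360) and enhancement layer (1280x720) decoded
// for the same timestamp; only the 1280-wide frame is forwarded.
const written: string[] = [];
let dropped = 0;
const fakeFrame = (timestamp: number, codedWidth: number, codedHeight: number): DecodedFrameLike => ({
  timestamp,
  codedWidth,
  codedHeight,
  close: () => {
    dropped++;
  },
});
const onOutput = makeResolutionFilter<DecodedFrameLike>(1280, (f) => {
  written.push(`${f.timestamp}@${f.codedWidth}x${f.codedHeight}`);
});
onOutput(fakeFrame(0, 640, 360));
onOutput(fakeFrame(0, 1280, 720));
```

In a browser, the returned function could be passed as the `output` member of the `VideoDecoder` init dictionary, with `forward` writing each kept frame to the writer obtained from the generator's `writable` stream. Matching on `codedWidth` is a simplification; an application may prefer to compare both dimensions or `displayWidth`, depending on how its layers are configured.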