w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/

Clarify that overlapped video coded frames need their duration adjusted in the track buffer #166


wolenetz commented 7 years ago

Imagine this scenario:

SourceBuffer.appendBuffer({video initialization segment});
SourceBuffer.appendBuffer({video coded frame keyframe DTS=PTS=0, duration = 30ms});
SourceBuffer.appendBuffer({video coded frame keyframe DTS=PTS=20, duration = 30ms});

The video track buffer, per the V1 spec, would contain the frames [0,30)[20,50). Note the overlap.

A subsequent SourceBuffer.remove(20,30) would remove the coded frame [20,50), resulting in a track buffer containing [0,30).
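To make this concrete, here is a minimal sketch of a track buffer model behaving as the V1 text reads (hypothetical `Frame`/`TrackBuffer` types, not the MSE API; the real remove algorithm also involves random access points, which this ignores):

```ts
// Hypothetical model of a video track buffer under the V1 behavior
// described above. Not the real MSE API; types and methods are
// illustrative assumptions only.
interface Frame {
  pts: number;      // presentation timestamp (ms)
  duration: number; // coded frame duration (ms)
}

class TrackBuffer {
  frames: Frame[] = [];

  // V1 as described: the new frame is stored even when its presentation
  // interval overlaps an earlier frame's; the earlier duration is untouched.
  append(frame: Frame): void {
    this.frames.push(frame);
  }

  // Simplified remove(start, end): drop frames whose pts falls in [start, end).
  remove(start: number, end: number): void {
    this.frames = this.frames.filter(f => f.pts < start || f.pts >= end);
  }

  intervals(): string {
    return this.frames.map(f => `[${f.pts},${f.pts + f.duration})`).join("");
  }
}

const tb = new TrackBuffer();
tb.append({ pts: 0, duration: 30 });
tb.append({ pts: 20, duration: 30 });
console.log(tb.intervals()); // "[0,30)[20,50)" -- the overlap
tb.remove(20, 30);
console.log(tb.intervals()); // "[0,30)" -- the overlapped frame's full 30ms remains
```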

This may be confusing to web authors and implementors: when a video frame is overlapped, its duration isn't adjusted in the track buffer, and rendering that frame might or might not observe that duration. Further, if rendering doesn't observe that duration but instead respects the overlapping video frame's PTS, it could be confusing when a removal of the overlapping frame makes the originally overlapped frame's presentation interval "spring back" to 30ms, as in this example.

I suggest we clarify in the spec that the behavior really should be to adjust the duration of the overlapped frame (similar to the handling in the audio splice frame algorithm). Text might be:

Current:

Suggested new text:

Chrome currently does the "otherwise" case for certain byte streams (webm) when those byte streams don't explicitly set a precise coded frame duration (e.g., a SimpleBlock frame at the end of a cluster), so the duration is initially estimated. Such estimated durations are "fixed up" when a later append overlaps them, but that leaves the non-estimated-duration webm blocks, as well as any non-webm coded frames, suffering from overlapping presentation intervals in the same track buffer in the scenario above.
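For illustration, a sketch of the suggested adjustment, assuming an end-overlap truncates the overlapped frame's duration at the overlapping frame's presentation timestamp (hypothetical `appendWithAdjustment` helper, not proposed spec text):

```ts
interface Frame { pts: number; duration: number; }

// Suggested behavior (sketch): truncate any end-overlapped frame at the
// overlapping frame's pts, similar in spirit to the audio splice handling.
function appendWithAdjustment(frames: Frame[], frame: Frame): void {
  for (const existing of frames) {
    const end = existing.pts + existing.duration;
    if (frame.pts > existing.pts && frame.pts < end) {
      existing.duration = frame.pts - existing.pts; // e.g. [0,30) becomes [0,20)
    }
  }
  frames.push(frame);
}

const frames: Frame[] = [];
appendWithAdjustment(frames, { pts: 0, duration: 30 });
appendWithAdjustment(frames, { pts: 20, duration: 30 });
// Track buffer is now [0,20)[20,50): no overlap, and a subsequent
// remove(20,30) would leave [0,20) rather than a "sprung back" [0,30).
```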

This clarification might best be done at this point in VNext, given the V1 spec is nearing PR/REC.

jyavenard commented 7 years ago

I think this is opening a can of worms in attempting to fix what is otherwise improperly muxed content. And I don't believe this is something the spec should attempt to cater for.

What you describe may be more relevant to Chrome's internal implementation, but frames are typically stored in decode order.

That means the 2nd frame you are adding may overlap a frame that is not immediately prior to that new frame, or not even adjacent to it. For example: [20,40)[0,30)[40,60) or: [20,40)[40,60)[0,30)

Having to modify the duration of a frame because its presentation end time overlaps an earlier one's is no trivial matter. Nor, I think, is there a unique solution. Which one do you truncate in the example above? The [20,40) one (making it [30,40)), or the later [0,30) one (which becomes [0,20))?

Ultimately, a video frame duration has little meaning. Of all containers, only mp4 has a concept of frame duration (IMHO, a video frame shouldn't have a duration at all; a video frame should be displayed until there's a new one to replace it). That's certainly the case with webm: the situation you describe can't occur with that container; you can't have overlapping frames there, only in mp4. (The only time you could have overlapping frames in webm is, as you describe, when the duration of the last SimpleBlock was incorrectly estimated; but for that there's a solution: simply mark the last block with a BlockDuration.) All these issues started when the concept of frame duration was added to MSE.

So ultimately, I think we should leave the current text as-is.

wolenetz commented 7 years ago

While a can of worms, it needs opening, or at least clarification. The MSE spec very clearly processes coded frames in decode order as they are appended. It also very clearly prescribes what to do if the presentation timestamp of an overlapped frame falls within the presentation interval of the overlapping frame (remove the overlapped frame completely, including frames which depend on it). Assuming 100% keyframes and PTS==DTS in your examples, the results would be:

[20,40)[0,30)[40,60)
→ [0,30)[40,60) // Note that [20,40)'s PTS is within the presentation interval of the overlapping [0,30), so the entire [20,40) frame is already dropped by the V1 MSE coded frame processing algorithm.

[20,40)[40,60)[0,30)
→ [0,30)[40,60) // Ditto: [0,30) triggered dropping [20,40)

Which one do you truncate in the example above?

Therefore, neither.
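A minimal sketch of that reading of the V1 algorithm, assuming the rule is: on append, remove any buffered frame whose presentation timestamp falls within the new frame's presentation interval (dependent-frame removal and the algorithm's other conditions are omitted; names are hypothetical):

```ts
interface Frame { pts: number; duration: number; }

// On append, drop buffered frames whose pts falls inside the new frame's
// presentation interval [pts, pts + duration). Simplified: assumes 100%
// keyframes, so there are no dependent frames to remove.
function appendFrame(buffer: Frame[], frame: Frame): Frame[] {
  const end = frame.pts + frame.duration;
  const kept = buffer.filter(f => f.pts < frame.pts || f.pts >= end);
  kept.push(frame);
  return kept;
}

let buf: Frame[] = [];
for (const f of [
  { pts: 20, duration: 20 }, // [20,40)
  { pts: 0, duration: 30 },  // [0,30): its interval covers pts 20, so [20,40) is dropped
  { pts: 40, duration: 20 }, // [40,60)
]) {
  buf = appendFrame(buf, f);
}
console.log(buf.map(f => `[${f.pts},${f.pts + f.duration})`).join(""));
// "[0,30)[40,60)" -- matching the walkthrough above
```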

You can't have overlapping frames there. Only mp4. (the only time you could have overlapping frames in webm is as you describe, the duration of the last SimpleBlock was incorrectly estimated; but for those there's a solution, simply mark the last block with a BlockDuration).

webm has frame duration signalled in BlockGroup->BlockDuration or, for SimpleBlocks, by the inter-block delta within the cluster. Only the end-of-cluster SimpleBlock (for each track) needs any duration estimation.

All issues started when the concept of frame duration was added to MSE.

Frame duration was necessary, IIUC, to be able to interoperably detect discontinuities in the coded frame processing algorithm. It is also fundamental to the buffered range calculations: we include PTS + frame duration when determining the presentation interval of a coded frame and its corresponding buffered footprint on the media timeline.
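As a toy illustration of that second role, buffered ranges can be modeled as the coalesced union of each frame's [PTS, PTS + duration) interval (hypothetical helper, not the spec's exact range algorithm):

```ts
interface Frame { pts: number; duration: number; }

// Toy buffered-range computation: coalesce each frame's
// [pts, pts + duration) interval. Without a duration, a frame would
// contribute nothing measurable to `buffered`.
function bufferedRanges(frames: Frame[]): Array<[number, number]> {
  const spans = frames
    .map(f => [f.pts, f.pts + f.duration] as [number, number])
    .sort((a, b) => a[0] - b[0]);
  const merged: Array<[number, number]> = [];
  for (const [start, end] of spans) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      last[1] = Math.max(last[1], end); // extend the previous range
    } else {
      merged.push([start, end]);        // start a new range
    }
  }
  return merged;
}

console.log(bufferedRanges([{ pts: 0, duration: 30 }, { pts: 20, duration: 30 }]));
// [[0, 50]] -- the overlapping intervals merge into one buffered range
```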

attempting to fix what is otherwise improperly muxed content.

This issue is really attempting to establish interoperable behavior when a coded frame end-overlaps a previously buffered coded frame's presentation interval. This is not necessarily due to improperly muxed content; it could simply be content from distinct muxed media sources appended on purpose such that the beginning of one end-overlaps the end of the other. If the overlapped coded frame's duration is not adjusted, then media duration might behave a little weird:

Append [0,30) --> buffered == [0,30).
Append [20,25) --> buffered == [0,25). (**)
endOfStream() --> duration == 25 (highest track buffer range end time is now 25).
remove(20,25) (re-opens the mediaSource) --> duration still == 25, but buffered == [0,30) if the overlapped frame's duration wasn't previously adjusted.

Duration is still 25, which violates the intent of the duration change algorithm: it lower-bounds duration by the highest track buffer range end timestamp.

(**) Do you suspect a different behavior here than [0,25)?
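A toy re-run of the sequence above, assuming (as described) that buffered is reported as [0,25) after the overlapping append and that no duration adjustment occurred (all names hypothetical):

```ts
interface Frame { pts: number; duration: number; }

// Toy re-run: the overlap occurs, but the overlapped frame keeps duration = 30.
const buffered: Frame[] = [
  { pts: 0, duration: 30 },  // append [0,30)
  { pts: 20, duration: 5 },  // append [20,25): end-overlaps [0,30)
];

const mediaDuration = 25; // endOfStream(): duration := highest range end (25)

// remove(20,25): drop the overlapping [20,25) frame.
const remaining = buffered.filter(f => !(f.pts >= 20 && f.pts < 25));

// With no prior duration adjustment, [0,30) is intact again:
const highestEnd = Math.max(...remaining.map(f => f.pts + f.duration)); // 30
console.log(highestEnd > mediaDuration); // true -- buffered now extends past
// duration, violating the lower bound the duration change algorithm intends
```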

jyavenard commented 7 years ago

Assuming 100% keyframes and PTS==DTS in your examples, the results (including duration adjustment in bold) would be:

That can't be. The PTS values are out of order, so you can't have PTS == DTS here. DTS would be 0, 20, 40 respectively.

[20,40)[0,30)[40,60)

[0,30)[40,60) // Note that [20,40)'s PTS is within the presentation interval of the overlapping [0,30), so the entire [20,40) frame is dropped already by the V1 MSE coded frame processing algorithm.

Why would the frame be dropped?

When [0,30) is added, we have last decode timestamp set; as such, the condition to run step 13 of the coded frame processing algorithm (https://w3c.github.io/media-source/index.html#sourcebuffer-coded-frame-processing), "If last decode timestamp for track buffer is unset and presentation timestamp falls within the presentation interval of a coded frame in track buffer", won't be true, and no frame will be dropped.
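For concreteness, a sketch of that guard, assuming a per-track lastDecodeTimestamp that becomes set after the first frame of a continuous append sequence (a simplified reading of the cited step, not spec text):

```ts
interface Frame { dts: number; pts: number; duration: number; }

interface TrackState {
  frames: Frame[];
  lastDecodeTimestamp: number | null; // unset at the start of an append sequence
}

// Simplified reading of the cited step: the overlap removal only runs while
// lastDecodeTimestamp is unset, i.e. for the first frame of the sequence.
function processFrame(track: TrackState, frame: Frame): void {
  if (track.lastDecodeTimestamp === null) {
    track.frames = track.frames.filter(
      f => !(frame.pts >= f.pts && frame.pts < f.pts + f.duration)
    );
  }
  track.frames.push(frame);
  track.lastDecodeTimestamp = frame.dts;
}

const track: TrackState = { frames: [], lastDecodeTimestamp: null };
processFrame(track, { dts: 0, pts: 20, duration: 20 });  // [20,40)
processFrame(track, { dts: 20, pts: 0, duration: 30 });  // [0,30): guard already
// set by the first frame, so the overlap check is skipped and nothing is dropped
processFrame(track, { dts: 40, pts: 40, duration: 20 }); // [40,60)
console.log(track.frames.length); // 3 -- all frames retained
```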

wolenetz commented 3 years ago

I'll need to invest further thought to see if this is valid. If so, might still be candidate for V2. For now, putting into bugfix milestone.