w3c / media-source

Media Source Extensions
https://w3c.github.io/media-source/
Other
268 stars 57 forks source link

Potential duration update interop issue in coded frame processing #203

Open wolenetz opened 6 years ago

wolenetz commented 6 years ago

Imagine a very strange media segment containing a sequence of frames as follows <dts,pts,duration> (both are keyframes, previously the duration was 0):

<0, 1000, 10> <30, 10, 10>

If the coded frame processing algorithm is run with these two frames while in 'segments' AppendMode, then the last step of the algorithm:

If the media segment contains data beyond the current duration, then run the duration change algorithm with new duration set to the maximum of the current duration and the group end timestamp.

would set the duration to 20, not 1010. This is because the group end timestamp is reset to 10 following the dts discontinuity, and then increased to 20 based on the duration of that frame following the discontinuity.

While contrived a bit, this highlights an opportunity to clarify that line of the spec to more precisely indicate the normative behavior that is intended.

For example, something like either: A) "Let highest end time be the largest track buffer ranges end time across all the track buffers for this SourceBuffer. If highest end time is larger than the current duration, then run the duration change algorithm with new duration set to highest end time"

or

B) accomplish the same as A by remembering the highest frame end timestamp encountered in this coded frame processing algorithm's execution and if that is larger than current duration, then run the duration change algorithm with that highest frame end timestamp.

Both of these options align with the logic in the duration change algorithm that prevents reducing duration below the highest end time of any buffered frame across all track buffers across all SourceBuffers in the MediaSource.

wolenetz commented 6 years ago

@jdsmith3000 @mwatson2 The interop part of this issues comes down to when (under what conditions might an implementation send pieces of a media segment to the coded frame processing algorithm) an implementation decides to run the coded frame processing algorithm; I would like your input as to whether my proposal, above, constitutes a substantive change or just a clarification (I believe it's the latter).

jyavenard commented 6 years ago

First, I'd like to point that there's no codec (either video or audio) for which such content would be valid. You can't have out order keyframes like so, nor would the duration be set at such value. It would be an invalid file as per the standard describing the content.

I'm aware of some badly muxed content (like the file mp4/h264 used in the WPT) which incorrectly tags frames as keyframe, in particular the B-frame are tagged as keyframes, even when the h264 bytestream NAL clearly make them non keyframe. Gecko ignores the keyframe information from the MP4 container for precisely that reason, instead analysing the bytestream to detect keyframes.

So really, it's a matter of garbage in - garbage out, and I'm not sure the spec should cater for such broken content. (I personally have a general dislike for fixes that just paper over an actual problem)

I would much prefer that such content be rejected, it is invalid to start with.

If it comes to choosing to change anyway, then I much prefer proposal A

jdsmith3000 commented 6 years ago

It's hard to imagine a real world scenario where such content would be encountered, and failing to play it does not seem in appropriate.

wolenetz commented 4 years ago

I'll try to get some real-world usage data from Chrome (since currently it does not outright reject such cases). - edited to remove non-public crbug reference.

wolenetz commented 4 years ago

With the code I just landed in Chrome (https://chromium-review.googlesource.com/c/chromium/src/+/1810330), I'll hopefully have enough information in November 2019 to understand if this scenario is actually occurring at a significant level.