Buffered ranges should be defined as time ranges that can be played without requiring download of more data

ghost commented 8 years ago

https://html.spec.whatwg.org/multipage/embedded-content.html#loading-the-media-resource

The buffered attribute must return a new static normalised TimeRanges object that represents the ranges of the media resource, if any, that the user agent has buffered, at the time the attribute is evaluated. User agents must accurately determine the ranges available, even for media streams where this can only be determined by tedious inspection.

The buffered ranges should be defined as the region of the Media Resource which is downloaded and which is playable without needing to download any more data.

For example, since the user agent may discard already downloaded media resource data, the UA could end up discarding the keyframe required to decode a block of downloaded data. In this case, I'd argue that we should no longer report the remaining P-frames that depend on the discarded keyframe as buffered in the buffered ranges, since we can't effectively play them without downloading more data.

The spec doesn't actually define whether the buffered ranges have to be playable. If we decide not to require buffered ranges to be playable (which I think is a bad idea), then we should explicitly add that to the spec so that there is no ambiguity.

foolip commented 8 years ago

I'm pretty sure that @Hixie's intention with the current wording is precisely what you are asking for. "User agents must accurately determine the ranges available, even for media streams where this can only be determined by tedious inspection" means that one should not simply map byte ranges to time ranges based on average bitrate, but, if necessary, actually look at every byte being downloaded.

However, I don't know if anyone has actually implemented it like this, it would be very expensive. The best that I ever did in Presto was to use the keyframe index (in WebM) to make the most conservative estimate I could of the playable ranges. IIRC, that made the buffered ranges appear to jump at keyframe intervals...
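A minimal Python sketch of that kind of conservative, keyframe-index-based estimate follows. It is not Presto's code: the inputs (a sorted keyframe index of (time, byte offset) pairs, the total duration and byte length, and a list of half-open cached byte ranges) and the function names are assumptions for illustration. A keyframe interval counts as buffered only when all of its bytes are cached, which is why the result advances in keyframe-sized steps.

```python
from typing import List, Tuple

# Hypothetical inputs: a keyframe index of (time_seconds, byte_offset) pairs,
# sorted by time, plus the half-open byte ranges [start, end) held in the cache.
KeyframeIndex = List[Tuple[float, int]]
ByteRanges = List[Tuple[int, int]]


def is_cached(start: int, end: int, cached: ByteRanges) -> bool:
    """True if the byte span [start, end) lies inside a single cached range."""
    return any(c_start <= start and end <= c_end for c_start, c_end in cached)


def conservative_buffered(index: KeyframeIndex, duration: float,
                          resource_length: int,
                          cached: ByteRanges) -> List[Tuple[float, float]]:
    """Count a keyframe interval as buffered only if all of its bytes are cached."""
    ranges: List[Tuple[float, float]] = []
    # Close the last keyframe interval at the end of the resource.
    bounds = index + [(duration, resource_length)]
    for (t0, b0), (t1, b1) in zip(bounds, bounds[1:]):
        if not is_cached(b0, b1, cached):
            continue  # partially cached intervals are not reported at all
        if ranges and ranges[-1][1] == t0:
            ranges[-1] = (ranges[-1][0], t1)  # contiguous with the previous range
        else:
            ranges.append((t0, t1))
    return ranges


index = [(0.0, 0), (2.0, 1000), (4.0, 2500), (6.0, 3600)]
# Bytes 2500-3000 are cached too, but their keyframe interval is incomplete.
print(conservative_buffered(index, 8.0, 5000, [(0, 3000)]))  # [(0.0, 4.0)]
```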

@cpearce-mozilla, how does Gecko currently implement the buffered attribute?

@dalecurtis, last I checked Chromium's WebMediaPlayerImpl does a fairly basic estimate of the buffered ranges because it doesn't actually know precisely what is in the cache or will stay in the cache. Is that still the case?

@eric-carlson, how are buffered ranges determined on Safari?

jyavenard commented 8 years ago

I can answer for Gecko: for MP4, it does exactly what cpearce described above. The buffered attribute accurately represents what's in the cache, based on the location of keyframes, and only includes frames that are actually decodable.

For WebM, it represents the time range of the complete clusters found in the media cache. As such, it's a bit conservative in that it doesn't include the partial clusters that may be contained at the end of each byte range.

For MP3, it is based on the average bitrate of the MP3 content that has been downloaded and parsed.
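For comparison, an average-bitrate estimate like the MP3 case can be sketched in a few lines of Python. This is an assumption about the general technique, not Gecko's implementation; the function name and the data_offset parameter (bytes before the first audio frame, such as an ID3v2 tag) are hypothetical. It maps byte ranges straight to time ranges without checking that the data is decodable.

```python
from typing import List, Tuple


def bitrate_buffered(cached: List[Tuple[int, int]], parsed_bytes: int,
                     parsed_duration: float,
                     data_offset: int = 0) -> List[Tuple[float, float]]:
    """Estimate buffered time ranges from cached byte ranges via average bitrate."""
    # Average bitrate of the content downloaded and parsed so far.
    bytes_per_second = parsed_bytes / parsed_duration
    ranges = []
    for start, end in cached:
        # Ignore bytes before the first audio frame (hypothetical data_offset).
        start = max(start, data_offset)
        if end <= start:
            continue
        ranges.append(((start - data_offset) / bytes_per_second,
                       (end - data_offset) / bytes_per_second))
    return ranges


# 160000 bytes parsed over 10 s gives 16000 bytes/s (128 kbit/s).
print(bitrate_buffered([(0, 160000), (320000, 480000)], 160000, 10.0))
# [(0.0, 10.0), (20.0, 30.0)]
```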

For MSE, the buffered range is also exact, based on the number of frames parsed.

We are currently experiencing a big overhead in calculating the buffered range for MP4, and what led to this ticket is that I suggested ignoring the keyframes would be good enough. However, this issue will no longer be a problem very soon.

foolip commented 8 years ago

@jyavenard, is what you're doing for MP4 and WebM based only on keyframe tables and other information from the header, or are you actually inspecting the data as it comes in over the network to check that the timestamps are what you expected?

If it's based on information in the header, then it falls short of "User agents must accurately determine the ranges available, even for media streams where this can only be determined by tedious inspection."

This is fine, of course; actually going as far as the spec suggests doesn't seem sensible to me. The only way to really know whether the data is playable is to try to decode it all, and that's just not an option.

How about Ogg without Skeleton headers, are you guessing based on bitrate there?

jyavenard commented 8 years ago

"If it's based on information in the header, then it falls short of 'User agents must accurately determine the ranges available, even for media streams where this can only be determined by tedious inspection.'"

Not sure I follow. For MP4, the timestamp information is found in the header (moov box), so of course we rely on it. We parse every single sample found in the sample table and check whether it's fully contained in the media cache. If a keyframe or an earlier frame in the same group is missing, all remaining frames are skipped until the next keyframe. As you wrote, short of decoding them as well, there's nothing more we can do to determine an accurate buffered range. It's the same with MSE.
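The sample-table walk described above can be sketched in Python roughly as follows. It is a simplified, hypothetical model rather than Gecko's code: a single track, samples listed in decode order with no B-frame reordering, times in integer track-timescale ticks, and half-open cached byte ranges. A sample contributes to the buffered ranges only if it is fully cached and every sample back to its keyframe is too, which also covers the discarded-keyframe case from the opening comment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    # Hypothetical per-sample record, as it might be built from a sample table.
    time: int        # presentation time in track timescale ticks
    duration: int    # ticks
    offset: int      # byte offset in the file
    size: int        # bytes
    keyframe: bool


def sample_buffered(samples: List[Sample],
                    cached: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Report only samples that are cached and decodable from a cached keyframe."""
    ranges: List[Tuple[int, int]] = []
    decodable = False  # becomes True once a fully cached keyframe is seen
    for s in samples:
        in_cache = any(a <= s.offset and s.offset + s.size <= b for a, b in cached)
        if not in_cache:
            decodable = False  # later frames depend on this one; wait for a keyframe
            continue
        if s.keyframe:
            decodable = True
        if not decodable:
            continue
        end = s.time + s.duration
        if ranges and ranges[-1][1] == s.time:
            ranges[-1] = (ranges[-1][0], end)  # extend a contiguous range
        else:
            ranges.append((s.time, end))
    return ranges


samples = [Sample(0, 1, 0, 100, True), Sample(1, 1, 100, 50, False),
           Sample(2, 1, 150, 50, False), Sample(3, 1, 200, 100, True),
           Sample(4, 1, 300, 50, False)]
# Sample 2 is cached, but sample 1 (which it depends on) is not.
print(sample_buffered(samples, [(0, 120), (150, 350)]))  # [(0, 1), (3, 5)]
```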

ghost commented 8 years ago

How about Ogg without Skeleton headers, are you guessing based on bitrate there?

For Ogg, we demux the pages at the start and at the end of every buffered range and figure out the timestamps of the first and last Ogg packet for all active streams that can be decoded in those ranges.

I wrote that code, and it definitely felt like "tedious inspection". ;)
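The Ogg approach can be sketched in Python as well. This version assumes the demuxing has already produced, for each cached byte range, the first and last decodable packet timestamps of every active stream, keyed by a hypothetical stream serial number. Intersecting the streams' spans, so that every active stream has data throughout the reported range, is an assumption of this sketch rather than a description of Gecko's code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: for one cached byte range, the (first, last) packet timestamps
# in seconds of each active stream that can be decoded there.
StreamSpans = Dict[int, Tuple[float, float]]


def ogg_range(spans: StreamSpans) -> Optional[Tuple[float, float]]:
    """Time range covered by all active streams within one byte range."""
    if not spans:
        return None
    start = max(first for first, _ in spans.values())
    end = min(last for _, last in spans.values())
    return (start, end) if start < end else None


def ogg_buffered(per_range: List[StreamSpans]) -> List[Tuple[float, float]]:
    """One time range per cached byte range, dropping ranges with no overlap."""
    return [r for r in map(ogg_range, per_range) if r is not None]


print(ogg_buffered([{1: (0.0, 10.0), 2: (0.5, 9.5)}, {1: (20.0, 25.0)}]))
# [(0.5, 9.5), (20.0, 25.0)]
```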

ghost commented 8 years ago

I think we should tweak my requested change to be:

"The buffered ranges should be defined as the region of the Media Resource which is downloaded and which is playable without needing to download any more data from the Media Resource."

That is, things that affect playability that are out-of-band, namely EME keys, should not affect the buffered ranges. A fully buffered encrypted media resource that just lacks usable keys should be reported as fully buffered.

foolip commented 8 years ago

@jyavenard I'm not familiar with MP4, but I was thinking about the kind of situation described in https://github.com/whatwg/html/issues/360#issuecomment-159860979 where using the keyframe index in the header only allows you to make a pessimistic estimate, because "tedious inspection" would be needed to figure out where the audio samples and non-keyframe video samples are, and thus precisely how much you could play.

In any case, I don't think that anyone will ever do that tedious inspection when a keyframe index or similar is available.

foolip commented 8 years ago

@cpearce-mozilla, the current definition is "The buffered attribute must return a new static normalised TimeRanges object that represents the ranges of the media resource, if any, that the user agent has buffered, at the time the attribute is evaluated."

Do you have a proposed new wording? This definition doesn't talk about playability at all... It could perhaps be expressed in terms of the ability to seek to a position?

dalecurtis commented 8 years ago

@dalecurtis, last I checked Chromium's WebMediaPlayerImpl does a fairly basic estimate of the buffered ranges because it doesn't actually know precisely what is in the cache or will stay in the cache. Is that still the case?

Correct. WMPI uses a mix of demuxed packet timestamps plus extrapolation based on what's been downloaded into the media cache thus far. We don't know if data we've evicted from the media cache is present in the disk cache.

https://code.google.com/p/chromium/codesearch#chromium/src/media/blink/webmediaplayer_impl.cc&l=539

foolip commented 8 years ago

Ping @cpearce-mozilla about question in https://github.com/whatwg/html/issues/360#issuecomment-160587104

foolip commented 8 years ago

@padenot, perchance, have you discussed this with @cpearce-mozilla?

cpearce commented 8 years ago

I haven't forgotten this, have just been busy. Will try to get to this soon.

foolip commented 8 years ago

Cool! Which GitHub account would you like us to use? If you like I can make you a collaborator so that I can assign issues (like this) to you.

foolip commented 8 years ago

I've invited both @cpearce and @cpearce-mozilla to the whatwg org, please accept the one (if any) you'd like to use and reject the other. (Note that https://help.github.com/articles/merging-multiple-user-accounts/ says "We recommend using only one user account to manage both personal and professional repositories.")

cpearce commented 8 years ago

Which GitHub account would you like us to use?

@cpearce. I've given up on trying to maintain a separate professional github account. It's not working!

whatwg / html

Buffered ranges should be defined as time ranges that can be played without requiring download of more data #360