As a consequence of the current seekable definition and step 8 of the video seeking algorithm, I believe out-of-buffer seeking is not possible for live streams. MSE specifies the seekable ranges to be the union of the current buffered regions and the video spec constrains seeking to time positions within the seekable range.
To make things a bit more concrete, imagine you have a sliding-window style live stream manifest with five video segments currently available. After a bit of buffering, your player has downloaded one segment. In Safari and on iOS devices, you would be allowed to set the current time to any position that is greater than three segment durations from the latest available position. With Media Source Extensions, you would only be able to seek within the segment you currently have buffered:
```
Manifest:
     0          1          2          3          4
| -------- | -------- | -------- | -------- | -------- |

Buffered:
     0
| -------- |

MSE Seekable:
     0
| -------- |

"Native" HTML5 Seekable:
     0          1          2
| -------- | -------- | -------- |
```
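To make the restriction concrete, here is a minimal sketch of the check an MSE player effectively runs into, since seeks are clamped to the element's reported seekable ranges (the 10-second segment duration is an assumption for illustration):

```js
const SEGMENT_DURATION = 10; // seconds; a typical HLS target duration (assumption)

// Returns true only if the requested time falls inside a reported seekable range.
function canSeekTo(video, time) {
  const seekable = video.seekable;
  for (let i = 0; i < seekable.length; i++) {
    if (time >= seekable.start(i) && time <= seekable.end(i)) {
      return true;
    }
  }
  return false;
}

// With only segment 0 buffered, canSeekTo(video, 2.5 * SEGMENT_DURATION) is
// false under MSE, while the "native" HLS case above would allow that seek.
```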
The issue in this bug seems to involve consistency of behavior, but not interoperability, assuming I understand it correctly. The impact seems to be limiting how close to real-time a user can seek on live streams. Seeking beyond the limits of buffered data would be expected to temporarily stall playback until buffering is restored; limiting the seek to within the buffered range minimizes the risk of that.
Yes, the issue is about consistency with the HTML spec's guidelines for live playback behavior.
I may be missing something in your point about real-time playback. It seems to me that the stream can be joined arbitrarily close to real-time by adjusting where to begin appending data into a Source Buffer. What you can't do (I think) is instruct the video element to seek back two minutes if you only have one currently appended in your Source Buffer, unlike media with a known duration or non-MSE live implementations.
It sounds like we need a mechanism similar to `mediaSource.duration`, but for live content where duration must be Infinity.

With non-live content the app can set `mediaSource.duration`, which informs the MediaSource what video data (outside of `buffered`) is available to the app. The MediaSource assumes the start time is zero and that we can seek anywhere up to `duration`, despite having none of it buffered.

With live content the MediaSource can't look at duration because it must be Infinity, and it can't assume the start time is zero. So it currently relies only on `buffered`, which is a very limited viewpoint. If the UI were to respect seekable (blocking seeks outside of the buffered range) but display the actual content length that's available to the app, it would be a frustrating experience for the viewer. Expanding seekable to represent the full length of the content available to the app would be no less safe than the non-live use case; there just isn't a way to do that.

So it seems like it would be helpful if the MediaSource provided more control over defining the start time and end time of the available media timeline. What if it were direct control over what seekable returns?

```js
mediaSource.seekableStart = ...
mediaSource.seekableEnd = ...
```

Changes to those properties could reuse the `durationchange` event.
@heff that's a great summary of the problem, thanks. Something like your proposal does seem like it would address the issue. One minor suggestion: `TimeRanges` support non-contiguous regions. I don't know of a streaming format that makes use of that feature but it could be added without complicating the API too much, since `TimeRanges` are normalized:

```js
mediaSource.addSeekableRange(240, 360);
mediaSource.removeSeekableRange(260, 280);
```
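As a non-normative illustration of what those hypothetical add/remove calls would do to a normalized range list (plain [start, end] arrays stand in for TimeRanges here):

```js
// Hypothetical sketch of the normalization behind addSeekableRange()/removeSeekableRange().
// Range lists are arrays of [start, end] pairs, kept sorted and non-overlapping.
function addRange(ranges, start, end) {
  const merged = [];
  for (const [s, e] of ranges) {
    if (e < start || s > end) {
      merged.push([s, e]);          // no overlap: keep as-is
    } else {
      start = Math.min(start, s);   // overlap: fold into the new range
      end = Math.max(end, e);
    }
  }
  merged.push([start, end]);
  return merged.sort((a, b) => a[0] - b[0]);
}

function removeRange(ranges, start, end) {
  const result = [];
  for (const [s, e] of ranges) {
    if (s < start) result.push([s, Math.min(e, start)]);  // keep the left piece
    if (e > end) result.push([Math.max(s, end), e]);      // keep the right piece
  }
  return result;
}

// Mirrors the example above: [[240, 360]] becomes [[240, 260], [280, 360]].
let ranges = addRange([], 240, 360);
ranges = removeRange(ranges, 260, 280);
```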
If I understand correctly, there are two live-stream seeking problems related to MSE described so far in this issue:
1) MSE assumes live (infinite duration) streams start at time 0. (It assumes the same for non-live streams, too.)
2) MSE disallows seeking beyond the end of currently buffered media for live (infinite duration) streams.
Both of these could lead to non-ideal default UI controls and behavior, especially when combined. My understanding is that both could be alleviated by apps overriding default UI controls to adapt to the live streaming case (though that is not ideal, since what this issue requests is a standardized expectation of more consistent behavior).
@dmlap I am confused, though, by "What you can't do (I think) is instruct the video element to seek back two minutes if you only have one currently appended in your Source Buffer, unlike media with a known duration or non-MSE live implementations." In MSE, seekable (when duration equals positive Infinity and there is a non-empty HTMLMediaElement.buffered TimeRanges) is the range [0, highest_end_time_of_buffered]. So you could indeed seek back two minutes if you only have one currently appended, so long as there is sufficient room in the timeline prior to the currently appended media. Note that the currently appended media need not have a buffered start at time 0. Am I misunderstanding something?
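A small sketch of the currently specified behavior described here (non-normative, for illustration):

```js
// With duration === Infinity and a non-empty buffered list, seekable is
// reported as [0, highest buffered end time].
function currentMseSeekable(video) {
  const buffered = video.buffered;
  if (buffered.length === 0) {
    return null; // nothing buffered yet, so seekable is empty
  }
  return [0, buffered.end(buffered.length - 1)];
}

// Example: if only [120, 130] is buffered, seekable is [0, 130], so seeking
// back two minutes (to t = 10) is permitted even though nothing before 120
// is buffered.
```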
@wolenetz I'd agree with your formulation of the issues. I didn't realize the start point of `seekable` was pegged to zero for live media. In that case, I believe you're correct and my example was a bad one (and clarifies @jdsmith3000's original comment for me, thanks!). The issue would be restricted to seeking forward in live streams.
One follow-up question on the start point of `seekable`: pegging the start point to zero seems like it could be a problem for long-running live streams. The SourceBuffer is eventually going to run out of space and the live window will march on. How should applications handle seeks to timeline positions that were evicted from the buffer and are no longer available to be downloaded?
The issue of a SourceBuffer running out of space is orthogonal, IMHO, and handled separately by the user agent's implementation of the coded frame eviction algorithm (http://www.w3.org/TR/media-source/#sourcebuffer-coded-frame-eviction). In appendBuffer() usage, this algorithm is run prior to parsing newly appended media into the timeline (and throws a QuotaExceededError if not enough space was freed). In appendStream() usage, the same behavior occurs synchronously during the call to appendStream(); the eviction algorithm also runs during each iteration of the stream append loop, though no QuotaExceededError is thrown from there; instead there is an error transition by which an app can discover the problem. Regardless, if the user agent's choice of removal ranges within its coded frame eviction algorithm was insufficient to free enough space, the app is given notification and can explicitly remove() buffered ranges.
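A minimal sketch of that appendBuffer() path (the 30-second retention policy is an arbitrary app choice for illustration, not something the spec prescribes):

```js
// If coded frame eviction couldn't free enough space, appendBuffer() throws
// QuotaExceededError; the app can then remove() old ranges and retry.
function appendWithEviction(sourceBuffer, segment, currentTime) {
  try {
    sourceBuffer.appendBuffer(segment);
  } catch (e) {
    if (e.name !== 'QuotaExceededError' || currentTime <= 30) {
      throw e; // not a quota problem, or nothing safely removable
    }
    // Retry the append once the asynchronous remove() has finished.
    sourceBuffer.addEventListener('updateend', () => {
      sourceBuffer.appendBuffer(segment);
    }, { once: true });
    // Drop everything more than 30 seconds behind the playhead (app policy).
    sourceBuffer.remove(0, currentTime - 30);
  }
}
```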
The problem really is two-fold:
1) How can an app or user seek forward beyond the end of currently buffered media in an MSE live stream (one that has Infinite duration)?
2) There are already some interop issues between Chrome's strict compliance and others' looser compliance around disallowing unrestricted doubles in TimeRanges. This causes different results across user agents in MSE when the HTMLMediaElement has reached HAVE_METADATA but otherwise has nothing buffered. In Chrome MSE, an app couldn't seek before anything is buffered; in others, the seekable range (and allowance for seeking) is less strict (because they allow an unrestricted double of +Infinity in the seekable range; see also https://code.google.com/p/chromium/issues/detail?id=461733#c23).
I believe an API like MediaSource.setSeekableRange([start, end]) could satisfy both problems (for problem 2, an app could achieve more interoperable behavior by explicitly setting a non-finite seekable range that overrides the default logic in the existing MSE spec).
jdsmith@, is this new API suggestion acceptable? I think it solves at least one real problem users of MSE are having, which is "how can apps reliably control the MSE seekable range in live (infinite duration) streams?"
@jdsmith3000 have you been able to give this issue some thought? I'd be happy to attempt a PR against the spec with @wolenetz's suggestion if that would help move things along.
Would this API then alter the response from mediaSource.seekable? And would the effects persist as streaming continued?
@wolenetz: Given the double limitation you cite on the seekable TimeRanges, what does Chrome return for a live stream with duration = Infinity? To me, it makes sense to return the full range, and have the app then limit the forward seeking to current real time.
For consistency with the video element's seekable attribute, I think the API should override the response of `mediaSource.seekable` and be persistent. The current behavior of `mediaSource.seekable` for live streams doesn't actually provide any information that isn't available to the application developer through other mechanisms in MSE, and responding with different values than the media element itself would confuse me, at least.
I've had some discussions here, but haven't closed on this. We aren't convinced that having the app set a seekable range is the right solution. It seems instead that we might want a concept where apps can jump to the live edge. Some formats might also have problems with zero-based timestamps (e.g. MPEG-2 TS timestamps roll over every 26 ½ hours, so it's not as simple as taking the timestamp for a segment and mapping it into a simple zero-based timestamp).
Could you elaborate on what sort of concept you had in mind to allow an app to jump to the live edge? The only ideas I can come up with seem to require info from the app about where the live edge is, which devolves into a `seekable` setter of some flavor.
+interop label to follow-up on the TimeRanges discrepancy in Edge vs others (also, need to confirm if unrestricted double TimeRanges are being done in Edge)
> To me, it makes sense to return the full range, and have the app then limit the forward seeking to current real time.

@jdsmith3000, assuming that by this you mean that end times in `TimeRanges` should be allowed to be `Infinity`, this doesn't sound great from the PoV of `HTMLMediaElement`, to me: https://code.google.com/p/chromium/issues/detail?id=461733#c27
The per-spec model for seeking media elements is that the requested time is clamped to seekable ranges, and then normally that's precisely where you'll end up. If at all possible, I think MSE should behave the same way.
Simply allowing the seekable ranges to be set sounds pretty good to me. If you'd like a constructor for `TimeRanges` so that you can add a settable `SourceBuffer.seekable` attribute, that might work. A `setSeekable()` that takes an array or array-of-arrays would also do the trick.
I'm aware of live streaming players built on the current API that accomplish seekable ranges by implementing custom controls and their own time ranges. These span from the live edge back through a DVR window maintained by the server. The current API isn't specifically aware of either limit, and knows only that duration is +infinity.
Allowing apps to set a seekable range for this case could approximate both limits and enable live seeking using either the standard controls or custom ones that still use the seekable range from the API. Presumably the seekable range would be set at least with each append of new sourceBuffer data. That append contains the most current live data, and the app can extrapolate the DVR limit using window data from the manifest or some other source.
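For example, a player might re-declare the window on every append, roughly like this (the setter name and the manifest fields are assumptions for illustration, not spec text):

```js
// Re-set the single custom seekable range each time the newest segment is appended.
function onSegmentAppended(mediaSource, manifest) {
  const liveEdge = manifest.newestSegmentEnd;             // hypothetical manifest field
  const dvrStart = liveEdge - manifest.dvrWindowSeconds;  // hypothetical manifest field
  // Standard controls (or custom ones reading seekable) then cover the DVR window.
  mediaSource.setSeekableRange(Math.max(0, dvrStart), liveEdge);
}
```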
We previously talked about these times persisting once set. That makes seekable limits in this case the responsibility of the app, and that's probably fine. We would likely want to somehow limit apps from using this on VoD content, where valid time ranges exist. This might be done by allowing seekable ranges to be set only on duration +Infinity content, or where the API itself cannot supply valid time ranges.
Assuming we can agree on this limitation, I'd support going ahead with implementing something like this approach:

```js
mediaSource.addSeekableRange(240, 360);
```

I'm less clear about the proposed

```js
mediaSource.removeSeekableRange(260, 280);
```

for seekable ranges with gaps in them, and would like to discuss this at the TPAC session.
From the TPAC 2014 MSE f2f today: setting a single seekable range had consensus among attendees as very useful (and V1). Set / maintain multiple seekable ranges was deemed a separate V.Next feature request.
Moving to needs implementation and taking this per discussion at TPAC. The implementation proposal will likely need to consider:
1) disallow setting seekableRange when updating is true (the app should first let the append/remove update finish, or call abort() to make an append finish).
2) disallow setting seekableRange that would clip any buffered media (this is meant to prevent weird edge cases where there's buffered media in the timeline that can't be seeked to).
3) perhaps allow setting an empty seekable range (if #1 is satisfied, but not necessarily #2), to allow apps to prevent any seeking. (If the loop flag is set, would this prevent looping back to time 0, or is that an artifact just in Chrome?)
regarding "perhaps allow setting empty seekable range": consensus was this would be ok, so long as it doesn't break things (like maybe in HTMLME algorithms depending on non-empty seekable maybe if playback has begun, for instance)
> Set / maintain multiple seekable ranges was deemed a separate V.Next feature request.
That seems odd. Assuming that both forms would just replace the existing seekable ranges, the difference in complexity seems very small. Was there some concern with multiple ranges?
@foolip: Good point! Multiple seekable ranges were less requested at the f2f, though if the complexity of supporting them is similar or less in the spec and implementations relative to HTMLMediaElement (as you suggest, and my preliminary investigation confirms), then we could trivially include this in MSE V1. The intent of my previous comment was to help prevent scope creep, and it seems that restricting to a maximum of one range is actually more complex/greater scope.
New https://www.w3.org/Bugs/Public/show_bug.cgi?id=29271 appears to me to be mostly a duplicate or subset of this bug, and I have closed it as such.
From my earlier summarization of TPAC discussion, the following bullet point doesn't take into account buffered ranges changing after a setSeekableRanges() call, such that there could be newly buffered media outside a seekable range.
> 2) disallow setting seekableRange that would clip any buffered media (this is meant to prevent weird edge cases where there's buffered media in the timeline that can't be seeked to)
@jdsmith3000 also described an additional constraint earlier in this issue that I tend to agree with:
> allowing seekable ranges to be set only on duration +Infinity content, or where the API itself cannot supply valid time ranges.
@jdsmith3000, how does the following approach sound to you (which I believe follows the TPAC discussion results, along with these additional constraints)?
MediaSource.mergeCustomSeekableRanges(TimeRanges custom_seekable_ranges): ([1])
1) Throws an exception if duration is not +Infinity (i.e. it is either NaN or finite).
2) Saves custom_seekable_ranges for use in later HTMLMediaElement.seekable queries ([2]). (custom_seekable_ranges is initialized to an empty TimeRanges object.)

Change MediaSource's extension of HTMLMediaElement.seekable as follows. The HTMLMediaElement.seekable attribute returns a new static normalized TimeRanges object created based on the following steps:
- If duration equals NaN: return an empty TimeRanges object.
- If duration equals positive Infinity (which can be explicitly set by apps, or becomes set if duration was previously NaN and the first initialization segment processed by any SourceBuffer has no duration): compute the union of HTMLMediaElement.buffered and the most recently set custom_seekable_ranges, if any, and return the result of the computation (which could still be an empty TimeRanges object if nothing is buffered yet and custom_seekable_ranges is empty).
- Otherwise (duration is finite): return a single range with a start time of 0 and an end time equal to duration. ([3])

Notes:
([1]) I don't really like "setSeekableRanges" if seekable must respond with a union of these custom ranges and whatever the current HTMLMediaElement.buffered attribute reports. However, "mergeCustomSeekableRanges" sounds quite wordy, and I'm looking for simpler wording. A simpler name is welcome!
([2]) Note that SourceBuffer.buffered has no change in this proposal; neither is there any SourceBuffer.extendSeekableRanges() in this proposal.
([3]) It occurs to me that an app could change a +Infinity duration to a finite duration. This path would then ignore any previously set custom_seekable_ranges until and unless duration becomes +Infinity again. This note might be the basis for at least an official "NOTE" in the MSE spec.
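A non-normative sketch of that seekable computation (range lists here are plain arrays of [start, end] pairs; `union()` stands in for TimeRanges normalization):

```js
// Normalize two lists of [start, end] pairs into one sorted, non-overlapping list.
function union(a, b) {
  const all = [...a, ...b].sort((x, y) => x[0] - y[0]);
  const out = [];
  for (const [s, e] of all) {
    const last = out[out.length - 1];
    if (last && s <= last[1]) {
      last[1] = Math.max(last[1], e); // overlapping or touching: merge
    } else {
      out.push([s, e]);
    }
  }
  return out;
}

// Proposed seekable computation for a media element attached to a MediaSource.
function computeSeekable(duration, buffered, customSeekableRanges) {
  if (Number.isNaN(duration)) {
    return [];                                    // nothing known yet
  }
  if (duration === Infinity) {
    // Union of buffered media and whatever the app declared seekable.
    return union(buffered, customSeekableRanges); // may still be empty
  }
  return [[0, duration]];                         // finite duration: one range
}
```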
@jdsmith3000, please let me know what you think -- I wanted to get this approach vetted prior to doing the gritty html spec change, since it attempts to address a couple constraints not previously fully clarified in my TPAC discussion summary.
w.r.t. naming the method, perhaps ".setLiveSeekableRanges"? I don't like "merge", because developers might infer incorrectly that multiple calls to the method result in a union of the ranges across the calls being used in the extended seekable logic, rather than just the most recently set ranges. "Live" might be an overstatement, since +Infinity duration is required for Live, but other media (with unknown duration) is not required to be Live, yet could have +Infinity duration.
I'm not sure that the cost of doing union with currently buffered media is worth it. @jdsmith3000, do you recall precisely any of the concerns from TPAC around custom seekable ranges clipping buffered media time?
Also, regardless of whether clipping by custom seekable ranges is allowed, it seems to me that we would also need some method like "clear custom seekable ranges extension" (and some logic in MediaSource.seekable) to fall back to the current spec's logic (which, if something is buffered, is a range from time 0 to the buffered end time) if the app has never provided (or wants to explicitly "unprovide") custom seekable range settings. Setting an empty custom seekable range might be desired to mean something different:
a) if clipping is allowed: nothing is seekable; just return an empty TimeRanges object for HTMLMediaElement.seekable.
b) if clipping is disallowed: just return the current buffered TimeRanges (not necessarily beginning at time 0).
versus (if no custom seekable ranges have been provided):
c) return an empty TimeRanges object if nothing is buffered, or an object with a single range from time 0 to the buffered end time if something is buffered.
I'm trying to thread the concerns, and it would help to understand how important avoidance of clipping really is.
I don't recall the TPAC discussions for sure, but believe that there was some concern about letting a custom range clip the buffered one. I think your suggestion of using the union of the custom range and the buffered one addresses this, and means that custom ranges are primarily for expanding seekable ranges, likely into live DVR time ranges.
Is it your intent that setting custom ranges can be done multiple times, and individual ranges are stored for use in calculating the union with buffered data? I've been thinking we want a single custom range for DVR-type uses, and this would need to be updated frequently to ensure the oldest DVR time is accurate.
On naming, I think shorter is better, and prefer setSeekableRange despite the union operation with buffered data.
@wolenetz @jdsmith3000 anything I can do to help resolve this issue? Sounds like the preference is for a seekable range setter instead of add/remove. #42 doesn't fit that bill, but if it's roughly the sort of contribution you'd expect, I'd be happy to try a new patch for `setSeekableRange`.
@dmlap, your PR is currently pending on: 1) fixing the failed IPR check (w3c folks are aware), and 2) ensuring it covers the approach outlined in @jdsmith3000 's most recent response, above. If (1) is delayed much longer, either @jdsmith3000 or I will move forward with a separate PR to fix this.
@wolenetz thanks for the update. I don't know if or when I'll be able to resolve (1). If my PR is making this issue harder to address, please don't hesitate to close it. I'll be happy knowing this live streaming use case is supported by MSE v1; it doesn't matter to me if my wording is used or not.
OK, it looks like we need to make progress on this. I'll compose a PR soon to attempt to fix this.
@wolenetz: When is your proposed PR going to be ready for review?
We synced on this during today's editors' sync. @jdsmith3000 and I are on the same page and I will proceed with a PR. The naming of the new method is still something we'd like to simplify, and we also think just a single range (if any) for the custom_seekable_range is necessary (we don't have any other use cases identified which need multiple disjoint custom seekable ranges, and the existing spec behavior for seekable with finite duration is a single range [0, duration)).
fyi - this is a test message, since I'm suddenly getting a warning on all my github pages: "One of our mostly harmless robots seems to think you are not a human. Because of that, it’s hidden your profile from the public. If you really are human, please contact support to have your profile reinstated. We promise we won’t require DNA proof of your humanity."
I've contacted their support. Hopefully this won't impact my immediate work.
I didn't unassign myself. Looks like GH's "mostly harmless robot" did that and I can't reassign myself. I'll continue until blocked...
It looks like GH has now just fixed my profile problem. That outage happened at a bad time, to say the least.
I am not sure if this is 100% related, but how are we seeking when we load audio files in chunks or segments? The issue I am having is that if I want to play an mp3 from the middle, `currentTime` will be set to an arbitrary value and increase from there instead of going from where I would like it to.
See my stackoverflow question for details. Any help on this would be awesome!
@falk-stefan, I've responded to your stackoverflow question today. Also, in addition to duration, you'll probably want to inspect in your repro what your media element reports as its buffered ranges when the seek is issued. (e.g. element.buffered).
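For what it's worth, a small helper for that kind of inspection might look like this (just a debugging sketch):

```js
// Log duration and every buffered range right before issuing the seek.
function logSeekContext(mediaElement, target) {
  const ranges = [];
  for (let i = 0; i < mediaElement.buffered.length; i++) {
    ranges.push(`[${mediaElement.buffered.start(i).toFixed(2)}, ` +
                `${mediaElement.buffered.end(i).toFixed(2)}]`);
  }
  console.log(`duration=${mediaElement.duration} buffered=${ranges.join(' ')} ` +
              `seeking to ${target}`);
}
```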
The MSE spec seems to indicate that the highest end time for seekable should never exceed the highest buffered time when the duration is set to Infinity. The HTML standard indicates that duration should be Infinity for unbounded or live media and that user agents should be very liberal determining seekable ranges for media.
Safari on iOS and OSX seems to have interpreted this as meaning that the seekable range should include the time ranges covered by all segments in the current "sliding window" of content in a live HLS video. That definition is convenient because it makes seeking to the live point or building a DVR interface a simple operation for downstream developers, and seems in keeping with the spirit of the HTML standard. It does not seem possible to configure Source Buffers or a Media Source to achieve the same effect. Is there a mechanism to override seekable with out-of-band info like you might get from an M3U8?