video-dev / media-ui-extensions

Extending the HTMLVideoElement API to support advanced player user-interface features
MIT License

DVR State - Proposal #4

Open cjpillsbury opened 1 year ago

cjpillsbury commented 1 year ago

NOTE: This proposal began as a subset of the Stream Type - Proposal #3 but was descoped due to complexities and the decision to model it as a separate state.

NOTE: A discussion of the complexities and permutations of "DVR", both when using the available HTTP Adaptive Streaming (HAS) manifests/playlists and when inferring it from the state of a given HTMLMediaElement instance, can be found in this Google Doc, which also has comments enabled. Please read it, as it provides relevant context for the proposal below.

Overview and Purpose

A subset of "live streaming media" is intended to be played with seek capabilities for the viewer. This is frequently referred to as "DVR," and typically falls into one of two categories:

  1. "Standard DVR" - All previous media in the live stream will be available to seek to and play for the life of the live stream (and perhaps after its completion).
  2. "Sliding Window DVR" - Previous media in the live stream will be removed over the life of the live stream, but a "sufficiently long amount" of the previous media will remain available to seek to and play during the live stream's life. Most often, the duration of the seekable media content will stay roughly the same (give or take small changes due to the segmented nature of HAS), except when the live stream has just begun. We can think of this value, implicit in the live media stream itself, as the "Sliding Window Media Size", i.e. the size of the sliding window as determined from, e.g., the media manifest/playlists themselves.

For both of these cases, although the media is live, the "intention" is to still allow users to seek through the media during playback.

Proposed DVR Types & Definitions

Below are all of the possible DVR states (for why, see the Google Doc referenced above).

Proposed Interface 1 (narrow implementation - "standard" support only)

This version of the proposal intentionally omits/"doesn't solve for" any account of "sliding".
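A minimal TypeScript sketch of what Interface 1 could look like, assuming a read-only dvr property typed boolean | null (the typing mentioned in a later comment), where null means the manifest/playlist has not been parsed yet. DvrAware and shouldShowSeekBar are hypothetical names for illustration, not part of any shipped API.

```typescript
// Hypothetical shape of Proposal 1 ("standard" DVR support only).
// true  -> live stream whose full history stays seekable ("standard" DVR)
// false -> live stream that is not "standard" DVR (plain live, or a sliding
//          window, which this proposal deliberately does not model)
// null  -> not yet known (manifest/playlist not parsed yet)
interface DvrAware {
  readonly dvr: boolean | null;
}

// Hypothetical UI helper: should the player render a seek bar?
function shouldShowSeekBar(media: DvrAware, isLive: boolean): boolean {
  if (!isLive) return true; // on-demand media is always seekable
  return media.dvr === true; // false and null are both treated as "plain live"
}
```

Treating null the same as false keeps the UI in its "plain live" state until parsing finishes, which is the conservative direction described in reason 4 of the recommendation.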

Proposed Inference 1 (narrow implementation - "standard" support only)

Rely only on parsing the HLS playlist (EXT-X-PLAYLIST-TYPE:EVENT) or the MPEG-DASH manifest (MPD@type="dynamic" with no MPD@timeShiftBufferDepth) to derive dvr. Any other approach will result in ambiguities. For more, see the Google Doc referenced above.
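The rule above can be sketched in TypeScript as two small functions over raw manifest/playlist text. This is a sketch under assumptions, not a reference implementation: the function names are hypothetical, the DASH check uses naive regex matching on the root MPD element where a real implementation would use an XML parser, and on-demand content is reported as null since dvr does not apply there. Per the DASH spec, an absent MPD@timeShiftBufferDepth means an unbounded time-shift buffer, which is why it maps to "standard".

```typescript
// true  -> "standard" DVR
// false -> live, but not "standard" DVR
// null  -> on-demand content, where dvr does not apply

// HLS: EXT-X-PLAYLIST-TYPE appears in media playlists, so pass a media
// playlist here, not the multivariant ("master") playlist.
function dvrFromHlsMediaPlaylist(playlist: string): boolean | null {
  const typeMatch = playlist.match(/^#EXT-X-PLAYLIST-TYPE:(EVENT|VOD)\s*$/m);
  const playlistType = typeMatch ? typeMatch[1] : undefined;
  // EVENT stays true even after #EXT-X-ENDLIST (the event remains seekable).
  if (playlistType === "EVENT") return true;
  const ended = /^#EXT-X-ENDLIST\s*$/m.test(playlist);
  if (playlistType === "VOD" || ended) return null;
  return false; // live, but the playlist makes no "standard" DVR promise
}

// MPEG-DASH: MPD@type="dynamic" and no MPD@timeShiftBufferDepth.
// Naive attribute matching on the root element (assumption: no ">" inside
// attribute values); prefer DOMParser or another XML parser in practice.
function dvrFromDashMpd(mpd: string): boolean | null {
  const root = mpd.match(/<MPD\b[^>]*>/);
  if (!root) return null;
  const attrs = root[0];
  // type defaults to "static", i.e. on-demand
  if (!/\stype\s*=\s*"dynamic"/.test(attrs)) return null;
  return !/\stimeShiftBufferDepth\s*=/.test(attrs);
}
```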

Proposed Interface 2 (exhaustive)

Proposed Inference 2 (exhaustive)

To be documented formally if this proposal is the one adopted. Most of it can be derived from the Google Doc referenced above.

Recommendation: Proposal 1 (narrow implementation - "standard" support only)

Reasons for recommendation

  1. much easier to implement
  2. provides a definitive true|false for both HLS and MPEG-DASH "immediately" (after loading and parsing the playlists/manifests once per media stream)
  3. provides less ambiguous and less controversial definitions, derivable from the HLS and MPEG-DASH specs
  4. raises no significant backwards-compatibility concerns if/when we introduce "sliding" (and corresponding "uncertain" states such as "any" or "unknown" in the case of early stream starts). This is because any implementation that adds "sliding" support later (assuming new properties are introduced) will simply be treated as "live" by consumers unless/until they integrate with the new interface. This feels far less risky than the other way around, where "live" streams would suddenly and unexpectedly start showing up as "DVR" (seekable).
cjpillsbury commented 1 year ago

NOTE: Although I am recommending Proposal 1, particular decisions on the names or values may still be up for discussion. For example, we may want to model dvr values as "yes" | "no" | "unknown" instead of boolean | null, or use a term other than dvr for the property/event name if there are concerns that it would be confusing if/when we introduce "sliding".
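The two encodings map one-to-one, so the choice is about readability and future extension rather than expressiveness. A small TypeScript sketch of the mapping (DvrTriState, toTriState, and fromTriState are hypothetical names):

```typescript
type DvrTriState = "yes" | "no" | "unknown";

// boolean | null -> string form
function toTriState(dvr: boolean | null): DvrTriState {
  if (dvr === null) return "unknown";
  return dvr ? "yes" : "no";
}

// string form -> boolean | null, via an exhaustive lookup table
const FROM_TRI_STATE: Record<DvrTriState, boolean | null> = {
  yes: true,
  no: false,
  unknown: null,
};

function fromTriState(value: DvrTriState): boolean | null {
  return FROM_TRI_STATE[value];
}
```

One practical difference: a string union can later grow new members without overloading the meaning of null, which is the extensibility argument made in the next comments.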

cjpillsbury commented 1 year ago

Thinking more about Proposal 1, we actually get another benefit:

Assuming we always use manifest/playlist parsing as the source of truth for "Standard DVR," this means we know for sure whether or not a given media stream meets this condition. As such, we should not need to model the "any" state. Here's why: If we parse the manifest/playlists, we already know the stream is e.g. !"standard". We still may not know if "sliding" (See Google Doc for reasons why), so "unknown" would be required. However, for any condition where we would have successfully identified "sliding" | "standard" (aka "any"), we would now know ("sliding" | "standard") && !"standard" (aka "sliding"). In other words, logically, "any" would be an impossible state and "drop out" if we use the proposed approach.
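The argument can be written as a tiny refinement step: once the manifest/playlist is authoritative for "standard", an observed "any" collapses to "sliding", and the result type no longer needs "any" at all. The state names come from this thread; refineDvrState is a hypothetical helper, and a separate "not DVR" state is omitted for brevity.

```typescript
// "any" = "standard" or "sliding", but we can't tell which.
type DvrState = "standard" | "sliding" | "any" | "unknown";

// Combine the manifest/playlist result (authoritative for "standard") with
// whatever was inferred from observing the media element over time.
function refineDvrState(
  isStandardFromManifest: boolean,
  observed: DvrState,
): Exclude<DvrState, "any"> {
  if (isStandardFromManifest) return "standard";
  // The manifest rules out "standard", so ("sliding" | "standard") && !"standard"
  // is just "sliding".
  if (observed === "any" || observed === "sliding") return "sliding";
  // Observation alone can't settle it (e.g. a window still growing at stream start).
  return "unknown";
}
```

The Exclude in the return type makes the point at the type level: "any" is no longer a reachable output.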

gkatsev commented 1 year ago

To me, an enum seems like the correct type, since there are potentially multiple values. Plus, even if we ignore "sliding" for now, it would easily let us extend this property to include it without potential future breaking changes. I do agree that "any" seems unnecessary. For the base case, I'd expect it to be "standard" | "unknown". Then it could be extended to add "sliding" and maybe "none" as well, though I suspect "none" could be covered by "unknown".

cjpillsbury commented 1 year ago

I believe that if we have a single property that we intend to extend with new values, we run a greater risk of breaking backwards compatibility, though we could account for that at the integration level. For example, using your proposal, in Media Chrome we could start by treating any value that's not "standard" as "for us, this is not DVR", or, if we wanted to, we could also support a basic inferred version via media.seekable for any case where dvr === undefined.
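A consumer-side sketch of that integration strategy in TypeScript. This is not Media Chrome's actual code: MediaLike and TimeRangesLike are minimal structural stand-ins for HTMLMediaElement and its seekable TimeRanges, and MIN_SEEKABLE_WINDOW is an arbitrary, hypothetical threshold for the seekable fallback.

```typescript
// Minimal stand-in for the TimeRanges returned by media.seekable.
interface TimeRangesLike {
  readonly length: number;
  start(index: number): number;
  end(index: number): number;
}

interface MediaLike {
  // undefined when the element does not implement dvr at all; may hold
  // "standard", "unknown", or values added later (e.g. "sliding").
  readonly dvr?: string;
  readonly seekable: TimeRangesLike;
}

// Hypothetical threshold (seconds) for the seekable-based fallback.
const MIN_SEEKABLE_WINDOW = 60;

function treatAsDvr(media: MediaLike): boolean {
  if (media.dvr !== undefined) {
    // Any value other than "standard", including future ones, means "not DVR for us".
    return media.dvr === "standard";
  }
  // Fallback: basic inference from the currently seekable range.
  const { seekable } = media;
  if (seekable.length === 0) return false;
  const windowSeconds = seekable.end(seekable.length - 1) - seekable.start(0);
  return windowSeconds >= MIN_SEEKABLE_WINDOW;
}
```

Because unrecognized values fall into the "not DVR" branch, adding "sliding" later would not change this consumer's UI until it opts in.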

cjpillsbury commented 1 year ago

If we go this route, the initial implementation of dvr (sticking within the scope/spirit of proposal 1 but more directly anticipating proposal 2/"sliding"):
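A producer-side TypeScript sketch of an initial enum-typed dvr along the lines of gkatsev's suggestion, assuming the values "standard" | "unknown" (with "none" folded into "unknown") and a listener callback standing in for a change event. DvrController and its methods are hypothetical names, not part of the proposal text.

```typescript
type DvrValue = "standard" | "unknown";

class DvrController {
  private value: DvrValue = "unknown";
  private readonly listeners: Array<(value: DvrValue) => void> = [];

  get dvr(): DvrValue {
    return this.value;
  }

  // Stand-in for addEventListener("dvrchange", ...) on a real element.
  onDvrChange(listener: (value: DvrValue) => void): void {
    this.listeners.push(listener);
  }

  // Call after parsing the playlist/manifest with the Proposal 1 rules.
  setFromManifest(isStandardDvr: boolean): void {
    this.update(isStandardDvr ? "standard" : "unknown");
  }

  // Call when src changes: nothing is known about the new stream yet.
  reset(): void {
    this.update("unknown");
  }

  private update(next: DvrValue): void {
    if (next === this.value) return; // only notify on actual changes
    this.value = next;
    for (const listener of this.listeners) listener(next);
  }
}
```

Extending this later would mean widening DvrValue (e.g. adding "sliding") without changing the meaning of the existing values.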

cjpillsbury commented 1 year ago

Another callout: All computation of what's described here as "sliding", as well as all concerns/considerations for disambiguation, can be computed from monitoring properties of an HTMLMediaElement. In other words, the concept of "sliding" may not be appropriate for media-ui-extensions. This differs from both streamType ("live" | "on-demand") and "standard" DVR, in that these have well-defined correlates in the MPEG-DASH and HLS specifications themselves.

For example, Media Chrome can certainly (eventually) add some kind of support for inferring "sliding" DVR based on, among other things, monitoring HTMLMediaElement::seekable values in a way that's consistent with what is under discussion here, but the only clear advantage to having it well-defined in an (extended) HTMLMediaElement is specifically for the ability to derive it quickly/reliably for MPEG-DASH (but not for HLS).

heff commented 1 year ago

Re: "DVR" - Unless we can find a strong defense for "DVR" being a universally known term, we should find more accessible language.

In proposal 2:

seekable.end - seekable.start <= minSlidingWindow
=> DVR === "sliding"

Does that then change as the seekable window changes? If not, what value is actually being compared against minSlidingWindow, and could we just expose that value instead? It feels kind of roundabout to give the media element a value to do simple math on.

My counter proposal is:

Either we know up front that the seekable window is expected to be long enough or we don't. If it eventually gets long enough, I can tell that from seekable. I don't need this property to also reflect that.

heff commented 1 year ago

A couple of notes after an IRL conversation with @cjpillsbury:

I think this property should be solely focused on what the media element can know initially (e.g. loadedmetadata, master manifest parse). I don't think anything after that point is really valuable, as you don't want your UI jumping around mid playback (either it starts with a progress bar or it doesn't). The only thing we learn after that point is how big the seekable window gets, which is already available via seekable. In reality, if you can't tell from the manifest what to do initially, you're going to configure the player another way or just not show a progress bar.

@cjpillsbury pointed out that we can know for certain that the stream will not be seekable, which my proposal doesn't cover. Here's a revised one:

targetLiveWindow| ???

If the media knows this live stream is not intended to be seekable, then it can set the window to zero.

For the UI developer, the answer to "show progress bar?" is targetLiveWindow > 0.
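A TypeScript sketch of how a UI could read targetLiveWindow, assuming the encoding implied across this thread: Number.POSITIVE_INFINITY for "standard" DVR, a positive finite number of seconds for a sliding window, 0 for a live stream not intended to be seekable, and NaN when unknown. describeLiveWindow and showProgressBar are hypothetical helpers.

```typescript
type LiveWindowKind = "standard" | "sliding" | "not-seekable" | "unknown";

function describeLiveWindow(targetLiveWindow: number): LiveWindowKind {
  if (Number.isNaN(targetLiveWindow)) return "unknown";
  if (targetLiveWindow === Number.POSITIVE_INFINITY) return "standard";
  if (targetLiveWindow > 0) return "sliding";
  return "not-seekable"; // 0 (negative values are treated the same way)
}

// The UI question reduces to one comparison; NaN > 0 is false,
// so an unknown window hides the progress bar.
function showProgressBar(targetLiveWindow: number): boolean {
  return targetLiveWindow > 0;
}
```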

I currently like targetLiveWindow because:

It could be that the actual window duration is never useful or known, in which case maybe these should all just be string values (but then we have to agree on names). I don't know the state of the world there.

cjpillsbury commented 1 year ago

"target" signals imprecise, which the window will be

@heff Yup, that's exactly right. Was thinking specifically about this case based on your proposal, and I think there are actually some benefits to having this value available for the UI to consume, as long as it's treated as distinct from seekable, which should model the actual currently seekable ranges.

It could be that the actual window duration is never useful or known.

This is effectively true for HLS, at least if we're trying to derive it from the playlists, since

  1. we're in an "information deficit" on how long the "targetLiveWindow" will be
  2. technically, the sum of #EXTINF/segment durations, which are the only values we can use to compute this, can change over time.
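To make the second point concrete, a TypeScript sketch of the only playlist-derived estimate available: the sum of #EXTINF durations, minus a hold-back offset. It assumes HOLD-BACK from EXT-X-SERVER-CONTROL is used when present and otherwise defaults to three times EXT-X-TARGETDURATION, which is what the HLS specification says its absence implies. estimateHlsWindow is a hypothetical name, and its result shifts every time the playlist is refreshed.

```typescript
// Estimate (seconds) of the seekable window for one snapshot of a live
// HLS media playlist. Returns NaN if EXT-X-TARGETDURATION is missing
// and no HOLD-BACK is given.
function estimateHlsWindow(mediaPlaylist: string): number {
  let segmentTotal = 0;
  let targetDuration = NaN;
  let holdBack = NaN;
  for (const line of mediaPlaylist.split(/\r?\n/)) {
    if (line.startsWith("#EXTINF:")) {
      // parseFloat stops at the comma before the optional title
      segmentTotal += parseFloat(line.slice("#EXTINF:".length));
    } else if (line.startsWith("#EXT-X-TARGETDURATION:")) {
      targetDuration = parseFloat(line.slice("#EXT-X-TARGETDURATION:".length));
    } else if (line.startsWith("#EXT-X-SERVER-CONTROL:")) {
      // [:,] prefix avoids matching PART-HOLD-BACK
      const match = line.match(/[:,]HOLD-BACK=([\d.]+)/);
      if (match) holdBack = parseFloat(match[1]);
    }
  }
  const offset = Number.isNaN(holdBack) ? 3 * targetDuration : holdBack;
  return Math.max(0, segmentTotal - offset);
}
```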

That said, that may be fine, as long as we support changes over time (as briefly discussed below) and/or explicitly add a setter for targetLiveWindow to this proposal (folks can still implement a setter even if it's beyond the scope of the media-ui-extensions definition).

I think as long as we also assume it's valid that this value can change over time for a given media element's src (with a corresponding event, e.g. targetlivewindowchange), this is feeling like a pretty good API to me.

I do have one mild concern here, though I don't think it's sufficient to suggest an alternative. By having a single value here, incremental support of this API becomes slightly more likely to cause unanticipated UI changes for folks depending on this value. For example, if <mux-video/> and <mux-audio/> add support for targetLiveWindow === Number.POSITIVE_INFINITY as a first pass (very likely), any "sliding" case would need to be represented as either NaN or 0. If we eventually add support for "sliding" cases, those would suddenly start showing a progress bar. That may be a reasonable expectation, though? Not sure.

@gkatsev let me know if you have any concerns with this proposal. Otherwise, I'll plan on spiking on this approach.

gkatsev commented 1 year ago

Any reason to incrementally support targetLiveWindow? Seems reasonable enough to implement it fully and then only have the UI handle a subset of cases.

This seems like a reasonable solution.

cjpillsbury commented 1 year ago

@gkatsev regardless of how we approach it for Open Elements, I think the concern still remains for other folks implementing this incrementally.

heff commented 1 year ago

I think as long as we also assume it's valid that this value can change over time for a given media element's src (with a corresponding event, e.g. targetlivewindowchange), this is feeling like a pretty good API to me.

I think what we should avoid is an API that might unexpectedly change the UI midstream. Tying this to an event similar to loadedmetadata or durationchange, one that only fires once per new source, would avoid that. But a targetlivewindowchange that can happen midstream would cause the issue. I'm not totally following the HLS need, except that we don't have a great answer there for a targetLiveWindow value in the sliding window scenario. I could see a world where the player sets targetLiveWindow to the initial playlist size. A value of 1 would be good enough to make a UI decision from, if the player can't know any more.
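heff's suggestion (seed targetLiveWindow from the initial playlist size, then only change it with a new source) could be sketched in TypeScript as a value latched once per src. LatchedLiveWindow and report() are hypothetical names; the boolean returned by report() stands in for whether a change event such as targetlivewindowchange would fire, and NaN is assumed to mean "not known yet".

```typescript
class LatchedLiveWindow {
  private src: string | null = null;
  private value = NaN; // NaN = not known yet for the current src

  get targetLiveWindow(): number {
    return this.value;
  }

  // Report a window estimate (seconds) for a given src.
  // Returns true only when the public value actually changed.
  report(src: string, windowSeconds: number): boolean {
    if (src !== this.src) {
      this.src = src; // new source: start over
      this.value = NaN;
    }
    // Already latched for this src, or nothing useful to latch yet.
    if (!Number.isNaN(this.value) || Number.isNaN(windowSeconds)) return false;
    this.value = windowSeconds;
    return true;
  }
}
```

Later playlist refreshes for the same src are ignored, so the UI decision made at startup never flips midstream.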

Remind me how we know that an HLS playlist should definitely be standard live, not sliding?

cjpillsbury commented 1 year ago

Remind me how we know that an HLS playlist should definitely be standard live, not sliding?

We will never know definitively from the playlists alone, as it's "underdetermined" with respect to the spec. This is discussed in detail in the referenced Google Doc. We can plausibly make some safe assumptions for the vast majority of non-EVENT HLS live playlists (since the other scenarios are fairly non-standard) if we monitor either the sum of EXTINF durations (with some additional offset for hold-back) or the seekable duration, and that value stops growing and is less than an established "minimum sliding window". However, that breaks in the other direction at the start of a live (or "sliding") media stream (see the Google Doc for details).
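One reading of that heuristic, sketched in TypeScript over successive samples of the window size (seconds), whether taken from the EXTINF sum minus hold-back or from the seekable duration. classifyLiveWindow, minSlidingWindow, and tolerance are hypothetical names and parameters, and comparing only the last two samples is a simplification; a real monitor would likely look at a longer history.

```typescript
type WindowGuess = "sliding" | "not-dvr" | "unknown";

function classifyLiveWindow(
  samples: number[],
  minSlidingWindow: number,
  tolerance = 1, // allowed wobble from segment boundaries
): WindowGuess {
  if (samples.length < 2) return "unknown"; // need at least two observations
  const latest = samples[samples.length - 1];
  const previous = samples[samples.length - 2];
  // Still growing: could be "standard" DVR, or a sliding window that has not
  // filled up yet (the start-of-stream case), so no safe conclusion.
  if (latest - previous > tolerance) return "unknown";
  // Stopped growing: large enough to count as a sliding window, or not.
  return latest >= minSlidingWindow ? "sliding" : "not-dvr";
}
```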

cjpillsbury commented 1 year ago

I think what we should avoid is an API that might unexpectedly change the UI midstream. Tying this to an event similar to loadedmetadata or durationchange, one that only fires once per new source, would avoid that.

I'm not sure there's a reliable pre-existing event we can tie to here, since these values require fetching and (simple) parsing of the manifest/playlists. For playback engines wrapped in a web component, that means we'd have to assume we can derive these values in advance of the engine setting values on the HTMLMediaElement that would trigger any proposed event (plausible, but still an assumption). For native browser HAS playback (e.g. HLS + Safari <video src="url"/>), this means we'd be parallel fetching and parsing the manifest for the relevant values, and we'd have to somehow reliably do this before any proposed "initialization" events (implausible, likely not possible to guarantee). In the case of e.g. <mux-video/>, we rely on both of these cases, depending on the playback environment.