Open cjpillsbury opened 1 year ago
NOTE: Although I am recommending Proposal 1, particular decisions on the names or values may still be up for discussion. For example, we may want to model dvr values as "yes" | "no" | "unknown" instead of boolean | null, or use a different term than dvr for the property/event name if there are concerns that this would be confusing if/when introducing sliding.
Thinking more about Proposal 1, we actually get another benefit:
Assuming we always use manifest/playlist parsing as the source of truth for "Standard DVR," this means we know for sure whether or not a given media stream meets this condition. As such, we should not need to model the "any" state. Here's why: If we parse the manifest/playlists, we already know the stream is e.g. !"standard". We still may not know if it is "sliding" (see the Google Doc for reasons why), so "unknown" would be required. However, for any condition where we would have successfully identified "sliding" | "standard" (aka "any"), we would now know ("sliding" | "standard") && !"standard" (aka "sliding"). In other words, logically, "any" would be an impossible state and "drop out" if we use the proposed approach.
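To make the "drop out" argument concrete, here is a minimal TypeScript sketch. The names resolveDvr, InferredDvr, and ResolvedDvr are hypothetical and not part of either proposal; the sketch only assumes that manifest/playlist parsing gives a definite yes/no answer for "standard".

```typescript
// Hypothetical types for illustration only; not part of the proposal.
// What could be inferred without the manifest (e.g. from seekable ranges).
type InferredDvr = "sliding" | "any" | "none" | "unknown";
// What remains once the manifest answer is known: "any" is gone.
type ResolvedDvr = "standard" | "sliding" | "none" | "unknown";

function resolveDvr(manifestSaysStandard: boolean, inferred: InferredDvr): ResolvedDvr {
  // The manifest/playlist is treated as the source of truth for "standard".
  if (manifestSaysStandard) return "standard";
  // ("sliding" | "standard") && !"standard" collapses to "sliding".
  if (inferred === "any") return "sliding";
  // "sliding", "none", and "unknown" pass through unchanged.
  return inferred;
}
```

Because the return type has no "any" member, the compiler enforces the claim that the state is unrepresentable once the manifest has been parsed.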
To me, an enum seems like the correct type, since there are potentially multiple values. Plus, even if we ignore sliding currently, it would easily allow us to extend this property to include it without potential future breaking changes.
I do agree that an any seems unnecessary. For the base case, I'd expect it to be "standard" | "unknown". Then, it could be extended to add "sliding" and maybe "none" as well, though, I suspect, "none" could be covered by "unknown".
I believe if we have a single property that we intend to extend with new values, we run a greater risk of backwards-compatibility problems, though we could account for that at an integration level. For example, using your proposal, in Media Chrome, we could start by treating any value that's not "standard" as "for us, this is not DVR", or, if we wanted to, we could also support a basic inferred version via media.seekable for any case where dvr === undefined.
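A rough sketch of that integration-level handling follows. MediaLike, treatAsDvr, and the 60-second fallback threshold are all assumptions made for illustration; this is not Media Chrome's actual implementation.

```typescript
// Minimal structural type so the sketch runs outside a browser. A media
// element extended with a `dvr` property would also satisfy it.
interface MediaLike {
  dvr?: string;
  seekable: { length: number; start(index: number): number; end(index: number): number };
}

// Hypothetical threshold (seconds) used only by the inferred fallback.
const MIN_INFERRED_WINDOW_SECONDS = 60;

function treatAsDvr(media: MediaLike): boolean {
  // When the element implements `dvr`, anything other than "standard"
  // (including values added later) is treated as "not DVR".
  if (media.dvr !== undefined) return media.dvr === "standard";
  // Fallback for elements without `dvr`: basic inference from seekable.
  if (media.seekable.length === 0) return false;
  const windowSeconds = media.seekable.end(0) - media.seekable.start(0);
  return windowSeconds >= MIN_INFERRED_WINDOW_SECONDS;
}
```

Treating unrecognized values as "not DVR" is what keeps a later addition such as "sliding" from changing the UI until the integration opts in.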
If we go this route, the initial implementation of dvr (sticking within the scope/spirit of proposal 1 but more directly anticipating proposal 2/"sliding") would be:
get dvr() {} : "standard" | "unknown" | "none"
"standard" - the true case from the original proposal
"none" - streamType === "on-demand"
"unknown" - streamType === "live" && dvr !== "standard", or the value is tbd (e.g. no src set, still fetching/parsing the playlists/manifest, etc.)
Another callout: Everything described here as "sliding", as well as all concerns/considerations for disambiguation, can be computed from monitoring properties of an HTMLMediaElement. In other words, the concept of "sliding" may not be appropriate for media-ui-extensions. This differs from both streamType ("live" | "on-demand") and "standard" DVR, in that these have well-defined correlates in the MPEG-DASH and HLS specifications themselves.
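The three-value dvr getter described above could be computed roughly as follows. This is only a sketch: computeDvr and the manifestSaysStandard flag are hypothetical names, and streamType is allowed to be undefined here to stand in for the "tbd" case.

```typescript
type StreamType = "live" | "on-demand";
type Dvr = "standard" | "unknown" | "none";

// `streamType` is undefined while it is still to be determined (no src set,
// playlists/manifest still being fetched or parsed, etc.).
function computeDvr(streamType: StreamType | undefined, manifestSaysStandard: boolean): Dvr {
  // The `true` case from the original proposal.
  if (streamType === "live" && manifestSaysStandard) return "standard";
  // On-demand media has no DVR concept.
  if (streamType === "on-demand") return "none";
  // Live but not "standard", or nothing is known yet.
  return "unknown";
}
```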
For example, Media Chrome can certainly (eventually) add some kind of support for inferring "sliding" DVR based on, among other things, monitoring HTMLMediaElement::seekable values in a way that's consistent with what is under discussion here, but the only clear advantage to having it well-defined in an (extended) HTMLMediaElement is specifically for the ability to derive it quickly/reliably for MPEG-DASH (but not for HLS).
Re: "DVR" - Unless we can find a strong defense for "DVR" being a universally known term, we should find more accessible language.
In proposal 2:
seekable.end - seekable.start >= minSlidingWindow => DVR === "sliding"
Does that then change as the seekable window changes? If not, what value is actually being used to compare against minSlidingWindow, and could we just expose that value instead? It feels kind of roundabout to give the media element a value to do simple math on.
My counter proposal is:
maxLiveDuration | expectedLiveDuration | targetLiveWindow | ???
"standard" - Infinity
"sliding" - isFinite
"none" & "unknown" - NaN
Either we know up front that the seekable window is expected to be long enough or we don't. If it eventually gets long enough, I can tell that from seekable. I don't need this property to also reflect that.
A couple of notes after an IRL conversation with @cjpillsbury:
I think this property should be solely focused on what the media element can know initially (e.g. loadedmetadata, master manifest parse). I don't think anything after that point is really valuable, as you don't want your UI jumping around mid playback (either it starts with a progress bar or it doesn't). The only thing we learn after that point is how big the seekable window gets, which is already available via seekable. In reality, if you can't tell from the manifest what to do initially, you're going to configure the player another way or just not show a progress bar.
@cjpillsbury pointed out that we can know for certain that the stream will not be seekable, which my proposal doesn't cover. Here's a revised one:
targetLiveWindow | ???
"standard" - Infinity
"none" - 0
"sliding" - > 0 and < Infinity
"unknown" or "inapplicable" (on-demand) - NaN
If the media knows this live stream is not intended to be seekable, then it can set the window to zero.
For the UI developer, the answer to "show progress bar?" is targetLiveWindow > 0.
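The revised mapping can be read as a small classification over a single number. The sketch below uses hypothetical helper names (describeTargetLiveWindow, showProgressBar) and assumes negative values, which the proposal does not define, are treated like 0.

```typescript
type LiveWindowKind = "standard" | "sliding" | "none" | "unknown-or-inapplicable";

function describeTargetLiveWindow(targetLiveWindow: number): LiveWindowKind {
  // NaN covers both "unknown" and "inapplicable" (on-demand).
  if (Number.isNaN(targetLiveWindow)) return "unknown-or-inapplicable";
  if (targetLiveWindow === Infinity) return "standard";
  if (targetLiveWindow > 0) return "sliding";
  // 0 means the stream is known not to be seekable. Negative values are not
  // covered by the proposal; treating them as "none" is an assumption.
  return "none";
}

// The UI decision from the proposal. NaN > 0 is false, so unknown and
// on-demand values never pass this particular check.
function showProgressBar(targetLiveWindow: number): boolean {
  return targetLiveWindow > 0;
}
```

One design consequence worth noting: because every comparison with NaN is false, a UI that only checks targetLiveWindow > 0 needs no special handling for the unknown case.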
I currently like targetLiveWindow because:
"target" signals imprecise, which the window will be
It could be that the actual number of window duration is never useful or known. In which case maybe these should just all be string values (but then...we have to agree on names). I don't know the state of the world there.
@heff Yup, that's exactly right. Was thinking specifically about this case based on your proposal, and I think there are actually some benefits to having this value available for the UI to consume, as long as it's treated as distinct from seekable, which should model the actual currently seekable ranges.
It could be that the actual number of window duration is never useful or known.
This is effectively true for HLS, at least if we're trying to derive it from the playlists, since the #EXTINF/segment durations, which are the only values we can use to compute targetLiveWindow, can change over time.
That said, that may be fine, as long as we support changes over time (as briefly discussed below) and/or explicitly add a setter for targetLiveWindow to this proposal (folks can still implement a setter even if it's beyond the scope of the media-ui-extensions definition).
I think as long as we also assume it's valid that this value can change over time for a given media element's src (with a corresponding event, e.g. targetlivewindowchange), this is feeling like a pretty good API to me.
I do have one mild concern here, though I don't think it's sufficient to suggest an alternative. By having a single value here, this makes incremental support of this API slightly more likely to cause unanticipated UI changes for folks depending on this value. For example, if <mux-video/> and <mux-audio/> add support for targetLiveWindow === Number.POSITIVE_INFINITY as a first pass (very likely), any "sliding" case would need to get represented as either NaN or 0. If we eventually then add support for "sliding" cases, those would suddenly start showing a progress bar. That may be a reasonable expectation though? Not sure.
@gkatsev let me know if you have any concerns with this proposal. Otherwise, I'll plan on spiking on this approach.
Any reason to incrementally support targetLiveWindow? Seems reasonable enough to implement it fully and then only have the UI handle a subset of cases.
This seems like a reasonable solution.
@gkatsev regardless of how we approach it for Open Elements, I think the concern still remains for other folks implementing this incrementally.
I think as long as we also assume it's valid that this value can change over time for a given media element's src (with a corresponding event, e.g. targetlivewindowchange), this is feeling like a pretty good API to me.
I think what we should avoid is an API that might unexpectedly change the UI midstream. Connecting this to an event that's similar to loadedmetadata or durationchange and only changes once with a new source would do that. But a targetlivewindowchange that can happen midstream would cause the issue. I'm not totally following the HLS need, except that we don't have a great answer there for a targetLiveWindow value in the sliding window scenario. I could see a world where the player sets targetLiveWindow to the initial playlist size. A value of 1 would be good enough to make a UI decision from, if the player can't know any more.
Remind me how we know that an HLS playlist should definitely be standard live, not sliding?
Remind me how we know that an HLS playlist should definitely be standard live, not sliding?
We will never know definitively from the playlists alone, as it's "underdetermined" wrt the spec. This is discussed in detail in the referenced google doc. We can plausibly make some safe assumptions for the vast majority of non-EVENT HLS live playlists (since the other scenarios are fairly non-standard) if we monitor either the sum of EXTINF durations (with some additional offsets for holdback) or the seekable duration and that value stops growing and is less than an established "minimum sliding window", but that breaks in the other direction at the start of a live (or "sliding") media stream (see the google doc for details).
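The monitoring heuristic described above might look something like the sketch below. Everything here is an assumption for illustration: the function name, the idea of sampling the seekable duration periodically, and the stableSamples and tolerance parameters are not defined anywhere in the proposal or the Google Doc.

```typescript
// `durations` holds periodic samples (seconds) of either the seekable
// duration or the summed EXTINF durations, oldest first.
function inferFromWindowHistory(
  durations: readonly number[],
  minSlidingWindow: number,
  stableSamples = 3, // hypothetical: samples needed to call the window "stable"
  tolerance = 0.5 // hypothetical: allowed jitter (seconds) between samples
): "none" | "unknown" {
  // Too few samples: this is the early-stream case where the heuristic
  // cannot tell a short window from one that is still growing.
  if (durations.length < stableSamples) return "unknown";
  const recent = durations.slice(-stableSamples);
  const largest = Math.max(...recent);
  const smallest = Math.min(...recent);
  const stoppedGrowing = largest - smallest <= tolerance;
  const belowMinimum = largest < minSlidingWindow;
  // Only a window that has stopped growing AND is below the minimum is
  // assumed to be plain live; everything else stays ambiguous.
  return stoppedGrowing && belowMinimum ? "none" : "unknown";
}
```

Note that the sketch deliberately never returns "standard" or "sliding": as stated above, the playlists alone underdetermine those cases.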
I think what we should avoid is an API that might unexpectedly change the UI midstream. Connecting this to an event that's similar to loadedmetadata or durationchange and only change once with a new source would do that.
I'm not sure there's a reliable pre-existing event we can tie to here, since these values require fetching and (simple) parsing of the manifest/playlists. For playback engines wrapped in a web component, that means we'd have to assume we can derive these values in advance of the engine setting values on the HTMLMediaElement that would trigger any proposed event (plausible, but still an assumption). For native browser HAS playback (e.g. HLS + Safari <video src="url"/>), this means we'd be parallel fetching and parsing the manifest for the relevant values, and we'd have to somehow reliably do this before any proposed "initialization" events (implausible, likely not possible to guarantee). In the case of e.g. <mux-video/>, we rely on both of these cases, depending on the playback environment.
NOTE: This proposal began as a subset of the Stream Type - Proposal #3 but was descoped due to complexities and the decision to model it as a separate state.
NOTE: A discussion on the complexities and permutations of "DVR", both using available HTTP Adaptive Streaming (HAS) manifests/playlists and inferring from the state of a given HTMLMediaElement instance can be found in this google doc, which also has comments enabled. Please read this document, as it provides relevant context for the proposal below.
Overview and Purpose
A subset of "live streaming media" is intended to be played with seek capabilities for the viewer. This is frequently referred to as "DVR," and typically falls into one of two categories:
For both of these cases, although the media is live, the "intention" is to still allow users to seek through the media during playback.
Proposed DVR Types & Definitions
Below are the total possible DVR states (for more on why, see the Google Doc, referenced above).
"standard" - The media stream is live and all previous media content will be available
"sliding" - The media stream is live and a sufficient amount of previous media content will be available for seeking
"none" - The media stream is on-demand, or the media stream is live and there will not be a sufficient amount of previous media content available for seeking.
"any" - The media stream is live and is either "standard" or "sliding", but it is (currently) ambiguous which of these two it is.
"unknown" - There is no media stream, or the media stream is live, but it is (currently) ambiguous if it's "none" (no DVR), "standard", "sliding", or "any".
undefined - The DVR feature is unimplemented by the media element.
Proposed Interface 1 (narrow implementation - "standard" support only)
This version of the proposal intentionally omits/"doesn't solve for" any account of "sliding".
HTMLMediaElement::get dvr() {} : boolean | null
true means "standard"
false means "none" | "sliding" (where "sliding" is not within the scope of this proposal and therefore is "under-determined" by this value alone)
null means "unknown"
undefined or not defined means unsupported
"dvrchange" - detail = dvr; dispatched from the HTMLMediaElement whenever dvr changes
Proposed Inferring 1 (narrow implementation - "standard" support only)
Only rely on HLS playlist (EXT-X-PLAYLIST-TYPE:EVENT) or MPEG-DASH manifest (MPD@type="dynamic" && !MPD@timeShiftBufferDepth) parsing to derive dvr. Any other process will result in ambiguities. For more, see the Google Doc, referenced above.
Proposed Interface 2 (exhaustive)
type DVRType = "standard" | "sliding" | "none" | "any" | "unknown"
HTMLMediaElement::get dvr() {} : DVRType
undefined or not defined means unsupported
"dvrchange" - detail = dvr; dispatched from the HTMLMediaElement whenever dvr changes
HTMLMediaElement::get minSlidingWindow() {} : number
Threshold for "sliding", aka HTMLMediaElement::seekable.end(0) - HTMLMediaElement::seekable.start(0) >= HTMLMediaElement::minSlidingWindow -> HTMLMediaElement::dvr === "sliding"
HTMLMediaElement::set minSlidingWindow(value: number) {} : void
Proposed Inferring 2 (exhaustive)
To be documented formally if this is the preferred adopted proposal. Most of this may be determined from the Google Doc, referenced above.
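For comparison, the narrow manifest-only rule from Proposed Inferring 1 (EXT-X-PLAYLIST-TYPE:EVENT for HLS, MPD@type="dynamic" without MPD@timeShiftBufferDepth for MPEG-DASH) is simple enough to sketch directly. The function names below are hypothetical, and the regex-based matching is a simplification: a real implementation would more likely use a proper playlist/XML parser and would also need to handle ended streams.

```typescript
// HLS: the media playlist (not the multivariant playlist) carries the tag.
function hlsPlaylistIsStandardDvr(mediaPlaylistText: string): boolean {
  // Match the tag on its own line; \s* tolerates trailing "\r" or spaces.
  return /^#EXT-X-PLAYLIST-TYPE:EVENT\s*$/m.test(mediaPlaylistText);
}

// MPEG-DASH: inspect only the attributes of the root <MPD ...> start tag.
function dashManifestIsStandardDvr(mpdText: string): boolean {
  const mpdStartTag = /<MPD\b[^>]*>/.exec(mpdText)?.[0];
  if (mpdStartTag === undefined) return false;
  const isDynamic = /\stype\s*=\s*["']dynamic["']/.test(mpdStartTag);
  const hasTimeShiftBufferDepth = /\stimeShiftBufferDepth\s*=/.test(mpdStartTag);
  // MPD@type="dynamic" && !MPD@timeShiftBufferDepth
  return isDynamic && !hasTimeShiftBufferDepth;
}
```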
Recommendation: Proposal 1 (narrow implementation - "standard" support only)
Reasons for recommendation
true|false for both HLS and MPEG-DASH "immediately" (after loading and parsing the playlists/manifests once per media stream)
"sliding" (and corresponding "uncertain" states such as "any" or "unknown" in the case of early stream starts). This is because any implementations that add future "sliding" support (assuming new properties are introduced) will simply treat these as "live" unless/until they integrate with the new interface. This feels far less risky than the other way around, where "live" streams would suddenly and unexpectedly start showing up as "DVR" (seekable).