Closed cjpillsbury closed 1 year ago
I've always had a hard time trying to talk about DVR vs sliding window DVR and what not. So, having an agreed set of names will definitely simplify things.
Some questions/comments:
What does MOE stand for in the algorithm?
Why not account for the HLS and DASH properties in the algorithm? Could add something like the following as step 2:
If the media provides a stream type [see HLS, DASH], set streamType to the provided value. End algorithm.
Is the main difference between live and sliding window DVR the threshold at which you should start showing the dvr-like controls? Maybe it doesn't need an official stream type, as different users likely have different tolerances for the threshold at which to bring up the controls.
What does MOE stand for in the algorithm?
"margin of error". It would be a constant. I can update the algorithm to define the term and loosely define the value.
Why not account for the HLS and DASH properties in the algorithm?
I was hoping to avoid scope creep since how these values are provided and what information you have available will vary. E.g., for HLS, if you're using "native browser" playback, the algorithm is effectively the same since you don't have direct access to the playlists. For MPEG-DASH, there is only "static" (-> "vod") vs. "dynamic" (-> "live" | "dvr" | "sliding").
That said, it may be worth at least having some discussion on how these would be inferred from the manifests/playlists and how they change over time?
Is the main difference between live and sliding window DVR the threshold at which you should start showing the dvr-like controls?
It could also impact the "type" of UI you'd want to present, specifically around seeking. "sliding" is somewhere in between a "live" experience and a "dvr" experience, since the start time is "moving under foot", so designers may want to account for that. Theoretically, they could both fall into the category of "DVR" (which is one reason I kept it as a potential "future" type), but most designs that conflate the two are particularly bad for "sliding" (bracketing the clunkiness of most "dvr" designs that lack a known/estimated duration).
I think for now we can likely pretend the distinction doesn't exist and treat it out of scope, but with the current direction of this proposal, thinking about either an extensible/customizable set of possible stream types or, at the very least, a set that can change over time would be good when trying to think through risky assumptions in v1.
Could this also be represented as:
StreamType:
live (duration = Infinity)
vod (duration = other number)
DVRWindow:
I think it would be easier on the UI if you can build a general Live UI, without having to know all the live types.
I think this looks great as an alternative. Let's assume we'll move forward with this. A few callouts on the details:
"on-demand" instead of "vod" for the value (tl;dr - make it less "jargony" and don't implicitly assume video)
undefined (either the explicit value undefined or literally not defined)
!("live" || "on-demand") for StreamType, not positive number for DVRWindow)
NaN || null || undefined)
dvrWindow and streamType on a(n extended) media element.
stream-type & dvr-window (more consistent with media chrome & Open Elements naming conventions; higher legibility)
streamtype & dvrwindow (more consistent with generic HTML attr naming conventions; lower legibility)
Just to call this out explicitly (discussed out of band):
streamType="on-demand" && dvrWindow=30, which is "technically invalid".
I do like having the two properties, since DVR window is a description of live to me.
DVR Window could also be defined as only valid if the stream type is live, but it might make sense to keep it loose and allow for vod to be treated as live/dvr depending on what's set.
Might make sense to have a "minimal" support and "maximal" support.
Thinking about it some more, I think that having the dvr window require a number is a problem. Specifically, what if someone wants to play a sliding window DVR but doesn't know what the window is because they offloaded all the video stuff to some service? I think there should be a way to say "I want this to have the DVR UI, but figure out the window from the content yourself"
Thinking about it some more, I think that having the dvr window require a number is a problem. Specifically, what if someone wants to play a sliding window DVR but doesn't know what the window is because they offloaded all the video stuff to some service? I think there should be a way to say "I want this to have the DVR UI, but figure out the window from the content yourself"
Would this be a problem if they relied on the "inferred" value use case described above?
I guess for me, it's the expectation of the type of UI that is being shown based on these configurations.
For stream-type="on-demand" it should show the regular UI we're used to with a start time and duration.
For stream-type="live", by default, it should show a simpler UI without a progress bar or other timings. But I'd like to be able to configure it with DVR, matching the HLS event type, where the UI looks most like an on-demand stream, with a progress bar and times starting at 0. In addition, I'd like a second DVR UI for a sliding window that shows, say, the last 30 seconds or 2 hours or whatever the stream is configured as; however, I don't want to have to know what the stream is set to.
Let's use "on-demand" 👍
let's assume the "nil" cases are either...strict and always
I like strict, and should probably never be undefined unless actually not implemented. e.g. duration = NaN, srcObject=null. Then we can detect when this isn't implemented.
streamType: null
dvrWindow: NaN (assuming it's always a Number otherwise)
Properties are (can be?) inferred based on the media content but are overridable via a setter (aka not read only)
That feels like it could get complicated. At the UI layer (media-chrome) you could certainly decide to ignore the media element values, but setting the stream type on the media element itself is like saying "you think you're playing vod, but you're really playing live". It's open to interpretation how the media element should handle that.
Not sure these should be part of the media-ui-extensions formalization, but attributes could be either
Yeah, probably not part of media-ui-extensions since media elements don't push state out to attributes.
@gkatsev I'm following everything except "I want this to have the DVR UI, but figure out the window from the content yourself"... "however, I don't want to know what the stream is set to".
Are we talking about:
How would one "figure out the window from the content" if the media element isn't reporting that detail through a property like dvrWindow?
Finally, alt proposal for dvrWindow is liveWindow, for similar reasons to vod/on-demand.
I think I may have complicated things by not being extra clear in my thoughts, and also maybe not verifying the specific constraints on this proposal. Basically, there are two issues at hand:
For 1, the stream type and live window stuff can generally be figured out from the underlying video data, like duration being Infinity means live and the live window is seekableEnd-seekableStart. For 2, we want to be able to provide this data from the outside. What I meant by "however, I don't want to know what the stream is set to" is that a player user may not know how a particular live stream is configured in terms of number of segments and segment durations and just wants to be able to configure the player to show a particular UI. Mux Player is such an example, because you can set stream-type and get the corresponding UI, regardless of what the video actually is.
Hopefully, that clarifies things.
@gkatsev yep, thanks
like duration being Infinity means live and the live window is seekableEnd-seekableStart
Do we need this new API then? Media chrome, Mux Player, and other players can of course add some sugar to make working with different stream types easier, but for the sake of media elements specifically, do we already have what we need to determine stream type and the dvr window? Is seekable missing anything?
So, would every component need to check if duration is Infinity and what the seekable is before doing anything?
Also, with hls.js for example, the seekable end is slightly different from hls.js's liveSyncPosition (I'm not exactly sure why, but that's another matter). This could get pushed down into the slotted media element implementation.
Actually, this brings up the question: is this a property that's supposed to be exposed from the media element?
Maybe the solution to my dichotomy is that media-chrome should use the media element's provided stream-type unless media-controller was given a stream-type via an attr? Separating the two this way also makes it so that there isn't a concern about setting the property from inside, while also making it be settable from the outside.
is this a property that's supposed to be exposed from the media element?
Yes. For context, this whole repo is about "Extending the HTMLVideoElement API". Any conversations about media chrome or how a player would use the API should only be to inform the media element API design. i.e. this isn't the forum to solve media chrome specific things, and if we're headed that route we should push it over to a media chrome issue.
would every component need to check if duration is Infinity and what the seekable is before doing anything
In the media chrome case, no. Only media-controller would check the media element's properties, and then it would translate it into stream type, etc for other components. I feel like that's fine. It's a whole other thing to say every [slotted] media element has to do that translation work and expose a new API for the result.
Also, with hls.js for example, the seekable end is slightly different from hls.js's liveSyncPosition (I'm not exactly sure why, but that's another matter). This could get pushed down into the slotted media element implementation.
Yeah, that's interesting. I think we'd expect custom media elements to make their own seekable property match what's intended to be seekable for the dvr window, meaning not just pass through the native video element's seekable data if it's not quite right. Then it'd be good to understand if the native video element needs to be fixed somehow, per browser, to support live windows better.
Since this is intended for media-ui-extensions, I'm hesitant to conflate data APIs with UI, as these can come apart (e.g. there may be needs to have a programmatic seekable that is a distinct value from liveWindow, given e.g. the way MediaSource or other non-src values can work). Similarly, even if we don't want this to be a part of the media-ui-extensions, I suspect we'll want/need setters for these values. For example, there is no guaranteed, in-spec inferable way with MPEG-DASH to distinguish between a small seekable window in live ("dynamic") content to avoid stalls/account for latency vs. "DVR"/"sliding window". Having these values be settable allows a developer to announce how they want the UI to be presented:
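A minimal sketch of what a settable value could look like, assuming an explicitly set value simply wins over whatever was inferred. The class name `StreamTypeState`, its methods, and the callback standing in for a `streamtypechange` event are all hypothetical and not taken from any player or from this proposal.

```typescript
type StreamType = "live" | "on-demand" | null;

// Hypothetical helper: tracks an inferred value plus an optional explicit
// override supplied by the developer.
class StreamTypeState {
  private inferred: StreamType = null;
  private explicit: StreamType | undefined = undefined;
  private onChange: (value: StreamType) => void;

  // onChange stands in for dispatching a "streamtypechange" event.
  constructor(onChange: (value: StreamType) => void) {
    this.onChange = onChange;
  }

  // The effective value: the explicit one if set, otherwise the inferred one.
  get streamType(): StreamType {
    return this.explicit !== undefined ? this.explicit : this.inferred;
  }

  // Explicit value from the developer; pass undefined to fall back to inference.
  set(value: StreamType | undefined): void {
    this.update(() => { this.explicit = value; });
  }

  // Value inferred from the media (e.g. after duration becomes known).
  setInferred(value: StreamType): void {
    this.update(() => { this.inferred = value; });
  }

  // Notify only when the effective value actually changes.
  private update(mutate: () => void): void {
    const before = this.streamType;
    mutate();
    if (this.streamType !== before) this.onChange(this.streamType);
  }
}

const seen: StreamType[] = [];
const state = new StreamTypeState((value) => { seen.push(value); });
state.setInferred("on-demand"); // inference says on-demand
state.set("live");              // developer announces it should be treated as live
state.setInferred("on-demand"); // re-inference does not displace the override
state.set(undefined);           // clearing the override falls back to inference
```

Keeping the inferred and explicit values separate means clearing the override restores inference without having to re-run it.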
Just to wrap this up, I'm going to pin down what we have so far:
streamType
"live" | "on-demand" | null | undefined
streamtypechange, detail = streamType
"live" - HTMLMediaElement::duration === Number.POSITIVE_INFINITY
"on-demand" - Number.isFinite(HTMLMediaElement::duration)
HTMLMediaElement durationchange Event
(assumes a media playlist for the current src has been loaded at least once)
"live" - !#EXT-X-PLAYLIST-TYPE || #EXT-X-PLAYLIST-TYPE:EVENT
"on-demand" - #EXT-X-PLAYLIST-TYPE:VOD
(assumes the manifest MPD for the current src has been loaded at least once)
"live" - MPD@type="dynamic"
"on-demand" - !MPD@type || MPD@type="static"
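The inference rules above can be sketched as three small functions, one per source of information. This is only an illustration: the function names are made up here, the inputs are assumed to have already been extracted from the media element, playlist, or MPD, and returning null for a NaN duration is an assumption based on the "implemented but unknown" meaning of null.

```typescript
type StreamType = "live" | "on-demand" | null;

// From HTMLMediaElement::duration. NaN (nothing loaded yet) maps to null,
// i.e. "implemented but unknown".
function streamTypeFromDuration(duration: number): StreamType {
  if (duration === Number.POSITIVE_INFINITY) return "live";
  if (Number.isFinite(duration)) return "on-demand";
  return null;
}

// From an HLS media playlist: the EXT-X-PLAYLIST-TYPE value, or undefined
// when the tag is absent. Follows the rule as written above.
function streamTypeFromHlsPlaylistType(
  playlistType: "VOD" | "EVENT" | undefined
): StreamType {
  return playlistType === "VOD" ? "on-demand" : "live";
}

// From a DASH manifest: the MPD@type value, or undefined when absent.
function streamTypeFromMpdType(
  mpdType: "static" | "dynamic" | undefined
): StreamType {
  return mpdType === "dynamic" ? "live" : "on-demand";
}
```

In a media element these would be re-run on `durationchange` or whenever the playlist or manifest reloads.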
DVR will be modeled separately from streamType as a boolean.
dvr
true | false | null | undefined
dvrchange, detail = dvr
TBD
Out of scope proposal:
HTMLMediaElement::seekable.end(0) - HTMLMediaElement::seekable.start(0) >= DVR_WINDOW_SIZE where DVR_WINDOW_SIZE is some determined duration threshold sufficiently large to count as DVR (or "sliding window") and may potentially be configurable via a property or attribute
(assumes a media playlist for the current src has been loaded at least once)
true - #EXT-X-PLAYLIST-TYPE:EVENT
false - !#EXT-X-PLAYLIST-TYPE:EVENT
TBD
Out of scope proposal:
MPD@timeShiftBufferDepth > DVR_WINDOW_SIZE where DVR_WINDOW_SIZE is some determined duration threshold sufficiently large to count as DVR (or "sliding window") and may potentially be configurable via a property or attribute
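The seekable-range variant of the out-of-scope proposal can be sketched as follows. The 60-second value for `DVR_WINDOW_SIZE` is purely a placeholder (no value was agreed on), `SeekableLike` is a hypothetical structural stand-in for `TimeRanges` so the code runs outside a browser, and returning null for an empty range is an assumption.

```typescript
// Hypothetical minimal shape of TimeRanges (length, start(i), end(i)).
interface SeekableLike {
  length: number;
  start(index: number): number;
  end(index: number): number;
}

// Placeholder threshold in seconds; the thread leaves the real value open.
const DVR_WINDOW_SIZE = 60;

// Size of the first seekable range, or NaN when nothing is seekable yet.
function seekableWindow(seekable: SeekableLike): number {
  if (seekable.length === 0) return NaN;
  return seekable.end(0) - seekable.start(0);
}

// true/false once a window is known, null while it is still unknown.
function inferDvr(
  seekable: SeekableLike,
  threshold: number = DVR_WINDOW_SIZE
): boolean | null {
  const size = seekableWindow(seekable);
  if (Number.isNaN(size)) return null;
  return size >= threshold;
}

// Hypothetical helper for building a single-range SeekableLike in examples.
function singleRange(start: number, end: number): SeekableLike {
  return { length: 1, start: () => start, end: () => end };
}

const emptyRange: SeekableLike = {
  length: 0,
  start: () => NaN,
  end: () => NaN,
};
```

For example, `inferDvr(singleRange(0, 30))` is false with the placeholder threshold, while `inferDvr(singleRange(100, 7300))` is true.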
Valid Values:
"live" | "on-demand" | null | undefined
why have both null and undefined here?
Inferred DVR
TBD
Wouldn't this be that seekable.start(0) doesn't change?
why have both null and undefined here?
undefined would essentially mean "unimplemented". That will probably be true for any new media-ui-extension. null means implemented but unknown.
Would it be better to have "unknown" be the unknown value rather than null to keep the type the same except for when there's no support for the feature? i.e., it'll make the type be string | undefined.
It's worth noting that you can just rely on duration for stream type, but there's value in a specific streamType property, because of the async nature of duration. The player might know the stream type before duration is set, even from other metadata about the video.
Would it be better to have "unknown" be the unknown value rather than null
I can see that making sense and it's more direct. But null also matches "no attribute". All the aria props (strings) default to null. If it was a string "unknown" there might be temptation to sprout that value to an attribute? Although...this is probably property only, not an attribute, since it's not user configurable.
@cjpillsbury thanks for writing this up. I'm still not totally clear on the reasons why this is needed from a media element in addition to seekable, and how it would be used. Is it basically "this media is meant to be accessible as DVR (have a progress bar), no matter what the seekable range might be right now"?
Is it basically "this media is meant to be accessible as DVR (have a progress bar), no matter what the seekable range might be right now"
Yeah, it solves that problem and makes it easier to definitively disambiguate between live (which always has some seekable range) and dvr for e.g. hls.
@gkatsev @heff I propose we treat "sliding window" and its relationship to dvr as out of scope for this discussion.
@gkatsev @heff since there's still (out of band) discussion about "DVR" more generally, I propose we also treat DVR as out of scope for this discussion. I've written up a google doc discussing some of the complexities and considerations around DVR, available for comment here. I'll go ahead and start a separate github discussion specifically for DVR with a link to the google doc.
Assuming we descope all DVR from the Stream Types discussion, I believe we are close to finalizing this proposal for "live" and "on-demand". The only potential disagreement that remains is how we should represent the "unknown" case where streamType is supported by the media element. The two proposals here are:
(1) null
(2) "unknown"
I have a slight leaning toward (2) to make it explicit, though I'm amenable to either. @gkatsev @heff I will defer to whatever you two think is the better value.
Once this is decided, let me know if there is anything else outstanding to finalize this proposal. Otherwise, let's make this decision and finish our first media-ui-extensions proposal 🎉.
I would lean towards 2 "unknown" as well. Prior art:
NaN which is a Number type.
I'm good with unknown. In media-chrome I don't think we should follow that pattern for the attribute, and we'll stick with "no media-stream-type attribute" (getAttribute returns null) means unknown or unimplemented by the media. But that's a different problem space. If there's disagreement with that we can follow up in the media chrome thread.
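With "unknown" as the explicit unknown value and undefined reserved for "not implemented", a consumer can tell the two cases apart. A small sketch, assuming a hypothetical `readStreamType` helper and treating any unrecognized value as "unknown" (that last choice is an assumption, not something decided in the thread):

```typescript
type KnownStreamType = "live" | "on-demand" | "unknown";

// Returns undefined when the media element does not implement streamType at
// all, and a string value otherwise.
function readStreamType(
  media: { streamType?: unknown }
): KnownStreamType | undefined {
  const value = media.streamType;
  if (value === undefined) return undefined; // feature not implemented
  if (value === "live" || value === "on-demand") return value;
  return "unknown"; // "unknown" itself, or any value this sketch does not recognize
}
```

A UI layer could then map both undefined and "unknown" to "no attribute", as described for media-chrome above.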
Overview & Purpose
The idea of different “stream types” has been around for a long time in various HTTP Adaptive Streaming (HAS) standards and their precursors in some form - minimally distinguishing between “live” content and “video on demand” content. However, these categories aren’t consistently named or distinguished in the same way across the various specifications. Moreover, there is no corresponding API in the browser. Yet these categories directly inform how one expects users to consume and interact with the media, including what sort of UI or “chrome” should be made available for the user. By way of example, the built-in controls/UI in Safari that show up for a live src are different from those that show up for a VOD src. This proposal aims to normalize the names and definitions of StreamTypes (in a way that is extensible and evolvable over time) by way of how they are expected to be consumed and interacted with by a viewer/user. It also provides a concise and easy to understand differentiator for anyone implementing different UIs/controls/"chromes" for the various stream types.
An additional goal of this proposal is to recommend that MSE-based players or “playback engines” normalize their use of existing APIs to be as consistent as possible with the proposed inferred StreamType Algorithm.
Proposed StreamType Types & Definitions
"unknown" (default) - There is no media content or there is currently insufficient information to determine the StreamType of the current media content (e.g. metadata or similar is still loading, async default StreamType inference not yet done)
"vod" (“Video on Demand”) - The media content has a known start and end time and is intended to be randomly seekable from start to end as long as the content is available at all
"live" - The media content is intended to be viewed at the “live edge” as forward/subsequent content is made available over time and is not intended to be seekable at all
"dvr" - The media content has a known start time and by default is intended to be viewed at the “live edge” as forward content is made available over time, but all backward/previous content is also available for seeking from start to the current “live edge”
"sliding" (“Sliding Window”, “Partial DVR”) - The media content is by default intended to be viewed at the “live edge” as forward content is made available over time, but is also intended to be seekable within a (roughly) consistent time window relative to the current “live edge”
Proposed Interface
type StreamType = "unknown" | "vod" | "live" | "dvr" (| "sliding"?) (| string?)
HTMLMediaElement::get streamType() {} : StreamType
streamType is set. See below for algorithm
HTMLMediaElement::set streamType() {}
streamtypechange
streamType changes (inferred or explicitly set)
Proposed Stream Type Inferring (overridable)
Algorithm (Pseudo-code):
Additional Considerations
"dvr" and "live"
This algorithm should be re-applied/computed whenever the dependent variables may change
"live"/"dvr" stream, the computed stream type could change to "vod" based on the currently proposed algorithm.
Related Standards/Specs Definitions
Distinguishing/Categorizing Types
RFC 8216 (“HLS”)
VOD ("vod"), EVENT ("dvr")
EXT-X-PLAYLIST-TYPE tag value (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.5)
"live"
EXT-X-ENDLIST tag (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.4)
EXT-X-PLAYLIST-TYPE tag (https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis-10#section-4.4.3.5)
ISO/IEC 23009-1 (“MPEG-DASH”)
static (“vod”), dynamic (“dvr” or “live” - cannot differentiate by attr)
MPD@type attribute value (§5.3.1.2, Table 3 — Semantics of MPD element)
“dvr”
MPD@timeShiftBufferDepth (§5.3.1.2, Table 3 — Semantics of MPD element) grows consistently with the available Segments & wall clock time and has a consistent computed start time (similar to inferred algorithm for "dvr")
Duration for "live"/"dvr"
HTMLMediaElement::duration
MediaSource::duration
Seekable Range for "dvr"
HTMLMediaElement::seekable
MediaSource::setLiveSeekableRange()