video-dev / media-ui-extensions

Extending the HTMLVideoElement API to support advanced player user-interface features
MIT License

Stream Type - Proposal #3

Closed cjpillsbury closed 1 year ago

cjpillsbury commented 2 years ago

Overview & Purpose

The idea of different “stream types” has been around for a long time, in some form, in the various HTTP Adaptive Streaming (HAS) standards and their precursors, minimally distinguishing between “live” content and “video on demand” content. However, these categories aren’t consistently named or distinguished in the same way across the various specifications. Moreover, there is no corresponding API in the browser. Yet these categories directly inform how one expects users to consume and interact with the media, including what sort of UI or “chrome” should be made available to the user. For example, the built-in controls/UI that Safari shows for a live src differ from those it shows for a VOD src. This proposal aims to normalize the names and definitions of StreamTypes (in a way that is extensible and can evolve over time) based on how a viewer/user is expected to consume and interact with them. It also provides a concise, easy-to-understand differentiator for anyone implementing different UIs/controls/"chromes" for the various stream types.

An additional goal of this proposal is to recommend that MSE-based players or “playback engines” normalize their use of existing APIs to be as consistent as possible with the proposed StreamType inference algorithm.

Proposed StreamType Types & Definitions

Proposed Interface

Proposed Stream Type Inferring (overridable)

Algorithm (Pseudo-code):

  1. Let StreamType = "unknown"
  2. If Number.isNaN(mediaEl.duration) (exit)
    • Aka StreamType = "unknown"
  3. If mediaEl.duration !== Infinity, StreamType = "vod" (exit)
    • Stricter: If mediaEl.seekable.end(0) === mediaEl.duration (or Math.abs(mediaEl.duration - mediaEl.seekable.end(0)) <= MOE for precision considerations)
  4. If mediaEl.duration === Infinity
    • Stricter: If mediaEl.seekable.end(0) < mediaEl.duration (or (mediaEl.duration - mediaEl.seekable.end(0)) > MOE for precision considerations)
      1. Let ChunkDuration = the presumed longest duration, in seconds, of a media chunk/segment
      2. Let SeekableStart0 = mediaEl.seekable.start(0)
      3. Let SeekableEnd0 = mediaEl.seekable.end(0)
      4. Wait ChunkDuration
      5. Let SeekableEnd1 = mediaEl.seekable.end(0)
      6. Let SeekableStart1 = mediaEl.seekable.start(0)
      7. If SeekableEnd1 === SeekableEnd0 (or Math.abs(SeekableEnd1 - SeekableEnd0) <= MOE for precision considerations), GOTO sub-step 4 (Wait ChunkDuration)
      8. If SeekableStart1 === SeekableStart0 (or Math.abs(SeekableStart1 - SeekableStart0) <= MOE for precision considerations), StreamType = "dvr" (exit)
      9. If SeekableStart1 > SeekableStart0 (or (SeekableStart1 - SeekableStart0) > MOE for precision considerations), StreamType = "live" (exit)
        • NOTE: This doesn’t account for/differentiate “sliding” StreamType
  5. (exit)
    • Aka StreamType = "unknown"
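The pseudo-code above can be sketched roughly as follows. This is only an illustration, not the proposal's reference implementation: the function name, the `SeekableSnapshot` shape, and the MOE value of 0.1 seconds are assumptions. To keep it free of timers and DOM access, the "Wait ChunkDuration" step is left to the caller, who passes in two seekable snapshots taken about one chunk duration apart.

```typescript
// Hypothetical names throughout; the thread only specifies the pseudo-code steps.
type InferredStreamType = "unknown" | "vod" | "live" | "dvr";

// A snapshot of the first seekable range, e.g. mediaEl.seekable.start(0) / end(0).
interface SeekableSnapshot {
  start: number;
  end: number;
}

// Assumed margin of error (MOE), in seconds, for floating-point comparisons.
const ASSUMED_MOE = 0.1;

function inferStreamTypeFromSnapshots(
  duration: number,
  before: SeekableSnapshot,
  after: SeekableSnapshot, // sampled roughly one chunk duration after `before`
  moe: number = ASSUMED_MOE
): InferredStreamType {
  // Step 2: NaN never equals itself, so test with isNaN rather than ===.
  if (isNaN(duration)) return "unknown";
  // Step 3: any other non-infinite duration is treated as on-demand content.
  if (duration !== Infinity) return "vod";
  // Step 4.7: the seekable end has not advanced; the caller should wait and sample again.
  if (Math.abs(after.end - before.end) <= moe) return "unknown";
  // Step 4.8: the start is pinned while the end grows, so everything stays seekable.
  if (Math.abs(after.start - before.start) <= moe) return "dvr";
  // Step 4.9: the start moved forward, so older content is falling out of the window.
  if (after.start - before.start > moe) return "live";
  // Step 5: nothing matched.
  return "unknown";
}
```

A caller would read the first snapshot, wait for the presumed chunk duration, read the second, and call the function again whenever it returns "unknown" for an infinite duration.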

Additional Considerations

Related Standards/Specs Definitions

Distinguishing/Categorizing Types

RFC 8216 (“HLS”)

ISO/IEC 23009-1 (“MPEG-DASH”)

Duration for "live"/"dvr"

Seekable Range for "dvr"

gkatsev commented 2 years ago

I've always had a hard time trying to talk about DVR vs sliding window DVR and what not. So, having an agreed set of names will definitely simplify things.

Some questions/comments: What does MOE stand for in the algorithm?

Why not account for the HLS and DASH properties in the algorithm? Could add something like the following as step 2:

If the media provides a stream type [see HLS, DASH], set streamType to the provided value. End algorithm.

Is the main difference between live and sliding window DVR the threshold at which you should start showing the DVR-like controls? Maybe it doesn't need an official stream type, as different users likely have a different tolerance for the threshold at which to bring up the controls.

cjpillsbury commented 2 years ago

What does MOE stand for in the algorithm?

"margin of error". It would be a constant. I can update the algorithm to define the term and loosely define the value.

Why not account for the HLS and DASH properties in the algorithm?

I was hoping to avoid scope creep since how these values are provided and what information you have available will vary. E.g. For HLS, if you're using "native browser" playback, the algorithm is effectively the same since you don't have direct access to the playlists. For MPEG-DASH, there is only "static" (-> "vod") vs. "dynamic" (-> "live" | "dvr" | "sliding").

That said, it may be worth at least having some discussion on how these would be inferred from the manifests/playlists and how they change over time?
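As a small illustration of the MPEG-DASH point above, the mapping from the MPD's type to the possible stream types could be written like this. The function and type names are hypothetical; only the "static"/"dynamic" values and the candidate stream types come from the comment.

```typescript
// Hypothetical helper: lists which stream types remain possible for a given MPD type.
type CandidateStreamType = "vod" | "live" | "dvr" | "sliding";

function candidateStreamTypesForMpdType(
  mpdType: "static" | "dynamic"
): CandidateStreamType[] {
  // "static" pins the type down; "dynamic" alone cannot tell the live variants apart.
  return mpdType === "static" ? ["vod"] : ["live", "dvr", "sliding"];
}
```

The "dynamic" case returning three candidates is the ambiguity described above: the manifest type narrows the answer but does not settle it.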

Is the main difference between live and sliding window DVR the threshold at which you should start showing the DVR-like controls?

It could also impact the "type" of UI you'd want to present, specifically around seeking. "sliding" is somewhere in between a "live" experience and a "dvr" experience, since the start time is "moving under foot" so designers may want to account for that. Theoretically, they could both fall into the category of "DVR" (which is one reason I kept it as a potential "future" type), but most designs that conflate the two are particularly bad for "sliding" (bracketing the clunkiness of most "dvr" designs that lack a known/estimated duration).

I think for now we can likely pretend the distinction doesn't exist and treat it out of scope, but with the current direction of this proposal, thinking about either an extensible/customizable set of possible stream types or, at the very least, a set that can change over time would be good when trying to think through risky assumptions in v1.

heff commented 1 year ago

Could this also be represented as:

StreamType:

DVRWindow:

I think it would be easier on the UI if you can build a general Live UI, without having to know all the live types.

cjpillsbury commented 1 year ago

I think this looks great as an alternative. Let's assume we'll move forward with this. A few callouts on the details:

cjpillsbury commented 1 year ago

Just to call this out explicitly (discussed out of band):

  1. With this new proposal, it's possible to have e.g. streamType="on-demand" && dvrWindow=30, which is "technically invalid".
  2. Since this is for Media UI Extensions, this may actually be a strength, allowing folks to treat on demand content "as if" it's e.g. a sliding window.

gkatsev commented 1 year ago

I do like having the two properties, since DVR window is a description of live to me. DVR Window could also be defined as only valid if the stream type is live, but might make sense to keep it loose and allow for vod to be treated as live/dvr depending on what's set.

Might make sense to have a "minimal" support and "maximal" support.

gkatsev commented 1 year ago

Thinking about it some more, I think that having the dvr window require a number is a problem. Specifically, what if someone wants to play a sliding window DVR but doesn't know what the window is because they offloaded all the video stuff to some service? I think there should be a way to say "I want this to have the DVR UI, but figure out the window from the content yourself".

cjpillsbury commented 1 year ago

Thinking about it some more, I think that having the dvr window require a number is a problem. Specifically, what if someone wants to play a sliding window DVR but doesn't know what the window is because they offloaded all the video stuff to some service? I think there should be a way to say "I want this to have the DVR UI, but figure out the window from the content yourself".

Would this be a problem if they relied on the "inferred" value use case described above?

gkatsev commented 1 year ago

I guess for me, it's the expectation of the type of UI that is being shown based on these configurations. For stream-type="on-demand" it should show the regular UI we're used to with a start time and duration. For stream-type="live", by default, it should show a simpler UI without a progress bar or other timings. But, I'd like to configure it with DVR, matching the HLS event type, where the UI looks most like an on-demand stream, where there is a progress bar and the times start at 0. In addition, I'd like a second DVR UI for a sliding window which shows like the last 30 seconds or 2 hours or whatever the stream is configured as, however, I don't want to know what the stream is set to.

heff commented 1 year ago

Let's use "on-demand"

👍

let's assume the "nil" cases are either...strict and always

I like strict, and should probably never be undefined unless actually not implemented. e.g. duration = NaN, srcObject=null. Then we can detect when this isn't implemented. streamType: null dvrWindow: NaN (assuming it's always a Number otherwise)

Properties are (can be?) inferred based on the media content but are overridable via a setter (aka not read only)

That feels like it could get complicated. At the UI layer (media-chrome) you could certainly decide to ignore the media element values, but setting the stream type on the media element itself is like saying "you think you're playing vod, but you're really playing live". It's open to interpretation how the media element should handle that.

Not sure these should be part of the media-ui-extensions formalization, but attributes could be either

Yeah, probably not part of media-ui-extensions since media elements don't push state out to attributes.

@gkatsev I'm following everything except "I want this to have the DVR UI, but figure out the window from the content yourself"... "however, I don't want to know what the stream is set to".

Are we talking about:

How would one "figure out the window from the content" if the media element isn't reporting that detail through a property like dvrWindow?

Finally, alt proposal for dvrWindow is liveWindow, for similar reasons to vod/on-demand.

gkatsev commented 1 year ago

I think I may have complicated things by not being extra clear in my thoughts, and also maybe not verifying the specific constraints on this proposal. Basically, there are two issues at hand:

  1. As a player developer, I want to know what the stream type is to display things accordingly.
  2. As a player user, I want a specific UI for the media I'm putting on my site.

For 1, the stream type and live window stuff can generally be figured out from the underlying video data, like duration being Infinity means live and the live window is seekableEnd-seekableStart. For 2, we want to be able to provide this data from the outside. What I meant by "however, I don't want to know what the stream is set to" is that a player user may not know how a particular live stream is configured in terms of number of segments and segment durations and just wants to be able to configure the player to show a particular UI. Mux Player is such an example, because you can set stream-type and get the corresponding UI, regardless of what the video actually is.
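For case 1, the inference described here (duration of Infinity means live, and the live window is seekableEnd - seekableStart) might look like the sketch below. The `SeekableRangesLike` interface is a hypothetical stand-in for the relevant part of the `TimeRanges` object returned by `HTMLMediaElement.seekable`, and both function names are made up for illustration.

```typescript
// Minimal stand-in for the parts of TimeRanges this sketch needs.
interface SeekableRangesLike {
  length: number;
  start(index: number): number;
  end(index: number): number;
}

// Treats an infinite duration as the signal for live content.
function durationLooksLive(duration: number): boolean {
  return duration === Infinity;
}

// Size of the first seekable range in seconds, or NaN when nothing is seekable yet.
function liveWindowSeconds(seekable: SeekableRangesLike): number {
  if (seekable.length === 0) return NaN;
  return seekable.end(0) - seekable.start(0);
}
```

In a browser, `liveWindowSeconds(videoEl.seekable)` would work directly, since `TimeRanges` has the same `length`, `start`, and `end` members.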

Hopefully, that clarifies things.

heff commented 1 year ago

@gkatsev yep, thanks

like duration being Infinity means live and the live window is seekableEnd-seekableStart

Do we need this new API then? Media chrome, Mux Player, and other players can of course add some sugar to make working with different stream types easier, but for the sake of media elements specifically, do we already have what we need to determine stream type and the dvr window? Is seekable missing anything?

gkatsev commented 1 year ago

So, would every component need to check if duration is Infinity and what the seekable is before doing anything?

Also, with hls.js for example, the seekable end is slightly different from hls.js's liveSyncPosition (I'm not exactly sure why, but that's another matter). This could get pushed down into the slotted media element implementation.

Actually, this brings up the question: is this a property that's supposed to be exposed from the media element?

Maybe the solution to my dichotomy is that media-chrome should use the media element's provided stream-type unless media-controller was given a stream-type via an attr? Separating the two this way also makes it so that there isn't a concern about setting the property from inside, while also making it be settable from the outside.
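One way to picture that split, as a rough sketch only: the controller-level attribute wins when it is set to a recognized value, otherwise the media element's own value is used. The function name, the parameter names, and the accepted values ("live" and "on-demand", with "unknown" as the fallback) are assumptions for illustration and are not media-chrome's actual API.

```typescript
type UiStreamType = "live" | "on-demand" | "unknown";

// Type guard for the values this sketch treats as recognized.
function isRecognizedStreamType(
  value: string | null | undefined
): value is "live" | "on-demand" {
  return value === "live" || value === "on-demand";
}

function resolveUiStreamType(
  controllerAttr: string | null, // e.g. the result of getAttribute on the controller
  mediaStreamType: string | undefined // the media element's own property, if implemented
): UiStreamType {
  // An explicit setting from the outside takes precedence.
  if (isRecognizedStreamType(controllerAttr)) return controllerAttr;
  // Otherwise fall back to what the media element reports.
  if (isRecognizedStreamType(mediaStreamType)) return mediaStreamType;
  return "unknown";
}
```

Keeping the override on the controller means the media element's property never has to be written from the outside, which is the concern raised in this comment.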

heff commented 1 year ago

is this a property that's supposed to be exposed from the media element?

Yes. For context, this whole repo is about "Extending the HTMLVideoElement API". Any conversations about media chrome or how a player would use the API should only be to inform the media element API design. i.e. this isn't the forum to solve media chrome specific things, and if we're headed that route we should push it over to a media chrome issue.

would every component need to check if duration is Infinity and what the seekable is before doing anything

In the media chrome case, no. Only media-controller would check the media element's properties, and then it would translate it into stream type, etc for other components. I feel like that's fine. It's a whole other thing to say every [slotted] media element has to do that translation work and expose a new API for the result.

Also, with hls.js for example, the seekable end is slightly different from hls.js's liveSyncPosition (I'm not exactly sure why, but that's another matter). This could get pushed down into the slotted media element implementation.

Yeah, that's interesting. I think we'd expect custom media elements to make their own seekable property match what's intended to be seekable for the dvr window, meaning not just pass through the native video element's seekable data if it's not quite right. Then it'd be good to understand if the native video element needs to be fixed somehow, per browser, to support live windows better.

cjpillsbury commented 1 year ago

Since this is intended for media-ui-extensions, I'm hesitant to conflate data APIs with UI, as these can come apart (e.g. there may be a need for a programmatic seekable that is a distinct value from liveWindow, given e.g. the way MediaSource or other non-src values can work). Similarly, even if we don't want this to be a part of the media-ui-extensions, I suspect we'll want/need setters for these values. For example, there is no guaranteed, in-spec inferable way with MPEG-DASH to distinguish between a small seekable window in live ("dynamic") content, kept to avoid stalls/account for latency, and "DVR"/"sliding window". Having these values be settable allows a developer to announce how they want the UI to be presented:

cjpillsbury commented 1 year ago

Just to wrap this up, I'm going to pin down what we have so far:

Stream Types

Interface

Inferred

HAS-Specific: HLS

(assumes a media playlist for the current src has been loaded at least once)

HAS-Specific: MPEG-DASH

(assumes the manifest MPD for the current src has been loaded at least once)

DVR

DVR will be modeled separately from streamType as a boolean.

Interface

Inferred DVR

TBD

Out of scope proposal: HTMLMediaElement::seekable.end(0) - HTMLMediaElement::seekable.start(0) >= DVR_WINDOW_SIZE where DVR_WINDOW_SIZE is some determined duration threshold sufficiently large to count as DVR (or "sliding window") and may potentially be configurable via a property or attribute
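The seekable-based check in this out-of-scope proposal could be sketched as below. The 60 second default for DVR_WINDOW_SIZE is purely an assumed placeholder, since the thread leaves the threshold undetermined, and the function name is hypothetical.

```typescript
// Assumed placeholder; the thread does not settle on a threshold value.
const ASSUMED_DVR_WINDOW_SIZE_SECONDS = 60;

// True when the seekable range is at least as large as the DVR threshold.
function isDvrBySeekableRange(
  seekableStart: number, // e.g. mediaEl.seekable.start(0)
  seekableEnd: number, // e.g. mediaEl.seekable.end(0)
  dvrWindowSize: number = ASSUMED_DVR_WINDOW_SIZE_SECONDS
): boolean {
  return seekableEnd - seekableStart >= dvrWindowSize;
}
```

Passing the threshold as a parameter mirrors the suggestion that it may be configurable via a property or attribute.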

HAS-SPECIFIC: HLS

(assumes a media playlist for the current src has been loaded at least once)

HAS-SPECIFIC: MPEG-DASH

TBD

Out of scope proposal: MPD@timeShiftBufferDepth > DVR_WINDOW_SIZE where DVR_WINDOW_SIZE is some determined duration threshold sufficiently large to count as DVR (or "sliding window") and may potentially be configurable via a property or attribute
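A rough sketch of the MPD-based check follows. It assumes the attribute value arrives as an xs:duration string and only handles the common `PT…H…M…S` form (no years, months, or days); the parser, the function names, and the threshold passed by the caller are all hypothetical.

```typescript
// Parses durations such as "PT1H30M" or "PT45.5S" into seconds.
// Returns NaN for anything outside the simple PT#H#M#S form this sketch supports.
function parseSimpleXsDurationSeconds(value: string): number {
  const match = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$/.exec(value);
  if (!match) return NaN;
  const hours = Number(match[1] ?? 0);
  const minutes = Number(match[2] ?? 0);
  const seconds = Number(match[3] ?? 0);
  return hours * 3600 + minutes * 60 + seconds;
}

// True when the time shift buffer is strictly larger than the DVR threshold.
function isDvrByTimeShiftBufferDepth(
  timeShiftBufferDepth: string,
  dvrWindowSizeSeconds: number
): boolean {
  // NaN compares false, so unparseable values are treated as "not DVR".
  return parseSimpleXsDurationSeconds(timeShiftBufferDepth) > dvrWindowSizeSeconds;
}
```

A real player would more likely reuse the duration parser of its DASH library than ship one like this.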

gkatsev commented 1 year ago

Valid Values: "live" | "on-demand" | null | undefined

why have both null and undefined here?

Inferred DVR

TBD

Wouldn't this be seekable.start(0) doesn't change?

heff commented 1 year ago

why have both null and undefined here?

undefined would essentially mean "unimplemented". That will probably be true for any new media-ui-extension. null means implemented but unknown.

gkatsev commented 1 year ago

Would it be better to have "unknown" be the unknown value rather than null to keep the type the same except for when there's no support for the feature? i.e., it'll make the type be string | undefined.

heff commented 1 year ago

It's worth noting that you can just rely on duration for stream type, but there's value in a specific streamType property, because of the async nature of duration. The player might know the stream type before duration is set, even from other metadata about the video.

Would it be better to have "unknown" be the unknown value rather than null

I can see that making sense and it's more direct. But null also matches "no attribute". All the aria props (strings) default to null. If it was a string "unknown" there might be temptation to sprout that value to an attribute? Although...this is probably property only, not an attribute, since it's not user configurable.

heff commented 1 year ago

DVR

@cjpillsbury thanks for writing this up. I'm still not totally clear on the reasons why this is needed from a media element in addition to seekable, and how it would be used. Is it basically "this media is meant to be accessible as DVR (have a progress bar), no matter what the seekable range might be right now"?

cjpillsbury commented 1 year ago

Is it basically "this media is meant to be accessible as DVR (have a progress bar), no matter what the seekable range might be right now"

Yeah, it solves that problem and makes it easier to definitively disambiguate between live (which always has some seekable range) and dvr for e.g. hls.

cjpillsbury commented 1 year ago

@gkatsev @heff I propose we treat "sliding window" and its relationship to dvr as out of scope for this discussion.

cjpillsbury commented 1 year ago

@gkatsev @heff since there's still (out of band) discussion about "DVR" more generally, I propose we also treat DVR as out of scope for this discussion. I've written up a google doc discussing some of the complexities and considerations around DVR, available for comment here. I'll go ahead and start a separate github discussion specifically for DVR with a link to the google doc.

cjpillsbury commented 1 year ago

Assuming we descope all DVR from the Stream Types discussion, I believe we are close to finalizing this proposal for "live" and "on-demand". The only potential disagreement that remains is how we should represent the "unknown" case where streamType is supported by the media element. The two proposals here are:

  1. null
  2. "unknown"

I have a slight leaning toward (2) to make it explicit, though I'm amenable to either. @gkatsev @heff I will defer to whatever you two think is the better value.

Once this is decided, let me know if there is anything else outstanding to finalize this proposal. Otherwise, let's make this decision and finish our first media-ui-extensions proposal 🎉.

gkatsev commented 1 year ago

I would lean towards 2 "unknown" as well. Prior art:

heff commented 1 year ago

I'm good with unknown. In media-chrome I don't think we should follow that pattern for the attribute, and we'll stick with "no media-stream-type attribute" (getAttribute returns null) means unknown or unimplemented by the media. But that's a different problem space. If there's disagreement with that we can followup in the media chrome thread.
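To summarize where the thread landed, here is a small sketch of how a consumer might tell the cases apart, assuming the property ends up typed as `"live" | "on-demand" | "unknown"` with `undefined` meaning the media element does not implement it. The interface and function names are hypothetical.

```typescript
type SettledStreamType = "live" | "on-demand" | "unknown";

// Hypothetical view of a media element that may or may not implement streamType.
interface StreamTypeCapableMedia {
  streamType?: SettledStreamType;
}

type StreamTypeSupport = "unimplemented" | "unknown" | "known";

function classifyStreamTypeSupport(media: StreamTypeCapableMedia): StreamTypeSupport {
  // undefined: the element has no streamType property at all.
  if (media.streamType === undefined) return "unimplemented";
  // "unknown": implemented, but the type has not been determined (yet).
  if (media.streamType === "unknown") return "unknown";
  return "known";
}
```

With "unknown" as the sentinel, the property's type stays `string | undefined`, which is the argument made for option 2 above.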