[Meta] Guidance for HTMLMediaElement, HTMLAudioElement, HTMLVideoElement behaviors during remoting

markafoltz commented 8 years ago

In remoting mode (i.e. state == connected) any side effects on the media element, for example mutations to properties, invocations of methods, or detachment from the DOM may (or may not) affect remote playback.

Because the behavior of the remote playback device seems to be out of scope for this spec, there may not be much to say in the normative sections of the spec.

However, in my opinion the spec would be better if it at least said something about what should happen. I can see these behaviors falling into three categories:

Must-implement for any reasonable experience: e.g., pause, mute, stop
No-op as they may not make any sense: e.g., setting autoplay or preload during remote playback
Implementation choice based on the device capabilities and desired UX.

The challenge will be in cases where the observable state of the element might be affected by implementation choices. For example, when playing back on a remote device that does not support changing the playback volume, how should the element behave when its volume attribute is set?
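To make the volume question concrete, here is a minimal sketch of how a page could notice that a volume change did not take effect. It assumes a user agent that cannot change the remote volume simply leaves the attribute unchanged, which is exactly the behavior this issue has not yet settled. The `VolumeControllable` type and the `setVolumeAndVerify` helper are hypothetical names for illustration only.

```typescript
// Minimal structural type covering only the member this sketch touches;
// a real HTMLMediaElement also has a numeric `volume` attribute.
interface VolumeControllable {
  volume: number;
}

// Sets the volume and reads it back. Returns true when the element reports
// the requested value afterwards, false when the setter had no visible effect.
// Assumption (not settled in this thread): a user agent that cannot change
// the remote volume leaves the attribute unchanged.
function setVolumeAndVerify(media: VolumeControllable, requested: number): boolean {
  // The volume attribute is defined on the range [0, 1].
  const target = Math.min(1, Math.max(0, requested));
  media.volume = target;
  return Math.abs(media.volume - target) < 1e-6;
}
```

A synchronous read-back like this only works if the user agent updates the attribute immediately; if the remote device confirms changes asynchronously, a page would have to wait for `volumechange` instead.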

markafoltz commented 8 years ago

I tagged this [Meta] since other issues might be forked off from it.

avayvod commented 8 years ago

@foolip FYI One thing to note is we'd likely want the user agents to be consistent with the API behavior in the case of remote playback initiated by the user agent. That means that we should avoid breaking websites that are unaware of the Remote Playback API when the user agent initiates remote playback. For example, throwing exceptions or firing an error event for unsupported operations during remote playback could cause the website to stop playback thinking the local playback has been interrupted.

tidoust commented 8 years ago

Discussed at the F2F: http://www.w3.org/2016/05/24-webscreens-minutes.html#item09

PROPOSED RESOLUTION: Extend the requirements doc as a start, best effort for UAs to reflect remote state locally otherwise.

foolip commented 8 years ago

So let's just list all of the things that one can do:

Set src, call load() or otherwise cause the current resource to be abandoned
Seek (fast or accurate, fastSeek is implemented in WebKit)
Pause/play
Change playback rate
Change volume and mute
Change enabled audio/video tracks
Change enabled text tracks

Which of these might be problematic on the remote side? Do we expect to have implementations where the volume can't be changed at all? Where changing the enabled audio track doesn't work?

The most troubling of these to me is actually text tracks. WebVTT is built on other web technologies, and if the remote isn't also a web engine, then it would have to be an independent implementation of WebVTT, and it's somewhat likely that just won't be done. @zcorpan

zcorpan commented 8 years ago

Being able to implement WebVTT without a Web engine was a design goal originally I believe, and such implementations exist, e.g. Submerge.

foolip commented 8 years ago

@mfoltzgoogle @avayvod, is Chromecast the only device planned for the implementation in Chrome, and would any of the things in my list be problematic?

markafoltz commented 8 years ago

Chromecast the only device planned for the implementation in Chrome

We plan on supporting Chromecast but may support other endpoints in the future.

would any of the things in my list be problematic

I believe Cast supports most of those features through their current Receiver SDK including text track support.

However, I am not in the loop on the current implementation status (i.e., are all features of WebVTT supported). I would have to loop in more folks on the Cast and media stack teams regarding WebVTT and fastSeek.

foolip commented 8 years ago

Can you also check about audio track support? The HTMLMediaElement API for this allows enabling multiple audio tracks at once, but it's easy to imagine APIs/SDKs where only one audio track can be enabled at a time. (For video tracks, only one can be enabled.)
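As an illustration of the single-audio-track constraint, the sketch below shows how a page (or a user agent shim) could force exactly one audio track to be enabled, which is the model a one-track-at-a-time SDK would need. `AudioTrackLike`, `AudioTrackListLike` and `enableOnlyAudioTrack` are hypothetical names; the types are modeled on the `enabled` attribute of `AudioTrack` and the indexed access of `AudioTrackList`.

```typescript
// Minimal structural types modeled on AudioTrack and AudioTrackList.
interface AudioTrackLike {
  enabled: boolean;
}
interface AudioTrackListLike {
  readonly length: number;
  readonly [index: number]: AudioTrackLike;
}

// Enables the track at `index` and disables every other track, so that at
// most one audio track is enabled at any time.
function enableOnlyAudioTrack(tracks: AudioTrackListLike, index: number): void {
  if (!Number.isInteger(index) || index < 0 || index >= tracks.length) {
    throw new RangeError(`No audio track at index ${index}`);
  }
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track !== undefined) {
      track.enabled = i === index;
    }
  }
}
```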

avayvod commented 8 years ago

@foolip I think it's not supported, I couldn't find any info in the Cast API reference at least. Tracks are only mentioned in the context of WebVTT for closed captions.

avayvod commented 8 years ago

This was partially addressed by #49 (w.r.t. local/remote state transitions I think), we could be more explicit about what must and should be supported.

avayvod commented 8 years ago

In the spirit of "let's list what one can do".

This is just the main HTMLMediaElement interface:

readonly attribute MediaError? error;

On error, remote playback is likely to disconnect. MUST be set when ondisconnect is fired due to an error. Should we expand error values for remote playback cases?

attribute DOMString src;

Setting |src| MUST try to load the corresponding media resource on the remote playback device. Can disconnect if |src| is not supported by it.

readonly attribute DOMString currentSrc; MUST reflect what is being played on the remote playback device.

attribute DOMString? crossOrigin;

MAY support. Ignored if not supported.

readonly attribute unsigned short networkState;

MAY support. Reflected to the best knowledge of the user agent. Otherwise is always in HAVE_FUTURE_DATA. Should we have a special value for remote playback?

attribute DOMString preload;

MAY support.

readonly attribute TimeRanges buffered;

MAY support if the remote playback mode provides this info. Otherwise pretend all is buffered or have empty ranges?

void load();

MUST load the src on the remote playback device. Can result in an error and disconnect.

CanPlayTypeResult canPlayType(DOMString type);

MUST return "probably" by default, implemented to the best knowledge of the user agent.

readonly attribute unsigned short readyState;

MUST return HAVE_ENOUGH_DATA, implemented to the best knowledge of the user agent.

readonly attribute boolean seeking;

MUST be implemented.

attribute double currentTime;

MUST be implemented.

void fastSeek(double time);

MAY be implemented.

readonly attribute unrestricted double duration;

MUST be implemented.

object getStartDate()

MAY be implemented. Returns NaN if not.

readonly attribute boolean paused;

MUST be implemented.

attribute double defaultPlaybackRate;

MAY support. By default, return 1.0 and ignore setters.

attribute double playbackRate;

MAY support. By default, return 1.0 and ignore setters.

readonly attribute TimeRanges played;

MAY support.

readonly attribute TimeRanges seekable;

MAY support.

readonly attribute boolean ended;

MUST support.

attribute boolean autoplay;

MUST support.

attribute boolean loop;

MUST support.

Promise<void> play();

MUST support.

void pause();

MUST support.

attribute boolean controls;

MUST support. Agnostic to remote state.

attribute double volume;

MAY support.

attribute boolean muted;

MAY support.

attribute boolean defaultMuted;

MAY support.

readonly attribute AudioTrackList audioTracks;

MUST support. Return the first track if multiple tracks are not supported.

readonly attribute VideoTrackList videoTracks;

MUST support. Return the first track if multiple tracks are not supported.

readonly attribute TextTrackList textTracks;

MUST support. Return the first track if multiple tracks are not supported.

TextTrack addTextTrack(TextTrackKind kind, optional DOMString label = "", optional DOMString language = "");

MAY support. Returns null if not supported.
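For easier scanning, the classification proposed in the list above can be restated as a small lookup table. This is only a summary sketch of the proposal in this comment, not spec text; the names `RequirementLevel`, `proposedLevels` and `membersAt` are hypothetical.

```typescript
type RequirementLevel = "MUST" | "MAY";

// Requirement level proposed in this comment for each HTMLMediaElement member.
const proposedLevels: Readonly<Record<string, RequirementLevel>> = {
  error: "MUST", src: "MUST", currentSrc: "MUST", crossOrigin: "MAY",
  networkState: "MAY", preload: "MAY", buffered: "MAY", load: "MUST",
  canPlayType: "MUST", readyState: "MUST", seeking: "MUST", currentTime: "MUST",
  fastSeek: "MAY", duration: "MUST", getStartDate: "MAY", paused: "MUST",
  defaultPlaybackRate: "MAY", playbackRate: "MAY", played: "MAY", seekable: "MAY",
  ended: "MUST", autoplay: "MUST", loop: "MUST", play: "MUST", pause: "MUST",
  controls: "MUST", volume: "MAY", muted: "MAY", defaultMuted: "MAY",
  audioTracks: "MUST", videoTracks: "MUST", textTracks: "MUST", addTextTrack: "MAY",
};

// Returns the names of all members proposed at the given requirement level.
function membersAt(level: RequirementLevel): string[] {
  return Object.keys(proposedLevels).filter((name) => proposedLevels[name] === level);
}
```

For example, `membersAt("MAY")` lists the members a page should not rely on during remote playback under this proposal.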

avayvod commented 8 years ago

Some other HTMLMediaElement extensions (EME, MSE, Audio Sinks):

attribute MediaProvider? srcObject

MUST support. Invocation of the load algorithm may fail if the source is not supported.

readonly attribute DOMString sinkId;

MAY support. By default returns an empty string.

Promise<void> setSinkId(DOMString sinkId);

MAY support. Rejects with NotSupportedError.

readonly attribute MediaKeys mediaKeys;

MAY support. Return null otherwise.

Promise setMediaKeys(MediaKeys? mediaKeys);

MAY support. Reject with NotSupportedError.

attribute EventHandler onencrypted;

MAY support. Otherwise, no-op.

attribute EventHandler onwaitingforkey;

MAY support. Otherwise, no-op.

MediaStream captureStream();

MAY support. Otherwise, reject with NotSupportedError.

avayvod commented 8 years ago

HTMLVideoElement

attribute unsigned long width;

MUST support. Depends on representation (poster or just a black 300x150 rectangle).

attribute unsigned long height;

MUST support. Depends on representation (poster or just a black 300x150 rectangle).

readonly attribute unsigned long videoWidth;

MUST support. Fallback to width if information is not available from the remote playback device.

readonly attribute unsigned long videoHeight;

MUST support. Fallback to height if information is not available from the remote playback device.

attribute USVString poster;

MUST support.

attribute boolean playsInline;

MUST support. Returns true. Works for the element representation not the actual video played remotely.

avayvod commented 8 years ago

Note, the width and height of the video element should rather be the last known width/height (with recommendations on what to render, like a scaled poster image and label indicating the remote playback device). See #46 and #48.

avayvod commented 8 years ago

And last but not least, the events that can fire. The rule of thumb is whether the corresponding attributes like readyState and networkState are supported and can take the corresponding values.

loadstart

MAY be supported.

progress

MAY be supported.

suspend

MAY be supported.

abort

MAY be supported.

error

MUST be supported.

emptied

MAY be supported.

loadedmetadata

MAY be supported.

loadeddata

MAY be supported.

canplay

MAY be supported.

canplaythrough

MAY be supported.

playing

MUST be supported.

waiting

MAY be supported.

seeking

MUST be supported.

seeked

MUST be supported.

ended

MUST be supported.

durationchange

MUST be implemented.

timeupdate

MUST be implemented.

play

MUST be implemented.

pause

MUST be implemented.

ratechange

MAY be implemented.

resize

MAY be implemented.

volumechange

MAY be implemented.
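One way a page could act on the event list above is to drive its UI only from the events proposed as MUST, so that it keeps working when the MAY events never fire. The sketch below assumes that approach; `MUST_EVENTS` restates the proposal in this comment, and `MediaEventSource` and `listenToMustEvents` are hypothetical names modeled on `addEventListener` and `removeEventListener`.

```typescript
// Minimal structural types modeled on the EventTarget methods of a media element.
interface MediaEventLike {
  readonly type: string;
}
type MediaListener = (event: MediaEventLike) => void;
interface MediaEventSource {
  addEventListener(type: string, listener: MediaListener): void;
  removeEventListener(type: string, listener: MediaListener): void;
}

// Events proposed as MUST in the list above.
const MUST_EVENTS: readonly string[] = [
  "error", "playing", "seeking", "seeked", "ended",
  "durationchange", "timeupdate", "play", "pause",
];

// Registers `listener` for every MUST event and returns a function that
// removes all of those registrations again.
function listenToMustEvents(source: MediaEventSource, listener: MediaListener): () => void {
  MUST_EVENTS.forEach((type) => source.addEventListener(type, listener));
  return () => {
    MUST_EVENTS.forEach((type) => source.removeEventListener(type, listener));
  };
}
```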

avayvod commented 8 years ago

requestFullscreen MUST work but affect the local representation of the media element.

avayvod commented 8 years ago

F2F feedback:

it's not great to copy another spec (HTMLMediaElement) in our spec
since the HTMLMediaElement doesn't seem to have MUST for most methods, we shouldn't restrict the connected state more than the disconnected (a use case mentioned, for instance, is custom browsers that are not allowed to implement seeking due to content restrictions - such browsers won't be able to comply with the Remote Playback API spec if it mandates they MUST implement seeking).

avayvod commented 8 years ago

TBH, the spec for HTMLMediaElement does say that fastSeek() MUST run the seek algorithm, which has a strong definition of MUST run the steps. So I stand corrected and feel that the example given yesterday is not valid. It's not clear how to avoid depending on the HTMLMediaElement spec.

foolip commented 8 years ago

Remote Playback changes how HTMLMediaElement behaves; not spelling out the details of how doesn't seem tractable. If you think describing it as a special mode in the HTML spec, one that your spec then flips the bit for, would work, that's a possibility too.

avayvod commented 8 years ago

F2F: group the features into what MUST work but may change the behavior, what MAY not work and how it behaves if it doesn't; only list these features in the spec assuming the rest work without a change.

avayvod commented 8 years ago

F2F: the state transition algorithms might be the trickiest ones to change (the remote playback device might not provide as many states as HTMLMediaElement exposes to the page).

tidoust commented 8 years ago

For reference, see minutes of the discussion at TPAC

markafoltz commented 7 years ago

Were there any work items from the TPAC discussion? It seems like we should make an effort to classify media element features into MUST, SHOULD and unspecified using the current shipping implementations as a baseline.

anssiko commented 7 years ago

@mfoltzgoogle, the TPAC meeting minutes confirm that was the proposed plan:

https://www.w3.org/2016/09/23-webscreens-minutes.html#item02

This issue is a blocker for the CR publication tracked in #73 and based on my assessment this should be resolved to be able to identify possible "at risk" features. The process doc tells us such "at risk" features "may be removed before advancement to Proposed Recommendation without a requirement to publish a new Candidate Recommendation." so in practice we can avoid some back-and-forth movement if we identify such features upfront.

All - Contributions welcome!

avayvod commented 7 years ago

IIRC, there were concerns about MUST for basic operations like seeking during the meeting as some remote playback devices might not be able to implement seeking and HTMLMediaElement doesn't really mandate it.

Could we avoid listing every feature of the media element by following the Presentation API example w/r/t the Web APIs available on the receiver in this note:

Given the operating context of the presentation display, some Web APIs will not work by design (for example, by requiring user input) or will be obsolete (for example, by attempting window management); the receiving user agent should be aware of this. Furthermore, any modal user interface will need to be handled carefully. The sandboxed modals flag is set on the receiving browsing context to prevent most of these operations.

?

markafoltz commented 7 years ago

I'm not sure that's relevant; that note is referring to Web APIs on the presentation receiver, not the controller. In my understanding of the Remote Playback API the controller is responsible for sending (or not sending) commands to the remote playback device. Of course it's possible that the device is implemented using HTML but it's not a requirement.

avayvod commented 7 years ago

I meant that just noting something like the below could be sufficient:

"Given the capabilities of the remote playback device, some HTMLMediaElement APIs will not work by design or will be obsolete. In these cases they MUST fall back to the same behavior as if the local playback device didn't support these APIs (e.g. encryption, captions, multiple tracks, and so on)."

To be honest, the remote playback device capabilities might not be always a subset of those of the local playback device. The cases when something is not working locally but can work remotely might be worth looking into and adding a note about too.

markafoltz commented 7 years ago

I think that is okay, but one concern raised earlier is that there may not be specified behavior for mandatory features not implemented by the playback device. As you say this is also an issue for both local and remote playback, so the fix may be to address this in HTML5, but practically speaking I could see the potential for different interpretations.

For example, if muting is not supported, one UA may allow the attribute to be set but not propagate the command to the remote device, while another UA may ignore attempts to set the attribute. In either case content with custom controls may not correctly reflect the remote state depending on whether they recheck the attribute after setting and whether it accurately reflects the remote state.

Maybe the note could state that the properties of the media element should reflect as closely as possible the remote playback state, even if not all features are supported by the remote playback device; and events should not be fired unless they reflect actual changes to the remote playback state.
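To illustrate why this matters for custom controls, here is a minimal sketch of a mute button that never assumes its request succeeded: it redraws only in response to `volumechange`, so it stays correct whether or not the remote device honors the command. It relies on the behavior suggested above (events fire only for actual state changes). `MutableMedia` and `bindMuteButton` are hypothetical names.

```typescript
// Minimal structural type modeled on the parts of HTMLMediaElement used here.
interface MutableMedia {
  muted: boolean;
  addEventListener(type: string, listener: () => void): void;
}

// Renders the current mute state once, re-renders on every volumechange, and
// returns a click handler that requests a toggle without touching the UI.
function bindMuteButton(media: MutableMedia, render: (muted: boolean) => void): () => void {
  render(media.muted);
  media.addEventListener("volumechange", () => render(media.muted));
  return () => {
    // Only a request: the display changes when (and if) volumechange fires.
    media.muted = !media.muted;
  };
}
```

With this pattern, a device that ignores mute leaves the button showing "unmuted", which matches the actual remote state.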

Second, one purpose of the Presentation API note was to give specific guidance as to what APIs are not expected to work on the presentation. Can the same be done for remote playback - I think you started a list above, can it be made more explicit?

I would be in favor of two separate notes as I think they convey different information.

anssiko commented 7 years ago

Hearing no further comments, I'd ask the editors @avayvod @mounirlamouri to implement the synthesis of the latest proposals. Feel free to use your editorial freedom to mould the text to fit in the spec, but roughly:

Add the following informative notes (I replaced normative RFC 2119 terms with their informative equivalents, some editorial):

Given the varying capabilities of the remote playback devices, some HTMLMediaElement APIs will not work by design or will be obsolete. In these cases they are expected to fall back to the same behavior as if the local playback device did not support these APIs. Examples of such features include encryption, captions, multiple tracks, and so on.

The properties of the HTMLMediaElement are expected to reflect as closely as possible the remote playback state, even if not all features are supported by the remote playback device; and events should not be fired unless they reflect actual changes to the remote playback state.

Classify the HTMLMediaElement properties into two buckets: properties that MAY and properties that MUST behave as specified on the remote playback device as well, per the list documented earlier in this issue. I suggest using a concise form rather than an actual list:

The following HTMLMediaElement properties MUST behave as defined in [HTML] on the remote playback device: X, Y, Z.

Listing MUSTs and MAYs is a start, and optimally we'd add normative language to define expected behaviour in the case of "not supported" for each MAY feature, so as to allow web developers to feature-detect such cases in an interoperable manner across implementations.

I opened #88 to discuss the case where the remote playback device capabilities might not always be a subset of those of the local playback device.

anssiko commented 7 years ago

@avayvod @mounirlamouri @mfoltzgoogle, any concerns with the proposal I outlined above? If none, could you please address this remaining issue so we could get to zarro boogs for CR tracked in #73.

If the proposal is lacking, I'd be happy if you could synthesize an improved proposal for review.

anssiko commented 7 years ago

I'm a bit concerned about the lack of feedback here. Are folks already out of office?

mounirlamouri commented 7 years ago

I was traveling for the past few days. Happy to have a look but I think @avayvod has more context than me on this issue as he looked into it in the past.

markafoltz commented 7 years ago

I consider the PR I uploaded to be the minimum needed to close this issue.

Regarding other aspects:

I'm not sure I have a good grasp of what "X, Y, and Z" MUST be implemented by all remote playback devices. That would require a better understanding of the constraints of current and future implementations, and sounds like specifying a remote playback device itself, which may not be in scope of this spec. Obviously devices that don't support basic commands like pause, mute, etc. are very bad implementations, but I'm not confident enough to specify what is "bad" at this point. Let me think about it, but I'm not sure it should block going to CR.

As far as feature detection of supported capabilities of the remote device goes, I could see this being very useful, for example for a player library that wants to support remote playback on multiple devices with different capabilities. My thinking is that adding capability detection would be a useful extension to the Media Capabilities API, based on implementation experience and developer feedback. Again, not blocking CR.

markafoltz commented 7 years ago

@avayvod Are you satisfied with the current language around remote playback device capabilities, or do you think more is needed at this point? Basically, we are saying that the browser shouldn't lie about the state of remote playback, but not mandating that the remote playback device implement specific playback features.

tidoust commented 7 years ago

The note is good. I still think that the spec could be clearer about what happens or does not happen during transition.

In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?

That may not warrant more normative text though. Perhaps it all fits within a Note or example that could explain in substance:

what will never happen during a transition (for instance, even though there is a note that says that local playback should be paused, we don't expect the user agent to fire a pause event. Transition will be as seamless as possible from an app perspective)
what could happen depending on remote playback capabilities and what that means in terms of events, for instance the fact that audioTracks, videoTracks and textTracks might become empty. Same thing for buffered and seekable.

markafoltz commented 7 years ago

In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?

I suppose all of these are possible; is this question in reference to a specific remote playback implementation?

what will never happen during a transition (for instance, even though there is a note that says that local playback should be paused, we don't expect the user agent to fire a pause event. Transition will be as seamless as possible from an app perspective)

I believe this is implied by the note - since playback continues on the remote playback device, there is no Web-visible transition to paused. I can add a sentence to the existing note to make this explicit.

what could happen depending on remote playback capabilities and what that means in terms of events, for instance the fact that audioTracks, videoTracks and textTracks might become empty. Same thing for buffered and seekable.

I'm not sure about removing tracks if the remote playback device does not support them. They are still available in the underlying media source, it's just that they may not be playable in the current context.

tidoust commented 7 years ago

In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?

I suppose all of these are possible; is this question in reference to a specific remote playback implementation?

No. I'm wondering what needs to be made explicit in the spec to guarantee interoperability between implementations.

Taking a concrete example, let's say that my app plays a video, has a pointer to a TextTrack instance for a text track within that video stream, and follows cuechange events on that instance to render something on screen. That app might break if these events are no longer triggered after the user activates remote playback.

How do I detect that the TextTrack instance I have is no longer valid? With regular media playback, I believe I would receive a removetrack event on the TextTrackList instance attached to the media element. Will I receive the same event if the user activates remote playback and the text track becomes no longer available locally? I suppose so but it may be worth making that explicit in the spec, especially because we want to "hide" other aspects of the transition (such as local pausing).
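The detection pattern described here can be sketched as follows. It assumes the user agent does fire `removetrack` on the track list when a track stops being available, which is precisely the open question for the remoting case. `followTextTrack` and the `*Like` types are hypothetical names modeled on `TextTrack`, `TextTrackList` and the `track` attribute of `TrackEvent`.

```typescript
// Minimal structural types modeled on TextTrack, TrackEvent and TextTrackList.
interface CueSourceLike {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}
interface TrackEventLike {
  readonly track: unknown;
}
interface TrackListLike {
  addEventListener(type: string, listener: (event: TrackEventLike) => void): void;
  removeEventListener(type: string, listener: (event: TrackEventLike) => void): void;
}

// Follows cuechange on `track` until the list reports that this exact track
// was removed, then unsubscribes and notifies the caller once.
function followTextTrack(
  list: TrackListLike,
  track: CueSourceLike,
  onCueChange: () => void,
  onTrackGone: () => void,
): () => void {
  function stop(): void {
    track.removeEventListener("cuechange", onCueChange);
    list.removeEventListener("removetrack", onRemoved);
  }
  function onRemoved(event: TrackEventLike): void {
    if (event.track === track) {
      stop();
      onTrackGone();
    }
  }
  track.addEventListener("cuechange", onCueChange);
  list.addEventListener("removetrack", onRemoved);
  return stop;
}
```

If a user agent drops the track without firing `removetrack`, this code would silently stop receiving cues, which is the breakage scenario described above.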

Now, I may be creating issues where they don't exist, and we may want to get more implementation and usage experience before we make things more explicit in the spec, so as to understand what can concretely trigger interoperability issues. In other words, current text is probably good enough for now, we can add more notes afterwards as needed.

markafoltz commented 7 years ago

I agree there are potential issues with track compatibility, but I don't think we yet have enough information to resolve them concretely. It depends on developer feedback and implementation experience. I can provide some insight into the latter based on what Chrome has shipped, but not sure when I can get to it.

If the concern is interoperability, then there's a fairly small set of implementations we would be extrapolating future interoperability from. Maybe that's the best we can do at this time.

anssiko commented 7 years ago

I'm pretty happy with the added note, thanks @mfoltzgoogle! I amended it a bit in #97.

@tidoust, do you think it'd be appropriate to advance to CR with the current text if we'd clarify the current status in https://w3c.github.io/remote-playback/#status-of-this-document as follows (feel free to amend):

Issue #41 discusses the set of media playback features that remote playback devices are expected to support. The group will seek further developer feedback and implementation experience to identify any interoperability issues around these features when used during remote playback, and will further clarify the specification based on feedback received.

tidoust commented 7 years ago

I believe that's fine, @anssiko. The text sets expectations quite nicely, that's good!

markafoltz commented 6 years ago

From https://www.w3.org/2017/11/06-webscreens-minutes.html#x03:

ACTION: @mfoltzgoogle to add normative language to the spec around local playback state to address issue #41

anssiko commented 6 years ago

This issue was noted in the Candidate Recommendation as the only remaining substantial open issue. Now this issue has been addressed by https://github.com/w3c/remote-playback/commit/e1da4869689f1a61e29754850cd266f49ccd070e. Thanks @mfoltzgoogle for your contribution.

As noted in the spec, we are seeking further developer feedback and implementation experience to identify any interoperability issues around the features discussed in #41, and now in particular for the newly updated Media commands and media playback state section.
