Closed markafoltz closed 6 years ago
I tagged this [Meta] since other issues might be forked off from it.
@foolip FYI One thing to note is we'd likely want the user agents to be consistent with the API behavior in the case of remote playback initiated by the user agent. That means that we should avoid breaking websites that are unaware of the Remote Playback API when the user agent initiates remote playback. For example, throwing exceptions or firing an error event for unsupported operations during remote playback could cause the website to stop playback thinking the local playback has been interrupted.
Discussed at the F2F: http://www.w3.org/2016/05/24-webscreens-minutes.html#item09
PROPOSED RESOLUTION: Extend the requirements doc as a start, best effort for UAs to reflect remote state locally otherwise.
So let's just list all of the things that one can do:
src, call load() or otherwise cause the current resource to be abandoned
fastSeek is implemented in WebKit)
Which of these might be problematic on the remote side? Do we expect to have implementations where the volume can't be changed at all? Where changing the enabled audio track doesn't work?
The most troubling of these to me is actually text tracks. WebVTT is built on other web technologies, and if the remote isn't also a web engine, then it would have to be an independent implementation of WebVTT, and it's somewhat likely that just won't be done. @zcorpan
Being able to implement WebVTT without a Web engine was a design goal originally I believe, and such implementations exist, e.g. Submerge.
@mfoltzgoogle @avayvod, is Chromecast the only device planned for the implementation in Chrome, and would any of the things in my list be problematic?
Chromecast the only device planned for the implementation in Chrome
We plan on supporting Chromecast but may support other endpoints in the future.
would any of the things in my list be problematic
I believe Cast supports most of those features through their current Receiver SDK including text track support.
However, I am not in the loop on current implementation status (i.e., whether all features of WebVTT are supported); I would have to loop in more folks on the Cast and media stack teams regarding WebVTT and fastSeek.
Can you also check about audio track support? The HTMLMediaElement API for this allows enabling multiple audio tracks at once, but it's easy to imagine APIs/SDKs where only one audio track can be enabled at a time. (For video tracks, only one can be enabled.)
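To make the mismatch concrete, here is a minimal sketch, assuming a hypothetical remote device that can play only one audio track at a time. The `AudioTrackLike` interface and the `pickSingleEnabledTrack` helper are illustrative names, not part of any spec or SDK, and keeping the first enabled track is just one possible policy.

```typescript
// Minimal stand-in for the `id`/`enabled` pair of an HTML AudioTrack (assumption).
interface AudioTrackLike {
  id: string;
  enabled: boolean;
}

// Hypothetical policy: keep the first enabled track and disable the rest,
// so the local list matches what a single-track remote device can play.
// Returns the id of the track that stays enabled, or null if none was enabled.
function pickSingleEnabledTrack(tracks: AudioTrackLike[]): string | null {
  let chosen: string | null = null;
  for (const track of tracks) {
    if (track.enabled && chosen === null) {
      chosen = track.id;
    } else {
      track.enabled = false;
    }
  }
  return chosen;
}
```

Whether a user agent should silently collapse the selection like this, or reject the change, is exactly the kind of behavior that would need to be agreed on.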
@foolip I think it's not supported, I couldn't find any info in the Cast API reference at least. Tracks are only mentioned in the context of WebVTT for closed captions.
This was partially addressed by #49 (w.r.t. local/remote state transitions I think), we could be more explicit about what must and should be supported.
In the spirit of "let's list what one can do".
This is just the main HTMLMediaElement interface:
readonly attribute MediaError? error;
On error, remote playback is likely to disconnect. MUST be set when ondisconnect is fired due to an error.
Should we expand error values for remote playback cases?
attribute DOMString src;
Setting |src| MUST try to load the corresponding media resource on the remote playback device. Can disconnect if |src| is not supported by it.
readonly attribute DOMString currentSrc;
MUST reflect what is being played on the remote playback device.
attribute DOMString? crossOrigin;
MAY support. Ignored if not supported.
readonly attribute unsigned short networkState;
MAY support. Reflected to the best knowledge of the user agent. Otherwise is always in HAVE_FUTURE_DATA. Should we have a special value for remote playback?
attribute DOMString preload;
MAY support.
readonly attribute TimeRanges buffered;
MAY support if the remote playback mode provides this info. Otherwise pretend all is buffered or have empty ranges?
void load();
MUST load the src on the remote playback device. Can result in an error and disconnect.
CanPlayTypeResult canPlayType(DOMString type);
MUST return "probably" by default, implemented to the best knowledge of the user agent.
readonly attribute unsigned short readyState;
MUST return HAVE_ENOUGH_DATA, implemented to the best knowledge of the user agent.
readonly attribute boolean seeking;
MUST be implemented.
attribute double currentTime;
MUST be implemented.
void fastSeek(double time);
MAY be implemented.
readonly attribute unrestricted double duration;
MUST be implemented.
object getStartDate()
MAY be implemented. Returns NaN if not.
readonly attribute boolean paused;
MUST be implemented.
attribute double defaultPlaybackRate;
MAY support. By default, return 1.0 and ignore setters.
attribute double playbackRate;
MAY support. By default, return 1.0 and ignore setters.
readonly attribute TimeRanges played;
MAY support.
readonly attribute TimeRanges seekable;
MAY support.
readonly attribute boolean ended;
MUST support.
attribute boolean autoplay;
MUST support.
attribute boolean loop;
MUST support.
Promise<void> play();
MUST support.
void pause();
MUST support.
attribute boolean controls;
MUST support. Agnostic to remote state.
attribute double volume;
MAY support.
attribute boolean muted;
MAY support.
attribute boolean defaultMuted;
MAY support.
readonly attribute AudioTrackList audioTracks;
MUST support. Return the first track if multiple tracks are not supported.
readonly attribute VideoTrackList videoTracks;
MUST support. Return the first track if multiple tracks are not supported.
readonly attribute TextTrackList textTracks;
MUST support. Return the first track if multiple tracks are not supported.
TextTrack addTextTrack(TextTrackKind kind, optional DOMString label = "", optional DOMString language = "");
MAY support. Returns null if not supported.
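As an illustration of the "return a default and ignore setters" fallback proposed for MAY features such as playbackRate, here is a minimal sketch. The `RemoteCapabilities` flag and the `RemotePlaybackRate` class are hypothetical; nothing in the spec or in any implementation is known to expose capabilities this way.

```typescript
// Hypothetical capability flag reported by a remote playback device (assumption).
interface RemoteCapabilities {
  supportsPlaybackRate: boolean;
}

// Sketch of the proposed fallback: if the remote device cannot change the rate,
// the getter keeps returning 1.0 and the setter is ignored.
class RemotePlaybackRate {
  private rate = 1.0;
  private readonly caps: RemoteCapabilities;

  constructor(caps: RemoteCapabilities) {
    this.caps = caps;
  }

  get playbackRate(): number {
    return this.caps.supportsPlaybackRate ? this.rate : 1.0;
  }

  set playbackRate(value: number) {
    if (this.caps.supportsPlaybackRate) {
      this.rate = value;
    }
    // Otherwise the request is silently ignored, as suggested in the list above.
  }
}
```

A page can then detect the missing feature by setting the attribute and reading it back, without any exception being thrown.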
Some other HTMLMediaElement extensions (EME, MSE, Audio Sinks):
attribute MediaProvider? srcObject
MUST support. Invocation of the load algorithm may fail if the source is not supported.
readonly attribute DOMString sinkId;
MAY support. By default returns an empty string.
Promise<void> setSinkId(DOMString sinkId);
MAY support. Rejects with NotSupportedError.
readonly attribute MediaKeys mediaKeys;
MAY support. Return null otherwise.
Promise setMediaKeys(MediaKeys? mediaKeys);
MAY support. Reject with NotSupportedError.
attribute EventHandler onencrypted;
MAY support. Otherwise, no-op.
attribute EventHandler onwaitingforkey;
MAY support. Otherwise, no-op.
MediaStream captureStream();
MAY support. Otherwise, throw NotSupportedError (the method returns a MediaStream rather than a promise, so there is nothing to reject).
HTMLVideoElement
attribute unsigned long width;
MUST support. Depends on representation (poster or just a black 300x150 rectangle).
attribute unsigned long height;
MUST support. Depends on representation (poster or just a black 300x150 rectangle).
readonly attribute unsigned long videoWidth;
MUST support. Fall back to width if information is not available from the remote playback device.
readonly attribute unsigned long videoHeight;
MUST support. Fall back to height if information is not available from the remote playback device.
attribute USVString poster;
MUST support.
attribute boolean playsInline;
MUST support. Returns true. Works for the element representation not the actual video played remotely.
Note, the width and height of the video element should rather be the last known width/height (with recommendations on what to render, like a scaled poster image and label indicating the remote playback device). See #46 and #48.
And last but not least, the events that can fire. The rule of thumb is whether the corresponding attributes like readyState and networkState are supported and can take the corresponding values.
loadstart
MAY be supported.
progress
MAY be supported.
suspend
MAY be supported.
abort
MAY be supported.
error
MUST be supported.
emptied
MAY be supported.
loadedmetadata
MAY be supported.
loadeddata
MAY be supported.
canplay
MAY be supported.
canplaythrough
MAY be supported.
playing
MUST be supported.
waiting
MAY be supported.
seeking
MUST be supported.
seeked
MUST be supported.
ended
MUST be supported.
durationchange
MUST be implemented.
timeupdate
MUST be implemented.
play
MUST be implemented.
pause
MUST be implemented.
ratechange
MAY be implemented.
resize
MAY be implemented.
volumechange
MAY be implemented.
requestFullscreen
MUST work but affect the local representation of the media element.
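For anyone who wants to turn the event list above into test expectations, one possible encoding is a simple lookup table. This is only a sketch of the proposal as written in this comment; the `eventSupport` and `eventsAtLevel` names are made up for illustration.

```typescript
type Level = "MUST" | "MAY";

// Requirement level per media event, copied from the list above.
const eventSupport: Record<string, Level> = {
  loadstart: "MAY", progress: "MAY", suspend: "MAY", abort: "MAY",
  error: "MUST", emptied: "MAY", loadedmetadata: "MAY", loadeddata: "MAY",
  canplay: "MAY", canplaythrough: "MAY", playing: "MUST", waiting: "MAY",
  seeking: "MUST", seeked: "MUST", ended: "MUST", durationchange: "MUST",
  timeupdate: "MUST", play: "MUST", pause: "MUST", ratechange: "MAY",
  resize: "MAY", volumechange: "MAY",
};

// Returns the names of all events proposed at the given requirement level.
function eventsAtLevel(level: Level): string[] {
  return Object.keys(eventSupport).filter((name) => eventSupport[name] === level);
}
```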
F2F feedback:
connected state more than the disconnected
(a use case mentioned, for instance, is custom browsers that are not allowed to implement seeking due to content restrictions - such browsers won't be able to comply with the Remote Playback API spec if it mandates they MUST implement seeking).
TBH, the spec for HTMLMediaElement does say that fastSeek() MUST run the seek algorithm, which has a strong definition of MUST run the steps. So I stand corrected and feel that the example given yesterday is not valid. Not clear how to avoid depending on the HTMLMediaElement spec.
Remote Playback changes how HTMLMediaElement behaves; not spelling out the details of how doesn't seem tractable. If you'd rather describe it as a special mode in the HTML spec that your spec then flips the bit for, that's a possibility too.
F2F: group the features into what MUST work but may change the behavior, what MAY not work and how it behaves if it doesn't; only list these features in the spec assuming the rest work without a change.
F2F: state transition algorithms might be the trickiest ones to change (remote playback device might not provide as many states as HTMLMediaElement exposes to the page).
For reference, see minutes of the discussion at TPAC
Were there any work items from the TPAC discussion? It seems like we should make an effort to classify media element features into MUST, SHOULD and unspecified using the current shipping implementations as a baseline.
@mfoltzgoogle, the TPAC meeting minutes confirm that was the proposed plan:
https://www.w3.org/2016/09/23-webscreens-minutes.html#item02
This issue is a blocker for the CR publication tracked in #73 and based on my assessment this should be resolved to be able to identify possible "at risk" features. The process doc tells us such "at risk" features "may be removed before advancement to Proposed Recommendation without a requirement to publish a new Candidate Recommendation." so in practice we can avoid some back-and-forth movement if we identify such features upfront.
All - Contributions welcome!
IIRC, there were concerns about MUST for basic operations like seeking during the meeting as some remote playback devices might not be able to implement seeking and HTMLMediaElement doesn't really mandate it.
Could we avoid listing every feature of the media element by following the Presentation API example w/r/t the Web APIs available on the receiver in this note:
Given the operating context of the presentation display, some Web APIs will not work by design (for example, by requiring user input) or will be obsolete (for example, by attempting window management); the receiving user agent should be aware of this. Furthermore, any modal user interface will need to be handled carefully. The sandboxed modals flag is set on the receiving browsing context to prevent most of these operations.
?
I'm not sure that's relevant; that note is referring to Web APIs on the presentation receiver, not the controller. In my understanding of the Remote Playback API the controller is responsible for sending (or not sending) commands to the remote playback device. Of course it's possible that the device is implemented using HTML but it's not a requirement.
I meant just noting something like below could be sufficient:
"Given the capabilities of the remote playback device, some HTMLMediaElement APIs will not work by design or will be obsolete. In these cases they MUST fall back to the same behavior as if the local playback device doesn't support these APIs (e.g. encryption, captions, multiple tracks, and so on)."
To be honest, the remote playback device capabilities might not be always a subset of those of the local playback device. The cases when something is not working locally but can work remotely might be worth looking into and adding a note about too.
I think that is okay, but one concern raised earlier is that there may not be specified behavior for mandatory features not implemented by the playback device. As you say this is also an issue for both local and remote playback, so the fix may be to address this in HTML5, but practically speaking I could see the potential for different interpretations.
For example, if muting is not supported, one UA may allow the attribute to be set but not propagate the command to the remote device, while another UA may ignore attempts to set the attribute. In either case content with custom controls may not correctly reflect the remote state depending on whether they recheck the attribute after setting and whether it accurately reflects the remote state.
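The two interpretations can be sketched as follows. Both classes are hypothetical models of user agent behavior, and `MutableMedia` is a stand-in for the one HTMLMediaElement attribute involved; none of this reflects a known implementation.

```typescript
// Minimal stand-in for the part of HTMLMediaElement used here (assumption).
interface MutableMedia {
  muted: boolean;
}

// What custom controls can do: set `muted`, then read it back.
// Returns true if the attribute now holds the requested value.
function requestMute(media: MutableMedia, wanted: boolean): boolean {
  media.muted = wanted;
  return media.muted === wanted;
}

// Hypothetical UA behavior A: the attribute changes locally even though the
// remote device never receives (or honors) the command.
class AcceptsButDoesNotPropagate implements MutableMedia {
  muted = false;
}

// Hypothetical UA behavior B: attempts to set the attribute are ignored.
class IgnoresSetter implements MutableMedia {
  get muted(): boolean {
    return false;
  }
  set muted(_value: boolean) {
    // Ignored: the remote device cannot mute.
  }
}
```

With behavior A the read-back succeeds even though the remote device is not muted; with behavior B the controls can at least detect that the request had no effect.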
Maybe the note could state that the properties of the media element should reflect as closely as possible the remote playback state, even if not all features are supported by the remote playback device; and events should not be fired unless they reflect actual changes to the remote playback state.
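A rough sketch of "only fire events for actual changes", assuming the user agent receives periodic state reports from the remote device. The `RemoteState` shape and the `eventsForUpdate` helper are illustrative assumptions, not spec text.

```typescript
// Hypothetical snapshot of what the remote playback device reports (assumption).
interface RemoteState {
  paused: boolean;
  volume: number;
}

// Compare the last known state with a new report and list the media events
// that correspond to real changes. An unchanged report yields no events.
function eventsForUpdate(prev: RemoteState, next: RemoteState): string[] {
  const events: string[] = [];
  if (prev.paused !== next.paused) {
    events.push(next.paused ? "pause" : "play");
  }
  if (prev.volume !== next.volume) {
    events.push("volumechange");
  }
  return events;
}
```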
Second, one purpose of the Presentation API note was to give specific guidance as to what APIs are not expected to work on the presentation. Can the same be done for remote playback - I think you started a list above, can it be made more explicit?
I would be in favor of two separate notes as I think they convey different information.
Hearing no further comments, I'd ask the editors @avayvod @mounirlamouri to implement the synthesis of the latest proposals. Feel free to use your editorial freedom to mould the text to fit in the spec, but roughly:
"Given the varying capabilities of the remote playback devices, some HTMLMediaElement APIs will not work by design or will be obsolete. In these cases they are expected to fall back to the same behavior as if the local playback device would not support these APIs. Examples of such features include encryption, captions, multiple tracks, and so on.
The properties of the HTMLMediaElement are expected to reflect as closely as possible the remote playback state, even if not all features are supported by the remote playback device; and events should not be fired unless they reflect actual changes to the remote playback state."
HTMLMediaElement properties into two buckets: properties that MAY and MUST behave as specified also on the remote playback device per the list documented earlier in this issue. I suggest using a concise form over an actual list:
"The following HTMLMediaElement properties MUST behave as defined in [HTML] on the remote playback device: X, Y, Z".
Listing MUSTs and MAYs is a start, and optimally we'd add normative language to define expected behaviour in the case of "not supported" for each MAY feature, so as to allow web developers to feature-detect such cases in an interoperable manner across implementations.
I opened #88 to discuss the case where the remote playback device capabilities might not always be a subset of those of the local playback device.
@avayvod @mounirlamouri @mfoltzgoogle, any concerns with the proposal I outlined above? If none, could you please address this remaining issue so we could get to zarro boogs for CR tracked in #73.
If the proposal is lacking, I'd be happy if you could synthesize an improved proposal for review.
I'm a bit concerned about the lack of feedback here. Are folks already out of office?
I was traveling for the past few days. Happy to have a look but I think @avayvod has more context than me on this issue as he looked into it in the past.
I consider the PR I uploaded to be the minimum needed to close this issue.
Regarding other aspects:
I'm not sure if I have a good grasp of what "X, Y, and Z" MUST be implemented by all remote playback devices. That would require understanding better the constraints of current and future implementations, and sounds like specifying a remote playback device itself, which may not be in scope of this spec. Obviously devices that don't support basic commands like pause, mute, etc. are very bad implementations, but not confident enough to specify what is "bad" at this point. Let me think about it, but not sure it should block going to CR.
As far as feature detection of supported capabilities of the remote device, I could see this being very useful, for example for a player library that wants to support remote playback on multiple devices with different capabilities. My thinking is that adding capability detection would be a useful extension to the Media Capabilities API based on implementation experience and developer feedback. Again not blocking CR.
@avayvod Are you satisfied with the current language around remote playback device capabilities, or do you think more is needed at this point? Basically, we are saying that the browser shouldn't lie about the state of remote playback, but not mandating that the remote playback device implement specific playback features.
The note is good. I still think that the spec could be clearer about what happens or does not happen during transition.
In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?
That may not warrant more normative text though. Perhaps it all fits within a Note or example that could explain in substance:
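what will never happen during a transition (for instance, even though there is a note that says that local playback should be paused, we don't expect the user agent to fire a pause event. Transition will be as seamless as possible from an app perspective)
what could happen depending on remote playback capabilities and what that means in terms of events, for instance the fact that audioTracks, videoTracks and textTracks might become empty. Same thing for buffered and seekable.
In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?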
I suppose all of these are possible; is this question in reference to a specific remote playback implementation?
what will never happen during a transition (for instance, even though there is a note that says that local playback should be paused, we don't expect the user agent to fire a pause event. Transition will be as seamless as possible from an app perspective)
I believe this is implied by the note - since playback continues on the remote playback device, there is no Web-visible transition to paused. I can add a sentence to the existing note to make this explicit.
what could happen depending on remote playback capabilities and what that means in terms of events, for instance the fact that audioTracks, videoTracks and textTracks might become empty. Same thing for buffered and seekable.
I'm not sure about removing tracks if the remote playback device does not support them. They are still available in the underlying media source, it's just that they may not be playable in the current context.
In particular, what happens to the videoTracks, audioTracks and textTracks properties? Do the lists disappear? If so, do change and removetrack events get fired? Can the local user agent continue to manage text tracks locally during remoting and fire cues accordingly?
I suppose all of these are possible; is this question in reference to a specific remote playback implementation?
No. I'm wondering what needs to be made explicit in the spec to guarantee interoperability between implementations.
Taking a concrete example, let's say that my app plays a video, has a pointer to a TextTrack instance for a text track within that video stream, and follows cuechange events on that instance to render something on screen. That app might break if these events are no longer triggered after the user activates remote playback.
How do I detect that the TextTrack instance I have is no longer valid? With regular media playback, I believe I would receive a removetrack event on the TextTrackList instance attached to the media element. Will I receive the same event if the user activates remote playback and the text track becomes no longer available locally? I suppose so but it may be worth making that explicit in the spec, especially because we want to "hide" other aspects of the transition (such as local pausing).
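To show what the app side would rely on, here is a minimal sketch. It assumes the user agent does signal removal when a track stops being available locally, which is precisely the open question. `TrackLike`, `TrackListLike` and `CueRenderer` are simplified stand-ins written for this example, not the real TextTrack or TextTrackList interfaces.

```typescript
// Minimal stand-in for a TextTrack (assumption for illustration).
interface TrackLike {
  id: string;
}

// Minimal stand-in for a TextTrackList with a removetrack-style notification.
class TrackListLike {
  private tracks: TrackLike[] = [];
  private listeners: Array<(removed: TrackLike) => void> = [];

  add(track: TrackLike): void {
    this.tracks.push(track);
  }

  onRemoveTrack(listener: (removed: TrackLike) => void): void {
    this.listeners.push(listener);
  }

  // Models the user agent dropping a track, e.g. when remoting starts.
  remove(track: TrackLike): void {
    this.tracks = this.tracks.filter((t) => t !== track);
    for (const listener of this.listeners) {
      listener(track);
    }
  }
}

// App-side holder that forgets its track once the list reports it removed,
// so it stops waiting for cue changes that will never arrive.
class CueRenderer {
  current: TrackLike | null;

  constructor(list: TrackListLike, track: TrackLike) {
    this.current = track;
    list.onRemoveTrack((removed) => {
      if (removed === this.current) {
        this.current = null;
      }
    });
  }
}
```

If no such notification is sent during the transition, the app has no way to learn that its track reference went stale, which is the breakage described above.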
Now, I may be creating issues where they don't exist, and we may want to get more implementation and usage experience before we make things more explicit in the spec, so as to understand what can concretely trigger interoperability issues. In other words, current text is probably good enough for now, we can add more notes afterwards as needed.
I agree there are potential issues with track compatibility, but I don't think we yet have enough information to resolve them concretely. It depends on developer feedback and implementation experience. I can provide some insight into the latter based on what Chrome has shipped, but not sure when I can get to it.
If the concern is interoperability, then there's a fairly small set of implementations we would be extrapolating future interoperability from. Maybe that's the best we can do at this time.
I'm pretty happy with the added note, thanks @mfoltzgoogle! I amended it a bit in #97.
@tidoust, do you think it'd be appropriate to advance to CR with the current text if we'd clarify the current status in https://w3c.github.io/remote-playback/#status-of-this-document as follows (feel free to amend):
Issue #41 discusses the set of media playback features that remote playback devices are expected to support. The group will seek further developer feedback and implementation experience to identify any interoperability issues around these features when used during remote playback, and will further clarify the specification based on feedback received.
I believe that's fine, @anssiko. The text sets expectations quite nicely, that's good!
From https://www.w3.org/2017/11/06-webscreens-minutes.html#x03:
ACTION: @mfoltzgoogle to add normative language to the spec around local playback state to address issue #41
This issue was noted in the Candidate Recommendation as the only remaining substantial open issue. Now this issue has been addressed by https://github.com/w3c/remote-playback/commit/e1da4869689f1a61e29754850cd266f49ccd070e. Thanks @mfoltzgoogle for your contribution.
As noted in the spec, we are seeking further developer feedback and implementation experience to identify any interoperability issues around the features discussed in #41, and now in particular for the newly updated Media commands and media playback state section.
In remoting mode (i.e. state == connected) any side effects on the media element, for example mutations to properties, invocations of methods, or detachment from the DOM may (or may not) affect remote playback.
Because the behavior of the remote playback device seems to be out of scope for this spec, there may not be much to say in the normative sections of the spec.
However, in my opinion it would be a better spec to at least say something in regards to what should happen. I can see these behaviors falling into three categories:
The challenge will be in cases where the observable state of the element might be affected by implementation choices. For example, when playing back on a remote device that does not support changing the playback volume, how should the element behave when its volume attribute is set?