richtr opened this issue 9 years ago (status: Open)
This is an interesting proposal, but I'm worried about having different privileges for different types - I fear everyone would just lie or screw it up (and browsers would be forced to treat everything as "default"). Also, these seem to be outside the use cases we've been discussing - and seem to leapfrog native, which we've stated a few times should not be a goal (only parity). My opinion is that we should stick to just controlling simple audio/video for now.

I'd like to see us more fully explore `MediaSession` before we talk about anything else.
> I fear everyone would just lie or screw it up (and browsers would be forced to treat everything as "default").

If each media type has specific behavior then web developers can choose the most appropriate one based on the behavior they want. The trick is to find a suitable set of categories that match the experience users would expect, which seems different for e.g. alarms vs. music vs. notifications vs. webrtc-based audio/video.
> Also, these seem to be outside the use cases we've been discussing - and seem to leapfrog native, which we've stated a few times should not be a goal (only parity).
This seems to be consistent with native capabilities on iOS (see: audio session modes) and Android (see: [streamType](https://developer.android.com/reference/android/media/AudioManager.html#requestAudioFocus%28android.media.AudioManager.OnAudioFocusChangeListener,%20int,%20int%29)). Each of these APIs allows media to be identified as belonging to a particular class of usage.
I also wonder if we are missing a couple of use cases. With ServiceWorkers I could set a wake-up alarm. How would that be displayed in the notification tray? Another use case would be that the user may typically expect all currently playing media to pause once they join a WebRTC call. Is it worth documenting and addressing these use cases too?
> My opinion is that we should stick to just controlling simple audio/video for now.
Happy to do that, though we lose some subtlety in the interaction between web apps and different types of media usage.
> I'd like to see us more fully explore `MediaSession` before we talk about anything else.
Sounds good. Given current platform limitations, having something tied to observable platform media feels like the baseline for any solution right now.
I think we probably ought to be able to distinguish between different kinds of audio, to get parity with native platforms. In particular, ducking for notifications seems impossible to achieve otherwise, without heuristics based on media duration or something. In the `MediaSession` proposal, the obvious place to put this is on `MediaSession` itself.
@foolip, @richt, can you articulate the use case for the distinction a bit more abstractly (without the API proposal) and send a PR to the whatwg repo's README.md describing why it's needed? Giving concrete examples of how iOS and Android use this would be extremely helpful (as well as how web pages would make use of this in practice).
err, @foolip I mean (fixed typo above, sorry)
https://github.com/whatwg/media-keys/pull/4 has some use cases which could be solved by distinguishing between at least "normal", "notification" and "voice" kinds.
I've updated this issue's original description with more details of how it works.
There is also some precedent in the platform for this approach. The `HTMLTrackElement` interface takes a `.kind` attribute, limited to only known values, in the same way this issue proposes for `HTMLMediaElement`.
I haven't seen any proposal on how `MediaSession` will handle this yet. Is there any further input on this?
HTMLMediaElement can be used to play out various types of media such as music, notification sounds, WebRTC streams and alarms. If we could differentiate between these different types of media content then user agents would be able to a.) provide contextual remote control access depending on the 'type' of media playing out and b.) enforce interactions between different media content (e.g. ducking music when a notification sound plays out or pausing all other media when a WebRTC voice call begins).
HTML media elements should be able to describe the 'intent' of their media content with the following API addition:
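A sketch of what that addition might look like, assuming the kinds discussed in this thread ("notification", "music", "voice", "alarm", "dtmf", "ring"); the exact enum values are illustrative, not part of any agreed proposal:

```webidl
enum HTMLMediaKind {
  "",              // default: no declared intent
  "alarm",
  "dtmf",
  "music",
  "notification",
  "ring",
  "voice"
};

partial interface HTMLMediaElement {
  attribute HTMLMediaKind kind;
};
```

Declared in HTML markup, or set from JavaScript, usage might look like this (the file name and element are illustrative):

```html
<audio src="track.mp3" kind="music"></audio>
<script>
  const el = document.querySelector("audio");
  el.kind = "notification"; // switch the declared intent at runtime
</script>
```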
The default `kind` value is an empty string. The `kind` attribute is limited to only known values of `HTMLMediaKind`. A media element's `kind` content attribute can be declared in HTML markup, or its `kind` IDL attribute can be set from JavaScript.

Any interaction between different kinds of media content can then be handled by each user agent on a platform-by-platform basis. For example, a desktop browser may define interactions between the different types of media elements according to the following table:
(This table must always be read from the left column first. For example, the cells 'notification →', 'ducks', 'music ↑' should be read as: "when a new `notification` media element reaches the `playing` state, the user agent should duck all `music` media elements currently in a `playing` state.")

[The table itself did not survive extraction: only its cell values, a mix of "pauses" and "ducks", remain without their row and column headers, so it is not reproduced here.]
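Only one interaction is spelled out in the surrounding text: a newly playing `notification` ducks playing `music`. As a hypothetical sketch (the rule map and function name are assumptions, not part of the proposal), a user agent could encode such a table as a simple lookup:

```javascript
// Hypothetical encoding of the interaction table. Only the single
// interaction documented in the text is filled in: a newly playing
// "notification" ducks any currently playing "music".
const interactions = {
  notification: { music: "ducks" },
};

// Action ("pauses", "ducks", or "none") the user agent takes on media of
// `playingKind` when media of `newKind` reaches the playing state.
function interactionFor(newKind, playingKind) {
  return (interactions[newKind] || {})[playingKind] || "none";
}

console.log(interactionFor("notification", "music")); // "ducks"
```

The full table would simply add more entries to the map; unknown pairs fall through to "none".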
By introducing `.kind`, we get the following capabilities:

- No need for an explicit `.remoteControls` attribute: we would implicitly provide the most suitable remote controls and interaction based only on the kind of media currently playing out. For example, 'notification', 'dtmf' and 'ring' media elements would not obtain remote control or soft key interface access, but they should still play nicely with other media content types. For 'alarm' we would be able to provide e.g. an alarm-type remote control interface. 'music' would be provided with e.g. a music-type remote control interface.

Media categorization could also be applied to `AudioContext` and `MediaController` objects as well as `MediaSession`.

Pinging @foolip, @jernoble, @marcoscaceres for their comments.