Closed: mcomella closed this issue 6 years ago
An additional link: https://developer.amazon.com/docs/fire-tv/mediasession-api-integration.html
Looking briefly into this, it looks like we need to instantiate a MediaSessionCompat instance and add some callbacks. However, WebView doesn't appear to have APIs to query/modify audio/video playback, so we might need to do this through JavaScript.
fwiw, there is a MediaSession JS API, which makes this a little harder to google. :)
The code we write could be usable by others trying to integrate MediaSession with WebView: we should consider making it a library.
I'd estimate this is a size M though it could turn into a size L if the JS turns out to be difficult/fragile.
I think this is as straightforward as creating a MediaSession and registering callbacks. However, one thing is unclear: when do we need to call release? The docs seem to assume that media Activities will play back for their whole lifecycle (thus releasing in onDestroy), but videos can be created on each page: should we create a new session for each page? Here's the code I wrote in onCreate to test without a device:
```kotlin
mediaSession = MediaSessionCompat(this, "lol")

val pb = PlaybackStateCompat.Builder()
        .setActions(PlaybackStateCompat.ACTION_PLAY or
                PlaybackStateCompat.ACTION_PAUSE)
        .build()
mediaSession.setPlaybackState(pb)

mediaSession.setCallback(object : MediaSessionCompat.Callback() {
    val browserFragment get() =
        supportFragmentManager.findFragmentByTag(BrowserFragment.FRAGMENT_TAG) as BrowserFragment?

    override fun onPlay() {
        Log.d("lol", "play called")
        (browserFragment?.webView as FirefoxAmazonWebView?)
                ?.evalJS("var vid = document.getElementsByTagName('video')[0]; vid.play();")
    }

    // Note: for ACTION_PAUSE the matching callback is onPause, not onStop.
    override fun onPause() {
        Log.d("lol", "pause called")
        (browserFragment?.webView as FirefoxAmazonWebView?)
                ?.evalJS("var vid = document.getElementsByTagName('video')[0]; vid.pause();")
    }
})

val contr = MediaControllerCompat(this, mediaSession)
MediaControllerCompat.setMediaController(this, contr)

// Test code to call callbacks
contr.transportControls.play()
```
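On the release question above: one option, assuming the per-Activity playback model from the docs applies to us, would be to release the session with the Activity. A hypothetical sketch:

```kotlin
// Hypothetical sketch: release the session with the Activity, per the
// docs' assumption that playback spans the whole Activity lifecycle.
// This does not answer the per-page question; it's just the simplest option.
override fun onDestroy() {
    super.onDestroy()
    mediaSession.release() // after this, session callbacks no longer fire
}
```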
Related notes:
"Other sites" is being tested with this site: https://www.w3.org/2010/05/video/mediaevents.html
"Alexa, play" doesn't seem to work consistently on youtube.com/tv (which handles media buttons)
I filed #936 for this issue. Since "Alexa pause" seems to send media button events, it seems we'd only need to implement the MediaSession APIs (this bug) to get media playback on pages that don't support media button events themselves. #935 tracks implementing this for the hardware remote media buttons (which would use the same code as MediaSession to find videos on the page and stop them).
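Since both this bug and #935 would need the same find-the-video JavaScript, it might be worth extracting the snippet into a shared helper. A sketch (the helper name is hypothetical):

```kotlin
// Hypothetical helper: builds the JavaScript we'd evaluate in the WebView
// to control the first <video> element on the page. "play" and "pause"
// map directly onto the HTMLMediaElement methods of the same names.
fun buildFirstVideoJS(method: String): String =
    "var vid = document.getElementsByTagName('video')[0]; vid.$method();"
```

Both the MediaSession callbacks and a hardware media button handler could then call `evalJS(buildFirstVideoJS("pause"))` instead of duplicating the string.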
To summarize:
@Sdaswani, what are we trying to accomplish with this bug? For voice, do we want to support: 1) Just play/pause on youtube 2) Play/pause/seek/prev-next/restart on youtube 3) ^ on all sites
@Sdaswani Also, do we care about audio or just video for now?
Alexa, without MediaSession, is sending media key events: https://github.com/mozilla-mobile/firefox-tv/issues/936#issuecomment-398216178
I managed to get MediaSession callbacks to start working: we had to call requestAudioFocus. This means we'll have to figure out all the details of what calling this method, and thus being a "media app", entails.
> I managed to get MediaSession callbacks to start working: we had to call requestAudioFocus
And we have to call:

```kotlin
val mediaController = MediaControllerCompat(this, mediaSessionCompat)
MediaControllerCompat.setMediaController(this, mediaController)
```
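For reference, the requestAudioFocus call in question is the (pre-API 26) AudioManager one. A minimal sketch of what we'd be calling, with a hypothetical no-op listener:

```kotlin
// Sketch, assuming the pre-API-26 AudioManager entry point. The listener
// here is a hypothetical placeholder; a real "media app" would pause or
// duck playback when it loses focus.
val audioManager = getSystemService(Context.AUDIO_SERVICE) as AudioManager
val result = audioManager.requestAudioFocus(
        { focusChange -> /* e.g. pause playback on AUDIOFOCUS_LOSS */ },
        AudioManager.STREAM_MUSIC,
        AudioManager.AUDIOFOCUS_GAIN)
// result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED means we hold focus
```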
Next steps:
Notes on being a good media app:
Some current behavior: Video:
Audio:
@Sdaswani This is going to take longer than expected: I can't get the MediaSession API to work consistently. It's hard to make accurate estimates because I don't fully understand what's required to implement the API.
My current code is here. When I restart the device (i.e. get a clean slate), my code does not work (the onPlay/Pause methods are not called). However, if I open the Media Session Sample app first and then open my app, it works correctly – I'm guessing I'm not correctly managing the audio focus state and it's making my app interact with the system strangely.
I plan to dig into the Google sample code next to see if I can figure out what I'm missing.
> I can't get the MediaSession API to work consistently.
The Google sample app, Universal Media Player, fails in the same ways my code fails. Also, Amazon's Media Session Sample App will not work on the first "Alexa pause" after a device restart (saying it again works).
This makes me wonder if I should just go full steam ahead with my code, which sort of works, and figure out this issue later.
Another issue I found: when I rebooted the device, opened the sample app (paused twice to get my app to work), opened my app to test, and said "Alexa pause", my app didn't appear to receive the media events but the Google sample app did (it started to play music). That doesn't really make sense to me (maybe they can register as a system provider that takes media events before the app is even opened?). It's also strange because I was granted audio focus (though "Alexa pause" causes me to lose it).
Another weird experience: if I start the app, then go to a website with HTML video, and say, "Alexa play", nothing happens. However, if I start on the homepage, say, "Alexa play", the media command will be received. If I then go to the page with HTML video and say "Alexa play", the media command will be received.
I wonder if the WebView is managing audio focus somehow (in the former case, if I play the video, I will lose audio focus).
To summarize, to get my code to work, I need to (after a restart):
Media commands should now work.
edit: I fixed this by setting mediaSession.isActive = true before voice commands are issued.
Actually, with mediaSession.isActive = true, I no longer need to launch the Media Session Sample App before voice commands start working (though I think the first one might be delayed).
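For completeness, the fix described above amounts to one line during session setup:

```kotlin
// Per the comments above: the session must be marked active for the
// system to route voice/media commands to its callbacks.
mediaSession.isActive = true
```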
Also, it doesn't appear that I need to call requestAudioFocus, which is good because the WebView seems to steal audio focus from the app anyway (which means it's handling it for itself and we don't need to do any extra work there). :) This should simplify the implementation.
Okay, I seem to have figured out why I've been getting inconsistent behavior. I needed to set mediaSession.isActive = true and this part:

```kotlin
val pb = PlaybackStateCompat.Builder()
        .setActions(SUPPORTED_ACTIONS)
        .setState(PlaybackStateCompat.STATE_PLAYING, PlaybackStateCompat.PLAYBACK_POSITION_UNKNOWN, 1.0f)
        .build()
```

On a fresh run, the voice commands work correctly. However, if I set the state to PAUSED instead, Alexa states "What would you like to hear?" instead of executing the play command, which was unexpected to me. This brings up the question: should Alexa be able to start, through voice, a video that was not playing?
Status update: I've got a close-to-done WIP for play/pause state. The only problem is that the web site receives the media button events in addition to our code running, which undoes the playback state change. This WIP hacks around the MediaSession API, for implementation speed, but it appears to function correctly.
Status update: play/pause, with the hack around the MediaSession API (state is always PLAYING), is up for review.
Next steps:
@aminalhazwani "Alexa next/previous" will dispatch a "Media next/previous" keyboard button event. Some websites may handle this, e.g. youtube will advance to the next video, while others may not. When websites handle it, it's a good experience. However, when they don't, the user is left waiting for an action that will never happen. We can't tell when a website will handle the event but should we display a toast like, "Received next command" (with better copy), to notify the user their action has been received?
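If we go the toast route, a minimal sketch (the copy and the helper name are placeholders):

```kotlin
// Hypothetical sketch: acknowledge a next/previous command when we can't
// tell whether the page will handle it. The copy here is placeholder text.
fun acknowledgeMediaCommand(context: Context, command: String) {
    Toast.makeText(context, "Received $command command", Toast.LENGTH_SHORT).show()
}
```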
Status update: I can get the device to acknowledge I want to include FF/rewind/next/previous commands but they're not getting delivered to my MediaSession: I wonder if it's because I'm not updating the playback position regularly.
edit: I restarted the device and can now fast-forward.
Status update: I've opened a PR for a basic implementation that does not conform to the MediaSession APIs. I filed issues for follow-ups to better conform, which we can prioritize.
We tested the 2.2 APK on FireTV Cube. Steps:
Crash log attached: crash_log_cube.txt
Thanks @HiralSModi I filed some new tickets: https://github.com/mozilla-mobile/firefox-tv/issues/965 https://github.com/mozilla-mobile/firefox-tv/issues/966
> We can't tell when a website will handle the event but should we display a toast like, "Received next command" (with better copy), to notify the user their action has been received?
@mcomella sorry for the late reply, I missed the notification. Rather than visual feedback we could opt for instant voice feedback, something like "Ok" or "Got it". If possible we can then set a small timeout and then let users know that the website is not responding to the voice commands.
- "Alexa, pause" - video paused
- Say "Alexa pause" again. Expected behavior: nothing happens
@HiralSModi rather than "Nothing happens" what if Alexa says something like "Video/Media is paused", or "Video is already paused"? So on first interaction we have a visual/audio feedback coming from the video/audio itself. On second interaction we have Alexa jumping into the conversation for better clarity.
User benefit
On the new Fire TV cube device, you can interact with video/media using voice and the MediaSession API. This would be convenient for users.
Requirements