Closed jianjunz closed 1 week ago
I wonder whether this should actually be at MediaStreamTrack level. Maybe we do not need a constraint either.
Given that this event would fire while the track is muted, the goal would be to unmute the track, which would be done via the MediaSession API. Moving this API to MediaSession makes some sense.
Maybe all we need is a new MediaSession voiceActivity action.
Registering this handler would kick in the necessary UA logic to trigger this action.
@jan-ivar, @guidou, thoughts?
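For illustration, the handler registration suggested above might look roughly like the sketch below. It assumes the action name voiceActivity from this comment is adopted unchanged, which is not settled. StubMediaSession and its fire helper are hypothetical stand-ins so the snippet runs outside a browser; in a page the call would go to navigator.mediaSession.setActionHandler instead.

```typescript
// Hypothetical stand-in for the small part of MediaSession used in this sketch.
type ActionHandler = (details: { action: string }) => void;

class StubMediaSession {
  private handlers = new Map<string, ActionHandler>();

  // Mirrors the shape of MediaSession.setActionHandler(action, handler).
  setActionHandler(action: string, handler: ActionHandler | null): void {
    if (handler === null) {
      this.handlers.delete(action);
    } else {
      this.handlers.set(action, handler);
    }
  }

  // Illustration-only helper: simulates the UA triggering an action.
  // Returns false when no handler is registered for that action.
  fire(action: string): boolean {
    const handler = this.handlers.get(action);
    if (handler === undefined) {
      return false;
    }
    handler({ action });
    return true;
  }
}

const session = new StubMediaSession();
const hints: string[] = [];

// "voiceActivity" is the action name proposed in this thread, not a shipped API.
session.setActionHandler("voiceActivity", () => {
  // An application would typically surface its own unmute UI here.
  hints.push("You appear to be speaking while muted");
});
```

Registering the handler is the only opt-in signal in this shape, which matches the idea that registration alone would kick in the UA-side detection logic.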
I'm wondering when this action should be triggered if voiceActivity is moved from MediaStreamTrack to MediaSession
1 and 2 may raise privacy issues, because users may not want applications to learn about their behavior before they grant the "microphone" permission.
With the current AudioWorklet approach, applications are able to know which track has voice activity. I personally believe applications only want to detect voice activity for microphone with MediaStreamTrack created and muted, but I'm not sure whether any application applies VAD to other audio tracks.
The privacy story should be the same whatever the API shape. I agree with having a voiceActivity MediaSession action only for contexts that have live (and muted) microphone MediaStreamTracks.
If we want to support multimicrophone cases, a deviceId could be exposed within MediaSessionActionDetails.
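As a rough sketch of the multi-microphone idea, an application could map the action back to a track as shown below. This assumes a deviceId field on the action details, which is only a suggestion in this thread and not part of MediaSessionActionDetails today. TrackLike, VoiceActivityDetails and findTrackForVoiceActivity are hypothetical names; TrackLike is a structural subset of MediaStreamTrack so the snippet runs outside a browser.

```typescript
// Structural subset of MediaStreamTrack used by this sketch.
interface TrackLike {
  kind: string;
  getSettings(): { deviceId?: string };
}

// Hypothetical details shape: deviceId is an assumption taken from this thread.
interface VoiceActivityDetails {
  action: string;
  deviceId?: string;
}

// Picks the capture track that the action refers to, if it can be determined.
function findTrackForVoiceActivity(
  details: VoiceActivityDetails,
  tracks: TrackLike[],
): TrackLike | undefined {
  const audioTracks = tracks.filter((track) => track.kind === "audio");
  if (details.deviceId === undefined) {
    // Without a deviceId only the single-microphone case is unambiguous.
    return audioTracks.length === 1 ? audioTracks[0] : undefined;
  }
  return audioTracks.find(
    (track) => track.getSettings().deviceId === details.deviceId,
  );
}
```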
I personally believe applications only want to detect voice activity for microphone with MediaStreamTrack created and muted
Agreed for the scope of this specific API.
I agree with having a voiceActivity MediaSession action only for contexts that have live (and muted) microphone MediaStreamTracks.
Moving this to media session makes sense to me as well.
@steimelchrome FYI
@jianjunz, would you be OK drafting a PR on MediaSession WG? I can take over if you prefer.
Since this is intended to help the user to unmute via the unmute button in the app, which would be done via MediaSession, it makes sense that this notification comes via MediaSession. Given that this is largely a MediaSession thing, I don't think we should have a requirement that a MediaStreamTrack is muted (although it will most likely be).
I do not think there is any sense in moving this to MediaSession. There are far more use cases for voice activity detection beyond letting the user know that they may be muted. A couple use cases I would implement immediately if this API were available:
These use cases and others like them rely on the voice activity detection firing on the track.
Besides, even if it were moved to MediaSession, choosing the right capture track to trigger on is not possible at the user agent level. It's not uncommon to have several capture tracks. The relevant captured track might even be "remote". (Think of cases where a local second device/screen/camera/mic is set up. Connected via WebRTC, but right there in the room.) Only the application truly knows what is what.
There are far more use cases for voice activity detection beyond letting the user know that they may be muted
This was discussed during the WebRTC WG meeting, and we think there are two use cases which deserve two different solutions.
The first use case is allowing the user to unmute when they are talking while muted. This PR is about this specific issue, and moving it to MediaSession seems good.
The second use case, which you seem more interested in, is exposing whether a live unmuted track contains voice. This needs more thought, as firing an event will always be more or less out of sync with the audio data. It can already be implemented with an audio worklet (though less efficiently), where the extracted data will be in sync with the audio. This use case seems more tied to MediaStreamTrack than MediaSession.
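To make the audio worklet option concrete, here is a minimal sketch of the per-block check such a processor could run. It uses signal energy (RMS) as a crude proxy for voice, which is a swapped-in simplification and not a real voice classifier. The names rms and blockHasActivity and the threshold value are assumptions for illustration. In a worklet, the same function would be called on each input channel block inside process(), so the result stays aligned with the audio data.

```typescript
// Root-mean-square level of one block of PCM samples in the range [-1, 1].
function rms(block: Float32Array): number {
  if (block.length === 0) {
    return 0;
  }
  const sumOfSquares = block.reduce(
    (acc, sample) => acc + sample * sample,
    0,
  );
  return Math.sqrt(sumOfSquares / block.length);
}

// Hypothetical fixed threshold; a practical detector would adapt it to the
// noise floor and smooth the decision over several blocks.
const ACTIVITY_RMS_THRESHOLD = 0.02;

// Returns true when the block's energy reaches the threshold.
// This detects loudness, not speech specifically.
function blockHasActivity(
  block: Float32Array,
  threshold: number = ACTIVITY_RMS_THRESHOLD,
): boolean {
  return rms(block) >= threshold;
}
```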
@jianjunz, would you be OK drafting a PR on MediaSession WG? I can take over if you prefer.
Sure, I'll create a PR on MediaSession WG. Thanks.
This change adds support for the voice activity detection (VAD) feature for audio MediaStreamTracks. It is only enabled when the voiceActivityDetection constraint is set to true. With the voiceactivitydetected event, web applications are able to show notifications when the user is speaking but the audio track is muted. Fixes #145.