Capturing audio-only - GitHub Issues

jespertheend commented 5 years ago

Hi, I was wondering why in https://github.com/w3c/mediacapture-screen-share/commit/c5c484213b15ead2780b1bcd822feff7e7872f1f

"MUST reject an audio-only request if no audio track is available" got changed to "MUST reject audio-only requests".

Right now it seems like all audio-only requests need to be rejected, even if the system does support audio tracks. Is this intended?

henbos commented 5 years ago

Yes, this is intentional. Audio capture is perceived as an optional addition to screen capture, something that can exist side by side with video but is not required to exist. A getDisplayMedia({audio:true, video:true}) call is allowed to return only a video track if the user wants to share video but not audio, or if the implementation/platform does not support audio capture. A request cannot limit the user's selection (e.g. which window to share, since the application is not allowed to know which windows exist), and a request cannot force a user to pick audio. This is different from getUserMedia(), where audio is non-optional if requested.

In the screen capture case, audio is a lot more technical and its availability is to some extent platform-dependent. The spec is purposefully vague about what audio has to be provided, even to the point of making it optional.
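
Since audio is optional in the result, an application has to check what it actually received. A minimal sketch of that check follows; the helper name captureDisplayWithOptionalAudio is hypothetical, and the mediaDevices parameter is assumed to be navigator.mediaDevices in a browser (it is passed in only so the logic can be exercised outside one).

```javascript
// Hypothetical helper, not part of any spec or library.
// `mediaDevices` is assumed to be `navigator.mediaDevices` in a browser.
async function captureDisplayWithOptionalAudio(mediaDevices) {
  // Ask for both; the user or platform may still deliver video only.
  const stream = await mediaDevices.getDisplayMedia({ audio: true, video: true });

  // An empty list means no audio was shared (user choice or platform limit).
  const hasAudio = stream.getAudioTracks().length > 0;
  return { stream, hasAudio };
}
```

In a page this would typically be called from a click handler, and the application would branch on hasAudio rather than assume an audio track exists.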

henbos commented 5 years ago

For example, I believe Mac does not allow capturing system audio, so the only audio that might be implementable on that platform is tab/browser audio.

jespertheend commented 5 years ago

Alright, thanks for the clarification. That makes sense.

I brought this up because in my use case I only need the system audio. Right now I’m using a Chrome extension to capture both the screen and system audio, but capturing the screen seems to lower my framerate significantly. That’s why I was hoping there would be a way to request audio only.

Though I guess this is very much an edge case and this would probably require drastic changes to the spec or maybe even a new API altogether.

jespertheend commented 5 years ago

Would calling MediaStreamTrack.stop() solve this or would that just stop the screen capture altogether? I couldn't find anything about this in the current spec.

henbos commented 5 years ago

I believe you can stop the video track without stopping the audio track (like getUserMedia()), which should fix your performance issues. Undeniably this is a workaround for using an API not the way it was intended. I.e. the user gets prompted to share their SCREEN, and depending on their choice/platform you MAY get audio. It's not ideal.
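
The workaround described above can be sketched roughly as follows. This is only an illustration under the assumption stated in the comment (that stopping the video track leaves the audio track live); stopVideoKeepAudio is a hypothetical helper name, and stream is assumed to be the MediaStream returned by getDisplayMedia().

```javascript
// Hypothetical helper illustrating the workaround; not an official API.
// `stream` is assumed to be a MediaStream obtained from getDisplayMedia().
function stopVideoKeepAudio(stream) {
  // getVideoTracks() returns a snapshot, so removing tracks while looping is safe.
  for (const track of stream.getVideoTracks()) {
    track.stop();              // ends screen capture for this track
    stream.removeTrack(track); // drop the ended track from the stream
  }
  // Whatever audio the user chose to share (possibly none) remains.
  return stream.getAudioTracks();
}
```

If the returned list is empty, the user or platform did not provide audio and there is nothing left to capture.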

youennf commented 5 years ago

Agreed this is not ideal. Should the privacy indicator give information about the audio being captured? If there is a possibility to capture audio but not screen, should it be reflected there as well?

jespertheend commented 5 years ago

Yeah, I agree. It’s not ideal but it’s the best solution for now. The alternative is asking users to install software to route system audio to a virtual microphone, such as soundflowerbed. So using this API is much more user friendly. I’ll try to stop the video track and see if that works.

bradisbell commented 2 years ago

@henbos @jan-ivar Is it still the consensus that audio-only streams are out of scope for this API?

Audio-only use cases are artificially damaged at the moment. If a user wants to capture audio from the system or tab, we first have to prompt them to share their screen, hope they can figure out checking the "share audio" box, then stop the video stream, and hope they don't care about the UI saying we're sharing their screen when we aren't. From there, the audio only stream works fine on supported platforms.

The technical concerns raised a few years ago don't seem as relevant now. It seems like the only thing holding back audio use cases is this specification, and the established position that audio is out of scope.

What are the specific concerns about getDisplayMedia({video: false, audio: true}) today? Is it a good time to revisit?

henbos commented 2 years ago

I'm not up to speed on this anymore, but IIUC system audio capture does not exist on all popular OSes. I suspect you would be less interested in audio tab capture?

jespertheend commented 2 years ago

The application I built provides several ways for users to share audio: via microphone, by pasting a URL, or by sharing their system audio. So having audio tab capture available would still be beneficial, even if it is only available on some platforms.

eladalon1983 commented 2 years ago

I think it's counterproductive to force a user to share their (screen AND system audio) when they only want to share (system audio). Similarly for sharing (tab pixels AND tab audio) when they only wish to share (tab audio). Let's reopen this. If there's a simple explanation for such a constraint, and that explanation still holds, then we can always re-close.

henbos commented 2 years ago

Privacy and security: I'm not saying audio-only capture is necessarily better or worse from a privacy and security perspective, but it may be different enough to be worth revisiting whether the trade-off between privacy and security on one side and use cases on the other is still worth it.

- Unlike screen sharing, audio-only capture might not make it as clear to the user that capturing is happening or what is being captured.
- Unlike screen sharing, audio-only capture is much more limited in use cases. I'm personally having trouble coming up with convincing ones.

But even if that is resolved: doing an API that only works on some OSes, or that works differently on different OSes, seems controversial in and of itself.

So I think we should have some really strong use cases for this... what are they?

Btw I agree that doing audio+video capture when you only want audio is counterproductive - it's a workaround to not having solved the audio-only capture use case. But the question is: do we want to solve this use case?

I would start with "what use cases do we want to solve?"

jespertheend commented 2 years ago

My use case is that on https://fract.space/ there are several methods for audio input, so that it can be used to visualise the fractals based on the audio that has been provided. It's still a WIP, so at the moment dragging a file to the page is the most reliable way to do this. It is also possible to share screen/tab audio (this version still uses an extension), but if you do so the performance gets much worse, presumably because recording the screen comes with a lot of overhead. I've seen better results by piping system audio to a virtual microphone and then using that as input. So I'm confident the same will be true if this were possible with getDisplayMedia().
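
For readers unfamiliar with this kind of visualisation, a rough sketch of feeding a captured stream into the Web Audio API is shown below. It is not how fract.space is implemented; attachAnalyser and readSpectrum are hypothetical helper names, audioContext is assumed to be an AudioContext, and stream is assumed to be a MediaStream with at least one audio track.

```javascript
// Hypothetical helpers; `audioContext` is assumed to be an AudioContext and
// `stream` a MediaStream that contains at least one audio track.
function attachAnalyser(audioContext, stream, fftSize = 256) {
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = fftSize; // must be a power of two
  source.connect(analyser);   // analysis only; nothing is played back
  return analyser;
}

function readSpectrum(analyser) {
  // frequencyBinCount is always half of fftSize.
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins); // fills `bins` with values 0..255
  return bins;
}
```

A visualiser would call readSpectrum once per animation frame and map the bins to whatever it draws.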

eladalon1983 commented 2 years ago

The question of use-cases is interesting and I'll get back to that. Quick recap first of the benefits of supporting audio-only, assuming use-cases exist:

Benefit to the user:
- Enhance privacy through not having to share more than strictly intended.
Benefit to the application:
- Improved relationship with the user - no need to alarm the user by asking for irrelevant permissions. (Users should not be expected to understand that the app cannot capture audio without video.)
- Improved efficiency (avoid the CPU/GPU load of unnecessarily maintaining screen- or tab-capturing).
Benefit to the browser:
- Improved efficiency (see above).
Benefit to the Web platform:
- Better-educated users - make it more common to only share what's strictly necessary.

Clearly an all-round win - provided we have the use-cases. Back to that, then. :-)

- Audio-record meetings into a file. (One app captures another; no integration needed.)
- Record lectures, music, etc. (One app captures another; no integration needed.)
- Call centers. Assume you're building software that exposes some functionality to the operator - some CRM-like software. Your software imports some functionality by embedding some VC application in an iframe, allowing you to talk to the customer/patient/colleague while interacting with the CRM-like software concerning the call. By recording tab audio, your own software now quickly gets to record the call to a file without having to integrate more closely with the VC software you're embedding. No need to ask them to make changes to accommodate you as a customer and wait for that to happen - you just implement recording of (i) the tab and (ii) the mic and you're all set.
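
Combining "(i) the tab and (ii) the mic" into one recordable stream could look roughly like the sketch below, which uses the Web Audio API. The helper name mixStreams is hypothetical; audioContext is assumed to be an AudioContext, and every input stream is assumed to carry at least one audio track (createMediaStreamSource throws otherwise).

```javascript
// Hypothetical helper; `audioContext` is assumed to be an AudioContext and
// each entry of `streams` a MediaStream with at least one audio track.
function mixStreams(audioContext, streams) {
  const destination = audioContext.createMediaStreamDestination();
  for (const stream of streams) {
    // Every source feeds the same destination node, which sums the inputs.
    audioContext.createMediaStreamSource(stream).connect(destination);
  }
  // A single MediaStream carrying the mixed audio.
  return destination.stream;
}
```

The returned stream could then be handed to a MediaRecorder, for example with the tab stream from getDisplayMedia() and the mic stream from getUserMedia() as inputs.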

bradisbell commented 2 years ago

My use cases involve live stream production, where users capture audio from multiple sources. This often includes other software such as Skype, Zoom, DJ-style media players, other tabs sharing audio, etc.

Audio-only streaming is very much relevant. Video is not always required nor desired. Native applications have no problem in capturing audio in this way. We only have to look to the success of software such as OBS, Wirecast, Rocket Broadcaster, Voicemeeter, etc., to understand that we need this capability on the web. All of these support audio-only modes and capturing from other applications.

The users should be empowered. This specification and its implementation are the limiting factors right now, as I see it.

I propose:

- getDisplayMedia({audio: true, video: true}) will show the current sharing dialog, with the "share audio" checkbox pre-checked. (Users may un-check it.)
- getDisplayMedia({audio: true, video: false}) will show the current sharing dialog, with the correct verbiage explaining that only audio is being shared.
- getDisplayMedia({audio: {exact: true}, video: ...}) will show the current sharing dialog with the "share audio" checkbox pre-checked and locked. The user may cancel to reject. Only surfaces supporting audio sharing will be selectable. If audio sharing is not available, an overconstrained error should be thrown.
- Shared applications may have a similar overlay as they do now, but with verbiage explaining that only audio is being shared.

In other words... I propose that this API match what is expected and consistent with the getUserMedia() API. This solves for existing use cases, providing a generic interface for future use cases, simplifying the API for application developers, hopefully resulting in minimal user agent changes, while continuing to offer a user experience that is very similar to what exists today.

wooster0 commented 2 years ago

Please allow capturing only system audio.

I'm in a situation where I now have to play with both getUserMedia and getDisplayMedia to hopefully allow the user to just share their system audio. For getDisplayMedia I have to reassure the user that I do not record their screen because I don't make use of the video data in any way. I wish I wouldn't have to tell the user that in the first place; it's a bit frustrating. Forcing video is wasteful of resources and doesn't give the user a nice impression when the site asks for video even though only audio should be required. To explain why it is required I would have to inform the user about technical limitations and it overall just worsens my site's accessibility.

There are a ton of use cases for this. For me I want to take the user's system audio (whatever it may be) and process it in whatever way the user desires. For example what if the user wants to use the music from an MV on YouTube in another tab? This would be perfect for that. My app does not concern itself with video.

@bradisbell's proposal sounds good to me.

schreibmachine commented 1 year ago

I would like to second what @r00ster91 said. I am currently trying to build a web-based audio visualizer and wanted users to be able to capture their microphone and/or system sounds. After a frustrating day trying to find out how to access system audio, I now see that it's simply not (easily) possible. I now also understand why seemingly all web-based audio visualizers use uploaded media... If privacy is a concern, what is the difference between system audio and the microphone? Isn't the microphone an even more sensitive audio stream?

eladalon1983 commented 1 year ago

It is unfortunate that this issue, which has garnered so much attention from Web developers over the years, has not received commensurate attention from browser vendors other than Chrome (here). I think a good next step might be to present this issue in the Screen Capture group's meeting, 2023-06-26, and proceed to create a mediacapture-screen-share-extensions spec hosted by that group. If anyone is interested in presenting it then, I will gladly add it to the agenda. Let me know.

chrisguttandin commented 1 year ago

If no one wants to present it this time, I would be happy to present it in one of the next meetings, if it's okay for a CG member to present something. But I can't attend the upcoming meeting in two weeks.

eladalon1983 commented 1 year ago

The process to join community groups is a lot more lightweight than that of joining working groups, and I believe it does not involve payment to the W3C. At any rate, anyone is welcome to present.

cybex-dev commented 12 months ago

Coming from this SO post with high hopes, though it seems audio-only capture is a long way off from being a production-ready feature?

Pavinati commented 8 months ago

I'm currently working on tools for DJs and live streaming. This would be very beneficial for my users, who currently have to download external software and register virtual microphones.

salvymc commented 8 months ago

Call centers. Assume you're building software that exposes some functionality to the operator - some CRM-like software. Your software imports some functionality by embedding some VC application in an iframe, allowing you to talk to the customer/patient/colleague while interacting with the CRM-like software concerning his call. By recording tab audio, your own software now quickly gets to record the call to a file without having to integrate more closely with the VC software you're embedding. No need to ask them to make changes to accommodate you as a customer, wait for that to happen - you just implement recording of (i) the tab and (ii) the mic and you're all set.

This is exactly the use case I'm in: I only need to capture the audio of a Chrome tab, in order to record the call and quickly export it as .weba.

const stream = await navigator.mediaDevices.getDisplayMedia({preferCurrentTab: true, audio: true, video:false});

Unfortunately, currently it is not possible to deactivate video capture

console error: Failed to execute 'getDisplayMedia' on 'MediaDevices': video must be requested

As a result, I find it impossible to capture the call from the webphone and achieve my use case.

I think it's really absurd not to be able to capture just system audio in 2024. With it, I could instantly get a complete track of the call audio and the audio from the operator's microphone.
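
For the export step itself, a rough sketch of collecting a recording into a single WebM audio blob is shown below. It assumes an audio stream has already been obtained somehow (today that means requesting video as well); recordToBlob is a hypothetical helper name, and recorder is assumed to be a MediaRecorder constructed by the caller over that stream.

```javascript
// Hypothetical helper; `recorder` is assumed to be a MediaRecorder created by
// the caller, e.g. new MediaRecorder(audioStream, { mimeType: "audio/webm" }).
function recordToBlob(recorder, mimeType = "audio/webm") {
  return new Promise((resolve, reject) => {
    const chunks = [];
    recorder.ondataavailable = (event) => {
      // Skip empty chunks, which can occur when recording stops early.
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onerror = (event) => reject(event.error ?? new Error("recording failed"));
    // Resolves once the caller invokes recorder.stop().
    recorder.onstop = () => resolve(new Blob(chunks, { type: mimeType }));
    recorder.start();
  });
}
```

The caller ends the recording with recorder.stop(); the resulting blob can then be offered for download with a .weba file name.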

ryabenko-pro commented 6 months ago

I'm just curious how come people who work on tools for the community ignore the community's rational requests.

eladalon1983 commented 6 months ago

I'm just curious how come people who work on tools for the community ignore the community's rational requests.

I think "ignore" is too strong. I think it's a matter of disagreement on prioritization, and hopefully the critical mass that is building on this issue could motivate re-examining that. Wdyt, @jan-ivar and @youennf? What is the rationale for maintaining the requirement that video must be captured?

salvymc commented 6 months ago

Many applications could benefit from acquiring the audio of a tab, especially in the world of telephony and VoIP.

w3c / mediacapture-screen-share-extensions

Capturing audio-only #12