w3c / mediacapture-output

API to manage the rendering of audio on any audio output device
https://w3c.github.io/mediacapture-output/

Controlling 3rd party iframe audio output on a page? #63

Open randallb opened 7 years ago

randallb commented 7 years ago

(forgive me, this is my first time attempting to contribute)

We've built an application to allow for video streaming, and we essentially render the video using a browser, then output the framebuffer to another encoding process and send it along to an RTMP destination. (In our case, it's usually FB Live.)

We'd like to enable our users to easily embed other media in their productions.

Today, we capture the audio by changing the machine's default audio device to a custom one we wrote, then rerouting the audio to our encoder process. This generally works great, but has the major drawback of disabling audio for the rest of the computer.

As the web audio spec has started shipping, in Chrome specifically, we've started experimenting with the web audio output API to redirect the audio properly. Basically, we use the enumerateDevices API to find our driver, and if a confluence of conditions holds, we explicitly direct our audio there using setSinkId on audio and video elements.
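The routing described in this comment can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the helper names `pickOutputDevice` and `routeToDevice` are hypothetical, and the two small interfaces are stand-ins for the browser's `MediaDeviceInfo` and `HTMLMediaElement` so the selection logic can run outside a browser.

```typescript
// Minimal structural stand-ins for the browser types (assumption: only
// the members used below are modelled).
interface OutputDeviceInfo {
  deviceId: string;
  kind: string;  // "audioinput" | "audiooutput" | "videoinput" in browsers
  label: string; // may be empty until the page has device permission
}

interface SinkCapable {
  setSinkId(sinkId: string): Promise<void>;
}

// Find the first audio output whose label contains the given fragment.
function pickOutputDevice(
  devices: OutputDeviceInfo[],
  labelFragment: string,
): OutputDeviceInfo | undefined {
  return devices.find(
    (d) => d.kind === "audiooutput" && d.label.includes(labelFragment),
  );
}

// Route one media element to the matching device.
// Resolves to false when no device matched, so the caller can fall back.
async function routeToDevice(
  element: SinkCapable,
  devices: OutputDeviceInfo[],
  labelFragment: string,
): Promise<boolean> {
  const target = pickOutputDevice(devices, labelFragment);
  if (target === undefined) return false;
  await element.setSinkId(target.deviceId);
  return true;
}
```

In a page, `devices` would come from `await navigator.mediaDevices.enumerateDevices()` and `element` would be an audio or video element in a browser that implements `setSinkId`. Nothing comparable exists for the audio produced inside a cross-origin iframe, which is the gap this issue is about.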

The issue is that if we'd like to embed other external media, like an iframe from YouTube as a simple example, we'd need YouTube to explicitly support switching audio destinations in their postMessage API. We view this as unlikely, given that our use case is an edge case for their business. We think the top-most context for a page should be in charge of where audio ends up if inner iframes haven't changed their sound settings from 'default'. Ideally, the top-most context could be in charge of all audio routing.

I'd propose a setSinkId api on an iframe, just like we have on audio / video elements. If this has been done before, I apologize, I wasn't able to find any data on this pretty much anywhere on the web.

I think there are likely some technical challenges here, but for advanced audio / video (what I'm obsessed with) I think it'll help a lot with what the web is great at: linking and embedding resources.

alvestrand commented 7 years ago

Sorry, lost track of this one. Will attempt to look at the issue soon.

alvestrand commented 7 years ago

Having thought about this some more... the only not-completely-bogus thing I've been able to think of is that we could add an attribute to the Iframe object (https://html.spec.whatwg.org/#the-iframe-element) called "defaultAudioOutput", which would be the place where default audio output is sent.

The advantage of an attribute over a setter is that you can read it, and that it should be easy to say "on creation, it is copied from its parent context".

We could also place it on the objects referenced by the iframe's "contentWindow" or "contentDocument" elements - these are even more generic classes, so would have even more things to sort out (and perhaps other use cases).

I don't want to monkey-patch HTML (more than we already do), but that seems to be required both for a setSinkId() function and a "defaultAudioOutput" attribute on these objects. @foolip @domenic do you have thoughts about how to best extend HTML objects for this type of control (or why we shouldn't do it)?

domenic commented 7 years ago

By attribute, do you mean IDL attribute, or content attribute?

Note that contentWindow and contentDocument are just the window/document of the iframe, so if you placed things there it would be placing them on Window/Document classes universally, not just ones for iframes.

As for monkey-patching HTML, it's mostly reasonable to do this using partial interfaces without it being problematic. The only issue I can see is if you want to copy the value when creating the iframe (instead of, e.g., looking it up lazily up the chain). That is not very extensible in HTML right now. We could add a hook though, e.g. "run any iframe creation steps in applicable specifications", and then you could define "iframe creation steps". Although, I am not sure if you want iframe creation, or src="" attribute setting, or something else.
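The difference between copying the value when the iframe is created and looking it up lazily up the chain can be illustrated with a small model. This is only a sketch under stated assumptions: `FrameContext`, `explicitSink`, `resolveCopied` and `resolveLazy` are hypothetical names, the empty string stands for the user agent's default output, and none of this is an existing browser API.

```typescript
// Hypothetical model of nested browsing contexts; not a real browser API.
type SinkId = string; // "" stands for the user agent's default output

class FrameContext {
  // Value set explicitly on this frame, or null if it never set one.
  explicitSink: SinkId | null = null;
  readonly parent: FrameContext | null;
  // Snapshot taken from the parent at the moment this frame is created.
  readonly copiedSink: SinkId;

  constructor(parent: FrameContext | null = null) {
    this.parent = parent;
    this.copiedSink = parent !== null ? parent.resolveCopied() : "";
  }

  // Strategy A: copy at creation; later changes on the parent are not seen.
  resolveCopied(): SinkId {
    return this.explicitSink !== null ? this.explicitSink : this.copiedSink;
  }

  // Strategy B: walk up the ancestor chain every time the value is needed.
  resolveLazy(): SinkId {
    let ctx: FrameContext | null = this;
    while (ctx !== null) {
      if (ctx.explicitSink !== null) return ctx.explicitSink;
      ctx = ctx.parent;
    }
    return "";
  }
}
```

If the top-level context sets a device, creates a child frame, and then switches to a second device, the child keeps the first device under the copy strategy and follows the second under the lazy strategy. The copy strategy is the one that would need the kind of iframe creation hook mentioned above.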

alvestrand commented 7 years ago

I've not yet figured out what the difference between IDL attributes and content attributes is; I think I want to do something like "navigator.mediaDevices.enumerateDevices().then(list => iframeEl.defaultAudioOutput = list[0].deviceId)".

Not sure it makes sense to change the default after the iframe is initiated, which argues for making it part of what you hand to the iframe when creating it.

Still in brainstorming mode on this; @randallb may want to comment given the constraints of his use case.

domenic commented 7 years ago

Content attributes are <iframe something="..."> in the HTML. IDL attributes are iframeEl.something = "..." in JavaScript. You can have both by saying that the IDL attribute reflects the content attribute. (Almost all content attributes are reflected as IDL attributes, in fact. But not necessarily vice-versa.)
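To make "the IDL attribute reflects the content attribute" concrete, here is a toy model of reflection. It is a sketch only: `FakeElement` and `FakeIframe` are hypothetical stand-ins rather than the real DOM classes, and `something` is the placeholder name used in the comment above.

```typescript
// Tiny stand-in for an element's content-attribute storage (not the real DOM).
class FakeElement {
  private attrs = new Map<string, string>();

  getAttribute(name: string): string | null {
    const value = this.attrs.get(name);
    return value === undefined ? null : value;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }
}

class FakeIframe extends FakeElement {
  // IDL attribute: reading it reads the content attribute
  // (a missing attribute reads as the empty string).
  get something(): string {
    const value = this.getAttribute("something");
    return value === null ? "" : value;
  }

  // IDL attribute: writing it writes the content attribute.
  set something(value: string) {
    this.setAttribute("something", value);
  }
}
```

With this shape, markup such as `<iframe something="x">` and script such as `iframeEl.something = "x"` end up in the same storage, so either side can observe a change made by the other.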

If the idea is to not change after the iframe is created, then I'm not sure which is better. Again it kind of depends on what you mean by "created". The sandbox="" attribute on iframes has one model, where it re-reads the value on navigation of the iframe (e.g. setting .src = "...", or clicking a link inside the page). I think that is also the model used by feature policy and the allowX attributes.

alvestrand commented 7 years ago

The thing that made me worry about changing the attribute after creation is that when an iframe is instantiated, and starts producing sound, changing the attribute (if allowed) would switch the sound while it was playing, with the change being invisible to code running inside the iframe. This may be tricky to implement, so I'd like to avoid doing it. (if @guidou claims it's easy, and no other browser person claims the opposite, we can just allow it.) Re-reading the attribute on navigation sounds like a reasonable model if we agree that we shouldn't allow the container page to change the default output device while the page is playing sound.
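The "re-read on navigation" model can be sketched as a frame that only consults its attribute when it navigates, so a change made while a document is playing sound has no effect until the next load. This is an illustrative model, not a spec proposal: `ModelFrame`, `navigate` and `currentSink` are hypothetical names, and `defaultAudioOutput` is the attribute name floated earlier in this thread.

```typescript
// Hypothetical model: the attribute is only consulted when the frame navigates.
class ModelFrame {
  // Container-visible attribute; the embedding page may change it at any time.
  defaultAudioOutput = "";
  // Value in effect for the currently loaded document.
  private activeSink = "";
  private url = "about:blank";

  // Navigation takes a snapshot of the attribute for the new document.
  navigate(url: string): void {
    this.url = url;
    this.activeSink = this.defaultAudioOutput;
  }

  // Output device actually used by the current document.
  currentSink(): string {
    return this.activeSink;
  }

  currentUrl(): string {
    return this.url;
  }
}
```

Under this model the container can still change the attribute mid-playback, but the change is deferred, which avoids switching the sound underneath code running inside the iframe.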

alvestrand commented 7 years ago

@guidou I think this is clarified enough that you can propose a spec update.

randallb commented 7 years ago

I think changing it on create is fine, fwiw. In a worst case where we'd need to change the audio output of an iframe, we'd just destroy and recreate, updating the internal state of the YouTube player or what have you.

stefhak commented 7 years ago

@guidou you're assigned to this Issue, will you have time to look into it (soon)?

guidou commented 7 years ago

@stefhak I don't think I will have time to look into this issue soon.

stefhak commented 7 years ago

Getting desperate, is this something you could take a look at @yellowdoge?

yellowdoge commented 7 years ago

@stefhak sadly (or happily, I'd say) I'm swamped with shipping Image Capture and Shape Detection;

@hoch, would you have some time to look at this by any chance?

randallb commented 6 years ago

Is there anything I can do to help progress this spec?

dontcallmedom commented 6 years ago

@randallb fwiw, given that we can't seem to find enough resources to spec that additional behavior just now, we will likely proceed with the next steps in the standardization process with the spec as is. That said, this is not to say that this feature won't be considered for inclusion later - just a recognition that the spec as it exists today matches what implementations are shipping or considering shipping.

TheBrenny commented 6 years ago

Bump - Would love to see this implemented! 😁

nefelin commented 4 years ago

Bumping; I would also find this very useful!

foolip commented 4 years ago

@clelland has this been proposed as a feature policy?

clelland commented 4 years ago

There was a 'speaker' policy proposed, to complement 'microphone' and 'camera' from the media input side. The actual behaviour was never really decided on, and the presence of the policy was confusing (people thought that it should control all audio output, which it definitely didn't do) so it was removed recently from Chrome. It's still present in Feature Policy's feature list.

foolip commented 4 years ago

I see. I think something that applies to all audio output is actually the policy that would be useful. But that hasn't been pursued, then?

bigicoin commented 4 years ago

Is this still being worked on? Would be cool to use for a hobby project I have.

brianfields commented 3 years ago

This would be particularly useful for collaborative A/V applications where you'd like the user to be able to select their preferred audio output device. My use case is a classroom instruction application (using WebRTC for streaming video) where the teacher would like to play a YouTube video that is seen and heard by everyone in class. We have application level control over the WebRTC audio output but not the iframed YouTube, so--to avoid confusing people--we don't allow the user to select their preferred output device (instead, it uses the system default). This has caused plenty of frustration.

MatanYemini commented 3 years ago

I'm running into the same problem. Any ideas? This could be a cool project :)

toschlog commented 3 years ago

It would be great to be able to do setSinkId on an iframe. I can control where on my page iframe visuals appear; I should be able to control where the audio goes.

Johnny-John-John commented 3 years ago

Any updates on this? I would love to see this, since I am working on something that needs audio from an iframe element and does FFT magic with it. Or does this feature cause privacy issues? What if some company, say YouTube, wants their iframes' audio not to be analyzed?

freddy-daniel commented 2 years ago

It would be nice if there was a final solution to this problem, or someone will ask the same in 2023 😂

bigicoin commented 2 years ago

I think everyone who had hoped to do this in a project probably gave up on their projects already because of this. 😛

randallb commented 2 years ago

I'm closing as not planned for now.

dontcallmedom commented 2 years ago

re-opening to make sure this gets formally addressed - sorry this is taking so long though

MatanYemini commented 2 years ago

@dontcallmedom Hi - I don't mind contributing. Do you have suggestions/ideas (if you want me to jump on it) ?

ErikDombi commented 1 year ago

2023 now, would still love to see this one!

MatanYemini commented 1 year ago

@dontcallmedom Dominique - what do you think? maybe we can do this one?

dontcallmedom commented 1 year ago

@hoch @padenot @mdjp with the Web Audio Working Group having adopted setSinkId for the Web Audio API, I wonder if that group would have more momentum in pushing this issue forward? For better or for worse, the WebRTC WG has never managed to give it enough priority, despite continuous demand for the feature.

hoch commented 1 year ago

@dontcallmedom

Where would be the right spec/venue for the discussion though? One thing for sure is that Audio WG can't expand Web Audio API to make this happen.

I see the demand here in this thread, but I'm still a bit unsure about its priority compared to the projects the Audio WG is currently working on. As you are already aware, the Audio WG is a relatively small group compared to others.

kenzkiran commented 9 months ago

This is a good feature to have. We have an app that allows embedding of YouTube, Vimeo and other 3rd party providers. Our users have the flexibility in our app to route the audio from the top-level frame to any speaker or Bluetooth device connected to the device. But we are unable to provide a consistent experience when the app has to embed video from 3rd party providers. @guidou @juberti how do we get traction on this? There are many ideas floating around, including specifying an attribute on the child