Closed yoanlcq closed 7 years ago
Yeah, I think capture is definitely worth supporting if we can swing it. I've been punting on it for now since there's so much else to do, so if you want to dig into this that would be fantastic.
As for the approach, I think either 1 or 2 is best. As much as possible I'd like to avoid people having to make special considerations for emscripten unless they opt into it. In that spirit, exposing a few emscripten specific extensions is something I've been considering; we could have one like ALC_EMSCRIPTEN_capture_allowed
that polls whether or not capture permissions have actually been granted. That's for later though, for now just getting it working at all is more important.
To help avoid possible merge conflicts, since we'll be working in the same file, I'll carve out a section in the internals block for capture stuff, and you can put anything you need to in there. I won't touch anything inside there, or the `alcCapture*` entrypoints, so we should be pretty safe.
Ok, the capture section is added, so feel free to dig in whenever. If you need to make changes outside that block, just let me know so we can coordinate.
Alright, sounds good! Also, I really like the idea of having an ALC extension so that people who need that kind of control can opt in. It would call for some documentation though, which is yet another task (or we might just ask kripken et al. what to do about it when proposing the PR).
Some things will require me to operate outside the capture sections:

- The `$AL` object will need a new `AudioContext` member (possibly named `alcCaptureAudioCtx`). Or even better, a `captureCtx` member which would contain an `AudioContext` AND other internal state;
- `alcGetString(NULL, ALC_CAPTURE_DEFAULT_DEVICE_SPECIFIER)`;
- `alcGetString(NULL, ALC_CAPTURE_DEVICE_SPECIFIER)`;
- `alcGetIntegerv(valid_device, ALC_CAPTURE_SAMPLES, 1, &frameCount)`.
And some concerns:

- The sample rate is a read-only property of the `AudioContext`, which means `alcCaptureOpenDevice()` could fail given otherwise commonly-supported values like 44100 or 48000.
- We could provide an extension that creates the `AudioContext` beforehand and returns its sample rate so that the app may handle it correctly. Or I'm overthinking it and correct behaviour could be achieved with a "brute force" approach.

Also, in case we end up providing extensions, we should probably try to fix `alcGetProcAddress`, looking at how `emscripten_GetProcAddress()` was implemented for OpenGL.
And lastly, it looks possible (though I'm not sure) to implement something similar to (but not the same as) `ALC_SOFT_loopback`, but again, that's gold plating.
Adding members to `$AL` is fine, since that can also be done inside the capture section. When we get closer to making the PR we can hoist them back up with the other members.
As for `alcGetString`/`Integerv`, feel free to add to those, since I probably won't be touching them anytime soon, at least until we get to extensions, which will be a while yet.
I'd give pretty good odds that either 44.1kHz or 48kHz is supported by most web capture implementations, so brute forcing it probably isn't THAT bad, but it's definitely something to consider for an extension.
As for the device name, right now the output device is just called "Device", so capture could just be called "Capture". I doubt it will matter much in the long run, so feel free to go with what you think is best. I'll probably change the output device name to at least indicate it's a web audio device.
I've taken a look at how the GL implementation does extensions. It's a little wonky, but not too bad. I'll probably be doing that pretty soon, just to get the stubs in place.
Finally, I also looked into implementing `ALC_SOFT_loopback`, but concluded it's either impossible or too hard right now, since it requires synchronous rendering, while Web Audio only provides async. If we want, we can swing back and take another look once everything else is done, but it's probably not that big of a deal, since the primary use of loopback is to output audio with a different API, and on emscripten there's no point in doing that.
Okay, thanks!
About loopback though: implementing it as `ALC_SOFT_loopback` might be hard, but what about (yet) another custom extension that's suited to Web Audio's constraints? The way I see it (with my limited knowledge), it's a matter of redirecting all output to a single JavaScript node (copying data to buffers) which would also happen to output to the speakers.
I could see a use case for this: games (or game engines) that want to allow "recording" gameplay footage (perhaps saving it to video files for later replay) even on Emscripten. Yes, there are screen recorders and the like, but providing this in-game would probably make average users happier (it would even work on mobile devices), that kind of thing.
But it does look a bit too involved for what users would make out of it. Perhaps they wouldn't even use OpenAL for such a thing in the first place.
So anyway, I'll try to get the basics working, and hopefully succeed in adding tests.
Small update: the whole sample rate story is a non-issue since, quoting the spec, "The implementation is expected to convert and resample to [the requested] format on behalf of the application". One less error to handle, but some more work.
Also, it looks like "loopback" wasn't the correct term for what I had in mind (which was, "capture of the application's own audio output", and not "system-wide audio output").
Just as a heads up, I had to make some minor changes to the alcGet* methods to support context attribs and lay the groundwork for extensions.
Small update - Wanted to say I'm still "on it", sorry it's taking quite a while. Turns out "OpenAL's way" of doing things doesn't map very well to the "Web Audio way".
The whole thing has got me running around in circles for quite a while (and I haven't given my 100% on it either), but now there's only a couple solutions left that I need to try and (hopefully) push.
To give a quick idea, it all has to do with AudioContexts being a scarce resource, the "make-everything-asynchronous" mindset of WebAudio and the AL implementation having to perform resampling if necessary w.r.t the sample rate requested by the user.
No problem, there's still plenty of work to do writing new tests anyway. And if it ends up not working out, that's fine. There's a few extensions I had to drop that seemed like they'd work well but ended up just not being practical.
Finally got the thing working (link to sources if you're curious), along with some tests in C! Works fine with various frequencies and formats.
What's left to do is integrate the JavaScript part into `library_openal.js`, and move the tests into the existing infrastructure.
Then I'll probably want to add a few extensions (drafts here), and work around Chrome's bad handling of numerous `OfflineAudioContext`s.
Roughly, my implementation creates and reuses a "shared `AudioContext`" for streaming microphone input into a `ScriptProcessorNode`, whose `onaudioprocess` callback creates an `OfflineAudioContext` for the sole purpose of resampling the input buffer into our own ring buffers. These ring buffers are read from, interleaved, and type-converted from float32 at the time `alcCaptureSamples()` is called.
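As a rough illustration, the "interleave and convert from float32" step at `alcCaptureSamples()` time could look like the following sketch. The function and variable names are mine, not the actual implementation; it assumes one `Float32Array` per channel and 16-bit output:

```javascript
// Sketch: interleave per-channel float32 capture data and convert it to
// 16-bit PCM. `channels` is an array of Float32Arrays (one per channel),
// already read out of the ring buffers. Names are illustrative.
function interleaveToInt16(channels, frameCount) {
  var n = channels.length;
  var out = new Int16Array(frameCount * n);
  for (var f = 0; f < frameCount; ++f) {
    for (var c = 0; c < n; ++c) {
      // Clamp to [-1, 1], then scale to the int16 range.
      var s = Math.max(-1, Math.min(1, channels[c][f]));
      out[f * n + c] = s < 0 ? s * 32768 : s * 32767;
    }
  }
  return out;
}
```

The asymmetric scaling (32768 for negative values, 32767 for positive) is just one common float-to-PCM convention; other mappings exist.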
That means, if we pick 1024 as the `ScriptProcessorNode`'s "buffer size", then every 1024 sample frames a fresh `OfflineAudioContext` is created (and expected to be destroyed once it's no longer used).
Firefox seems to handle this correctly, but Chrome displays an "Aw, snap!" error at some point, so my plan is to add logic to reuse `OfflineAudioContext`s as much as possible, (sadly) making everything a bit more complex than it already is.
I wish the implementation could have been more "direct", but as far as my experiments go, this is the simplest (if not the only) way to make things work.
Also, when the time is right, it would be nice to discuss the extensions I'm suggesting (and how to pick actual values for new enums?).
Cool, good to hear things are working!
I haven't looked at the code yet, but my first concern would be that `ScriptProcessorNode` is deprecated in the latest drafts of the spec. I believe `AudioWorklet` is the replacement, but I've only looked at it briefly, so I'm not sure what the full implications are.
As for the `OfflineAudioContext`s, I guess I don't understand why there needs to be more than one per capture device. Couldn't data just be continuously streamed through the same one rather than creating new ones? Instantiating contexts has a pretty heavy cost, so we should try to avoid that if possible.
As for choosing extension values, I've been wondering that as well. I've been meaning to discuss it with the author of OpenAL-Soft, but haven't got around to it yet. I've been busy with other things lately, but hopefully I'll get back to working on this in the next few days.
(Huge wall of text ahead, my apologies :) )
I share these concerns - I'll try to explain the situation as best as I can.
`ScriptProcessorNode` is indeed "deprecated" since 2014 and said to be "soon replaced by Audio Worker", but the thing is, it's 2017 and no browser vendor seems willing to make the first step.
Because Audio Workers are not yet a thing, there's nothing I can do (not even mentioning that Web Workers bring a whole other set of problems anyway).
In any case, these guys seem to be on a constant deprecation rampage: nearly every action taken by my scripts has one "modern" way and one "deprecated" way to do the exact same thing, in ways that slightly differ (typically, "callbacks vs. promises").
I always support both whenever I can, but this has got to stop. Even `ScriptProcessorNode` was `JavaScriptNode` before they decided to rename it for whatever reason.
As for `OfflineAudioContext`s: my very first try was to do what you suggested; however, it just so happens that we can't stream media input through one of these. I see no reason for this behavior, that's just how it is.
Chrome blindly allows this but outputs garbage, and Firefox removed the `createMediaStreamSource()` method from `OfflineAudioContext`s altogether (I believe we can still see comments from surprised people on the relevant Bugzilla entry).
So at this point the only option is to use (waste?) a regular `AudioContext` for streaming input data, but if the requested sample rate for the capture device differs from the `AudioContext`'s (which is read-only and unpredictable (ugh, guys)), we have to perform resampling, and the sanest way to do that is to leverage `OfflineAudioContext`s (because internally, browsers use way better resampling algorithms than I think I could ever implement).
So this is what the `ScriptProcessorNode` is for: it receives chunks of data from the `AudioContext` (referred to as "processing" later on) and somehow feeds them to `OfflineAudioContext`s for resampling purposes (referred to as "rendering" later on).
Then we have more problems: processing and rendering both always happen asynchronously, and as a result we can't expect them to execute in any particular order (though we can roughly guess).
That's not all: we also can't just reuse a single `OfflineAudioContext` for our purposes, because restarting its rendering process from within its `oncomplete` callback is disallowed. Again, why anybody would want that, I have no idea.
With all of this in mind, I settled on creating a dedicated `OfflineAudioContext` for every newly-received chunk of data, and hoping that the browser is smart enough to notice that there are no more references to them once they're done rendering into our ring buffers. Firefox is like "yeah, fine", while Chrome seemingly crawls under a mountain of zombie `OfflineAudioContext`s.
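For what it's worth, the ring buffers I keep mentioning can be modeled with something as simple as the following sketch (single-channel and simplified; the names and structure are illustrative, not the actual code):

```javascript
// Sketch: fixed-capacity ring buffer that resampled capture chunks are
// written into, and that alcCaptureSamples-style reads drain from.
function RingBuffer(capacity) {
  this.buf = new Float32Array(capacity);
  this.read = 0;   // index of the next sample to read
  this.count = 0;  // number of readable samples currently stored
}
// Append as many samples as fit; excess samples are dropped.
RingBuffer.prototype.write = function(samples) {
  for (var i = 0; i < samples.length && this.count < this.buf.length; ++i) {
    this.buf[(this.read + this.count++) % this.buf.length] = samples[i];
  }
};
// Read up to n samples, returning however many are available.
RingBuffer.prototype.readOut = function(n) {
  n = Math.min(n, this.count);
  var out = new Float32Array(n);
  for (var i = 0; i < n; ++i) {
    out[i] = this.buf[this.read];
    this.read = (this.read + 1) % this.buf.length;
    this.count--;
  }
  return out;
};
```

Dropping samples on overflow (rather than overwriting the oldest) is just one possible policy; either way the buffer bounds memory use while the two async sides run at their own pace.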
Yes, I nearly forgot to mention that there is no way to explicitly release `OfflineAudioContext`s, whereas `AudioContext`s do have a `close()` method.
Which is why, at the moment, I'm considering managing a pool of `OfflineAudioContext`s instead, to reuse them in a clever way; hopefully one can now understand why I'm not very confident about this.
Also, while `AudioContext`s are a very scarce resource (max. 6 in Chrome, but some people say we shouldn't assume more than 1), `OfflineAudioContext`s are not (one can create thousands of them with no problem).
Since we do want to preserve resources as much as possible, I'd like our final OpenAL implementation to use only one `AudioContext` for all `ALContext`s and capture devices at once, because it's easily doable (if not done already).
So all in all I can't say I'm fine with the current solution, but hopefully you saw how the Web Audio API didn't miss a single opportunity to get in my way. When I'm done with all of this, I'm so, so going back to Rust, where at least API designers care about my sanity. ;)
Edit: Also, about constants for extensions, I genuinely think that nobody would care if we picked values in a half-random way. OpenAL is just not that popular or widely used, and neither is Emscripten (let's be honest).
And while the spec claims there is some sort of process for registering extensions, I seriously doubt that it still applies or that anybody actually cares. It's just not the same level of seriousness as OpenGL.
Even in OpenGL, some extensions such as `GLX_MESA_swap_control` are not registered, but this doesn't magically inhibit their existence nor actual use of them.
That said, if we can get things done the "official way" without losing too much time, then yeah, let's go for it.
Ok, that makes more sense now, thanks for the detailed explanation. If an `OfflineAudioContext` can't restart itself in its own callback, perhaps you could make two of them and have them restart each other in alternating fashion.
Right now each `ALcontext` has its own `AudioContext`, but that can be changed relatively easily. I was going to preserve the ability for different contexts to have different sampling rates, but the spec is very vague about context attributes and whether they are specific to a context or shared by all device contexts. The latter is what would allow us to share `AudioContext`s, and it's what OpenAL-Soft does anyway, so I'll add that to my TODO list.
> perhaps you could make 2 of them, and have them restart each other in alternating fashion
I think I'm going to try something like this first, thanks for the idea! Again, I'm still not very confident about this because I have a hard time reasoning about control flow and possible races caused by the omnipresent asynchronicity.
I'll see where it leads me, but I'll drop it if it ends up being nightmarish (the whole thing was already a bit too close to this until this evening anyway).
Also, when I think about it, it's probably not that big a deal if we don't end up sharing `AudioContext`s. Which OpenAL applications ever use more than one context? Pretty sure the few kinds of applications which need this operate at a level where they wouldn't even consider using OpenAL.
Wasting resources is indeed bad, but going to great lengths just to work around problems we are not responsible for is not any better IMO.
I trust we'll do the right thing, but we also shouldn't sink too much of our time into problems that aren't worth it. At the end of the day, all I wanted was to compile a trivial Rust audio app on Emscripten. x)
Okay, some simple tests led me to conclude that `OfflineAudioContext`s are never reusable. It's easy to see by trying this in the browser's console:
var o = new OfflineAudioContext(1, 1024, 44100);
o.startRendering().then(function(e) { console.log(o.state); })
// "closed"
o.startRendering()
// Promise { <state>: "rejected", <reason>: InvalidStateError }
`OfflineAudioContext`s seem to be meant to be thrown away once they're done rendering (there's no way to move them out of that "closed" state), so that settles it for me.
Also according to this (from 2012, but for some reason I believe it's still relevant), Chrome doesn't even seem to have garbage collection anyway (which is astonishing).
So right now I fail to see any reasonable way to "fix" the situation.
The only workaround that comes to mind would be to implement a resampling algorithm on our side, removing the need for `OfflineAudioContext`s altogether. The thing is, this doesn't seem easy to do (but maybe it actually is, I don't know), especially since I'm not exactly a maths guy.
So what I'm gonna do next is see if I can pull off something like this in a way that doesn't increase complexity by a crazy amount.
Honestly, we could probably get by just fine with a linear resampler. It's pretty simple, and the results should be acceptable for most use cases, particularly since captured audio is usually pretty low quality anyway. Also, keep in mind that the chosen capture rate will almost always be one of a very few values, most likely 44.1K or 48K, so in many cases resampling probably won't be required anyway. If someone later wants to improve it, that's fine, but I don't think we need to go all out for this.
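For reference, a mono linear resampler along those lines fits in a few lines of JavaScript. This is only a sketch; the function name and signature are made up, not what any actual `library_openal.js` code looks like:

```javascript
// Sketch: linearly resample mono float samples from srcRate to dstRate.
// Each output sample interpolates between the two nearest input samples.
function linearResample(src, srcRate, dstRate) {
  var ratio = srcRate / dstRate;
  var dstLen = Math.round(src.length / ratio);
  var dst = new Float32Array(dstLen);
  for (var i = 0; i < dstLen; ++i) {
    var pos = i * ratio;                       // fractional source position
    var i0 = Math.floor(pos);
    var i1 = Math.min(i0 + 1, src.length - 1); // clamp at the last sample
    var t = pos - i0;                          // interpolation weight
    dst[i] = src[i0] * (1 - t) + src[i1] * t;
  }
  return dst;
}
```

When `srcRate === dstRate` this degenerates to a plain copy, so the common no-resampling case falls out for free.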
You're right, this is reasonable. Guess I'll do it this way.
Alright, the implementation and tests have made it to this repo, and work as intended (hopefully; I did test on Firefox and Chrome). There's room for improvement, but not much (for instance, I should recycle unused indices in the internal array of capture devices and associated data).
I plan to tackle these kinds of small "improvements" next, but right now we've got something working! As you said, linear resampling seems to do a fine job.
Awesome, nice work!
For the capture device indices, I suggest hooking into the ID mechanism I've been using for other resources. You can call `AL.newId()` to obtain a new unique ID number to use for the capture device; then, when you delete it, just do `AL.freeIds.push(id)` to make it available for reuse.
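If it helps, the scheme reads roughly like this toy model (a simplified stand-in: the real `AL` object in `library_openal.js` carries much more state, and the `nextId` counter field here is assumed):

```javascript
// Toy model of the ID-reuse mechanism: freed IDs are recycled before new
// ones are minted, keeping the ID space compact.
var AL = {
  nextId: 1,   // assumed internal counter, not the real field name
  freeIds: [],
  newId: function() {
    return this.freeIds.length > 0 ? this.freeIds.pop() : this.nextId++;
  }
};

var id = AL.newId();  // allocate an ID for a new capture device
AL.freeIds.push(id);  // on delete, make the ID available for reuse
```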
Well, I think I'm done with the capture implementation. I was planning to implement my extensions proposal next, then properly format it into an RFC and send it to the OpenAL mailing lists, but I realized they didn't actually bring much value.
`alcCaptureRequestDeviceEMSCRIPTEN()` is not needed because:

- `chunksize` won't have any impact on quality, since resampling occurs at the last minute, when samples are captured;
- the `on_success` and `on_error` callbacks bring unnecessary complexity: if a call fails with `ALC_INVALID_DEVICE`, then we know it's because the user has denied access.

Also, the two integer queries `ALC_CAPTURE_BEST_FREQUENCY_EMSCRIPTEN` and `ALC_CAPTURE_BEST_FORMAT_EMSCRIPTEN` are nearly useless, because the former is, 90% of the time, 44100 or 48000, and the latter is actually always `AL_FORMAT_MONO_FLOAT32`.
So I'm going to close this - feel free to reopen if you think parts of these extensions are worth pursuing, or if there is any issue with the capture implementation.
Otherwise, I'll be looking forward to the pull request. :)
I dunno, I still think such an extension has merit. I don't find the fact that polling the status could result in an infinite loop to be sufficient justification to abandon it. This is an emscripten specific extension, so anyone using it would need to be aware of emscripten specific semantics. Also, unless I'm mistaken, it's possible to block forever anyway if the user enters a tight loop waiting for capture samples that will never arrive until they yield back to the event loop.
Agreed, but even then, the hypothetical polling extension's problem is already solved by the very behavior of our implementation (which doesn't do anything nonstandard, except maybe for the "no sample is captured until the user grants access" part).
The current implementation provides enough information that applications could infer the "status" value they're after ("Pending", "Granted", "Denied") by themselves, in a way that's reasonably non-hackish (described in my previous comment).
I don't expect apps to particularly care that no sample is being captured even a short while after the device has been "opened" (they would just keep polling `ALC_CAPTURE_SAMPLES` naïvely), and I also expect them to be prepared for unexpected `ALC_INVALID_DEVICE` failures at any time (semantically equivalent to unplugging the physical device).
At worst, app authors would be surprised at first, then quickly understand how it works and how they can handle the situation.
Which reminds me that our capture implementation's behavior should be documented somewhere, along with useful related information. Probably under the `site` folder; I'll look into it.
I call OpenAL's `alcCaptureOpenDevice` and it reports an error: (ALCopenslCapture_open: bufferQueue->Enqueue: Buffer insufficient). I looked at the OpenAL source code in /ALc/backends/opensl.c: the function ALCopenslCapture_open(...) runs up to the call to ll_ringbuffer_get_write_vector(self->mRing, data), and that's where the error is reported. It was on the seventh run through that code that the error appeared. I don't know how I can solve this error. Please help me. Thanks.
@smallStonela I may be wrong, but it doesn't look like the problem is on our side; OpenSL is the Android backend, and I couldn't find any of the functions you mentioned in this repo.
Could you give more context, like what kind of setup you've encountered these errors on (i.e. a native mobile app? a browser app on mobile? etc.)?
Either way, it's likely that this place isn't the one you're looking for. :)
Thought that, since we are to fix a bunch of stuff in the AL implementation, why not also implement the `alcCapture*` stuff? Here's what I see:

- If we don't…
- If we do, microphone input can be obtained via `ctx.createMediaStreamSource()`, given a stream from `navigator.getUserMedia()` with only `audio` enabled.

The latter asynchronously requests the user's permission to open microphone input (which may then be granted or denied), which is not expected by typical OpenAL apps. How do we want to handle this?
I see these options:

- `alcCaptureOpen()` opens a dummy "proxy" device which successfully records nothing until the user allows microphone input, AND still "succeeds" if the user denies mic input;
- `alcCaptureOpen()` fails until mic input is allowed, and we say "if you're on Emscripten, you have to retry several times".

At this point here are my questions: