Implementing the alcCapture* API ?

yoanlcq commented 7 years ago

Thought that, since we are to fix a bunch of stuff in the AL implementation, why not also implement the alcCapture stuff ? Here's what I see:

If we don't

We won't have to write tests for that;
One thing less for the Emscripten folks to review in the future PR;
Nobody has complained anyway (yet);

If we do

We'd be interested in calling ctx.createMediaStreamSource() given navigator.getUserMedia() with only audio enabled.
The latter asynchronously requests the user's permission to open microphone input (which may be then granted or denied), which is not expected by typical OpenAL apps. How do we want to handle this ?
I see these options :
1. alcCaptureOpen() opens a dummy "proxy" device which successfully records nothing until the user allows microphone input, AND still "succeeds" if the user denies mic input;
2. Same as above, except calls start returning errors when the user denies mic input;
3. alcCaptureOpen() fails until mic input is allowed, and we say "If you're on Emscripten, you have to retry several times";
4. A better idea?
AFAIK the only supported format is float32, so to support OpenAL's standard 8-bit/16-bit ints formats we have to some post-processing;
We could grab this or this, or others, as guides.

At this point here are my questions:

Should we do it ?
If so, I'd like to go ahead! TBH I haven't used the Web Audio API before but it looks fun and within my reach. Would that be okay ?
Thoughts regarding the aforementioned "allow/deny" issues (and perhaps other issues I didn't have in mind) ?

jpernst commented 7 years ago

Yeah, I think capture is definitely worth supporting if we can swing it. I've been punting on it for now since there's so much else to do, so if you want to dig into this that would be fantastic.

As for the approach, I think either 1 or 2 is best. As much as possible I'd like to avoid people having to make special considerations for emscripten unless they opt into it. In that spirit, exposing a few emscripten specific extensions is something I've been considering; we could have one like ALC_EMSCRIPTEN_capture_allowed that polls whether or not capture permissions have actually been granted. That's for later though, for now just getting it working at all is more important.

To help avoid possible merge conflicts, since we'll be working in the same file, I'll carve out a section in the internals block for capture stuff, and you can put anything you need to in there. I won't touch anything inside there, or the alcCapture* entrypoints, so we should be pretty safe.

jpernst commented 7 years ago

Ok, the capture section is added, so feel free to dig in whenever. If you need to make changes outside that block, just let me know so we can coordinate.

yoanlcq commented 7 years ago

Alright, sounds good! Also I really like the idea of having an ALC extension so that people who need that kind of control can opt-in - It would call for some documentation though, which is yet another task (or we might just ask kripken et al. what to do about it when proposing the PR).

yoanlcq commented 7 years ago

Some things will require me to operate outside the capture sections :

The $AL object will need a new AudioContext member (possibly named alcCaptureAudioCtx). Or even better, a captureCtx member which would contain an AudioContext AND other internal state;
alcGetString(NULL, ALC_CAPTURE_DEFAULT_DEVICE_SPECIFIER)
alcGetString(NULL, ALC_CAPTURE_DEVICE_SPECIFIER)
alcGetIntegerv(valid_device, ALC_CAPTURE_SAMPLES, 1, &frameCount)

And some concerns :

Since sample rate is a read-only property of an AudioContext and can't be set, that means alcCaptureOpenDevice() could fail, given what are otherwise commonly-supported values like 44100 or 48000.
Worse, nothing requires that the sample rate is the same everywhere, it's just on the browser's whim (or I missed something).
Since this probably violates the caller's expectations, perhaps we should provide an extension function that creates an AudioContext beforehand and returns its sample rate so that the app may correctly handle it. Or I'm overthinking it and correct behaviour could be achieved with a "brute force" approach.
What would be reported as device names ? I'd suggest something along the lines of "getUserMedia() Audio" (accurate, but not pretty).
Also, on browsers that support it, we could instead enumerate devices, but that's kind of a "nightly" feature and gets close to gold plating.

Also, in case we end up providing extensions, we should probably try to fix alcGetProcAddress, looking at how emscripten_GetProcAddress() was implemented for OpenGL.

And lastly, it looks possible (not sure) to implement something similar to (but not the same as) ALC_SOFT_loopback but, again, gold plating.

jpernst commented 7 years ago

Adding members to $AL is fine, since that can also be done inside the capture section. When we get closer to making the PR we can hoist them back up with the other members.

As for alcGetString/Integerv, feel free to add to those, since I probably won't be touching them anytime soon, at least until we get to extensions, which will be a while yet.

I'd give pretty good odds that either 44.1khz or 48kz are supported by most web capture implementations, so brute forcing it probably isn't THAT bad, but it's definitely something to consider for an extension.

As for the device name, right now the output device is just called "Device", so capture could just be called "Capture". I doubt it will matter much in the long run, so feel free to go with what you think is best. I'll probably change the output device name to at least indicate it's a web audio device.

I've taken a look at how the GL implementation does extensions. It's a little wonky, but not too bad. I'll probably be doing that pretty soon, just to get the stubs in place.

Finally, I also looked into implementing ALC_SOFT_loopback, but concluded it's either impossible or too hard right now, since it requires synchronous rendering, while webaudio only provides async. If we want we can swing back and take another look once everything else is done, but it's probably not that big of a deal, since the primary use of loopback is to output audio with a different API, and on emscripten there's no point in doing that.

yoanlcq commented 7 years ago

Okay, thanks!

About loopback though, implementing it as ALC_SOFT_loopback might be hard, but what about (yet) another custom extension that's suited to Web Audio's constraints ? The way I see it (with my limited knowledge), it's a matter of redirecting all output to a single Javascript node (copying data to buffers) which would also happen to output to speakers.

I could see a use case for this: game (or game engines) that want to allow "recording" gameplay footage (perhaps saving them to video files for later replay) even on Emscripten. Yes, there are screen recorders and all the stuff, but providing this in-game would probably make average users happier (this would even work on mobile devices, etc), that kind of thing.

But it does look a bit too involved for what users would make out of it. Perhaps they wouldn't even use OpenAL for such a thing in the first place.

So anyway, I'll try to get the basics working, and hopefully succeed in adding tests.

yoanlcq commented 7 years ago

Small update, the whole sample rate story is a non-issue, since, quoting the spec, "The implementation is expected to convert and resample to [the requested] format on behalf of the application". One error less but some more work.

Also, it looks like "loopback" wasn't the correct term for what I had in mind (which was, "capture of the application's own audio output", and not "system-wide audio output").

jpernst commented 7 years ago

Just as a heads up, I had to make some minor changes to the alcGet* methods to support context attribs and lay the groundwork for extensions.

yoanlcq commented 7 years ago

Small update - Wanted to say I'm still "on it", sorry it's taking quite a while. Turns out "OpenAL's way" of doing things doesn't map very well to the "Web Audio way".

The whole thing has got me running around in circles for quite a while (and I haven't given my 100% on it either), but now there's only a couple solutions left that I need to try and (hopefully) push.

To give a quick idea, it all has to do with AudioContexts being a scarce resource, the "make-everything-asynchronous" mindset of WebAudio and the AL implementation having to perform resampling if necessary w.r.t the sample rate requested by the user.

jpernst commented 7 years ago

No problem, there's still plenty of work to do writing new tests anyway. And if it ends up not working out, that's fine. There's a few extensions I had to drop that seemed like they'd work well but ended up just not being practical.

yoanlcq commented 7 years ago

Finally got the thing working (link to sources if you're curious), along with some tests in C ! Works fine with various frequencies and formats.
What's left to do is integrate the Javascript part to library_openal.js, and move the tests into the existing infrastructure.

Then I'll probably want to add a few extensions (drafts here), and work around Chrome's bad handling of numerous OfflineAudioContexts.

Roughly, my implementation creates and reuses a "shared AudioContext" for streaming microphone input into a ScriptProcessorNode, which onaudioprocess callback creates an OfflineAudioContext for the sole purpose of resampling the input buffer into our own ring buffers. These ring buffers are read from, interleaved, and type-converted from float32 at the time alcCaptureSamples() is called.

That means, if we pick 1024 as the ScriptProcessorNode's "buffer size", then every 1024 sample frames a fresh OfflineAudioContext is created (and expected to be destroyed once it's not used anymore).
Firefox seems to handle this correctly, but Chrome displays an "Aw, snap!" error at some point, so my plan is to add logic to reuse OfflineAudioContexts as much as possible, making (sadly) everything a bit more complex than it already is.

I wish the implementation would have been more "direct", but as far as my experiments go this is the simplest (if not the only) way to make things work.

Also, when the time is right, it would be nice to discuss the extensions I'm suggesting (and how to pick actual values for new enums?).

jpernst commented 7 years ago

Cool, good to hear things are working! I haven't looked at the code yet, but my first concern would be that ScriptProcessorNode is deprecated in the latest drafts of the spec. I believe AudioWorklet is the replacement, but I've only looked at it briefly so I'm not sure what the full implications are. As for the OfflineAudioContexts, I guess I don't understand why there needs to be more than one per capture device. Couldn't data jus be continuously streamed through the same one rather than creating new ones? Instantiating contexts has a pretty heavy cost so we should try to avoid that if possible.

As for choosing extension values, I've been wondering that as well. I've been meaning to discuss it with the author of OpenAL-Soft, but haven't got around to it yet. I've been busy with other things lately, but hopefully I'll get back to working on this in the next few days.

yoanlcq commented 7 years ago

(Huge wall of text ahead, my apologies :) )

I share these concerns - I'll try to explain the situation as best as I can.

ScriptProcessorNode is indeed "deprecated" since 2014 and said to be "soon replaced by Audio Worker", but the thing is, it's 2017 and no browser vendor seems willing to make the first step.
Because Audio Workers are not yet a thing, there's nothing I can do (not even mentioning that Web Workers bring a whole other set of problems anyway).

In all cases, these guys seem to be in a constant deprecation rampage anyway - nearly every action taken by my scripts have one "modern" way and one "deprecated" way to do the exact same thing in ways that slightly differ (typically, "callbacks vs. promises").
I always support both whenever I can, but this has got to stop. Even ScriptProcessorNode was JavaScriptNode before they decided to rename it for whatever reason.

As for OfflineAudioContexts: My very first try was to do what you suggested, however, it just so happens that we can't stream media input through one of these. I see no reason for this behavior, that's just how it is.
Chrome blindly allows this but outputs garbage, and Firefox completely removed the createMediaStreamSource() method from OfflineAudioContexts altogether (I believe we can still see comments of surprised people on the relevant Bugzilla).

So at this point the only option is to use (waste?) a regular AudioContext for streaming input data, but if the requested sample rate for the capture device differs from the AudioContext's (which is read-only and unpredictable (uugh guys)), we have to perform resampling, and the sanest way to do that is to leverage OfflineAudioContexts (because internally, browsers use way better resampling algorithms than I think I could ever implement).
So this is what the ScriptProcessorNode is for - it receives chunks of data from the AudioContext (referred to as "processing" later on) and somehow feeds them to OfflineAudioContexts for resampling purposes (referred to as "rendering" later on).

Then we have more problems : processing and rendering both always happen asynchronously, and as a result we can't expect them to execute in any particular order (though we can roughly guess).

That's not all: We also can't just reuse a single OfflineAudioContext for our purposes, because restarting its rendering process from within its oncomplete callback is disallowed. Again, why would anybody want that, I have no idea.

With all of this mind, I settled with creating a dedicated OfflineAudioContext for every newly-received chunk of data, and hoping that the browser is smart enough to notice that there are no more references to them once they're done rendering to our ring buffers. Firefox is like "Yeah fine", while Chrome seemingly crawls under a mountain of Zombie OfflineAudioContexts.
Yes, I nearly forgot to mention that there is no way to explictly release OfflineAudioContexts, whereas AudioContexts do have a close() method. Which is why, at the moment, I'm considering managing a pool of OfflineAudioContexts instead, to reuse them in a clever way, but hopefully one can now understand why I'm not very confident about this.

Also, while AudioContexts are a very scarce resource (max. 6 in Chrome, but some people say we shouldn't assume more than 1), OfflineAudioContexts are not (one can create thousands of them with no problem).
Since we do want to preserve resources as much as possible, I'd like our final OpenAL implementation to use only one AudioContext for all ALContexts and capture devices at once, because it's easily doable (if not done already).

So all in all I can't say I'm fine with the current solution, but hopefully you saw how the Web Audio API didn't miss a single opportunity to get in my way. When I'm done with all of this, I'm so, so going back to Rust, where at least API designers care about my sanity. ;)

Edit: Also, about constants for extensions, I genuinely think that nobody would care if we picked values in a half-random way. OpenAL is just not that popular and used, and neither is Emscripten (let's be honest).
And while the spec claims there is some sort of process for registering extensions, I seriously doubt that it still applies or that anybody actually cares. It's just not the same level of seriousness as OpenGL.
Even in OpenGL, some extensions such as GLX_MESA_swap_control are not registered, but this doesn't magically inhibit their existence nor actual use of them.

That said, if we can get things done the "official way" without losing too much time, then yeah, let's go for it.

jpernst commented 7 years ago

Ok, that makes more sense now, thanks for the detailed explanation. If an OfflineAudioContext can't restart itself in its own callback, perhaps you could make 2 of them, and have them restart eachother in alternating fashion.

Right now each ALcontext has its own AudioContext, but that can be changed relatively easily. I was going to preserve the ability for different contexts to have different sampling rates, but the spec is very vague about context attributes and whether they are specific to a context or shared by all device contexts. The latter is what would allow us to share AudioContexts, and it's what OpenAL-Soft does anyway, so I'll add that to my TODO list.

yoanlcq commented 7 years ago

perhaps you could make 2 of them, and have them restart each other in alternating fashion

I think I'm going to try something like this first, thanks for the idea! Again, I'm still not very confident about this because I have a hard time reasoning about control flow and possible races caused by the omnipresent asynchronicity.
I'll see where it leads me, but I'll drop it if it ends up being nightmarish (the whole thing was already a bit too close to this until this evening anyway).

Also, when I think about it, it's probably not that big a deal if we don't end up sharing AudioContexts. Which OpenAL applications ever use more that one context ? Pretty sure that the few kinds of applications which need this are at a level where they wouldn't even consider using OpenAL.

Wasting resources is indeed bad, but going to great lengths just to work around problems we are not responsible for is not any better IMO.
I trust we'll do the right thing, but we also shouldn't throw too much of our time on problems that aren't worth it. At the end of the day all I wanted was to compile a trivial Rust audio app on Emscripten. x)

yoanlcq commented 7 years ago

Okay, some simple tests led me to conclude that OfflineAudioContexts are never reusable. It's easy to see by trying this in the browser's console :

var o = new OfflineAudioContext(1, 1024, 44100);
o.startRendering().then(function(e) { console.log(o.state); })
// "closed"
o.startRendering()
// Promise { <state>: "rejected", <reason>: InvalidStateError }

OfflineAudioContexts seem to be meant to be thrown away once they're done rendering (there's no way to move them out of that "closed" state), so that settles it for me.

Also according to this (from 2012, but for some reason I believe it's still relevant), Chrome doesn't even seem to have garbage collection anyway (which is astonishing).

So right now I fail to see any reasonable way to "fix" the situation.

The only workaround that comes to my mind would be to implement a resampling algorithm on our side, removing the needs for OfflineAudioContexts altogether. The thing is, this doesn't seem easy to do (but maybe it actually is, idk), especially since I'm not exactly a maths guy.
So what I'm gonna do next is see if I can pull off something like this in a way that doesn't increase complexity by a crazy amount.

jpernst commented 7 years ago

Honestly we could probably get by just fine with a linear resampler. It's pretty simple and the results should be acceptable for most use cases, particularly since captured audio is usually pretty low quality anyway. Also, keeping in mind that the chosen capture rate will almost always be one of a very few values, most likely 44.1K or 48K, so in many cases resampling probably won't be required anyway. If someone later wants to improve it, then that's fine, but I don't think we need to go all out for this.

yoanlcq commented 7 years ago

You're right, this is reasonable. Guess I'll do it this way.

yoanlcq commented 7 years ago

Alright, the implementation and tests have made it to this repo, and work as intended (hopefully. I did test it on Firefox and Chrome). There's room for improvement, but not so much (for instance, I should recycle unused indices in the internal array of capture devices and associated data).

I plan to tackle these kind of small "improvements" next, but right now we've got something working! As you said, linear resampling seems to do a fine job.

jpernst commented 7 years ago

Awesome, nice work! For the capture device indices, I suggest hooking into the ID mechanism I've been using for other resources. You can call AL.newId() to obtain a new unique ID number to use for the capture device, then when you delete if, just do AL.freeIds.push(id) to make it available for reuse.

yoanlcq commented 7 years ago

Well I think I'm done with the capture implementation. I was planning to implement my extensions proposal next, then properly formatting in into an RFC and sending to the OpenAL mailing lists, but I realized they didn't actually bring much value.

alcCaptureRequestDeviceEMSCRIPTEN() is not needed because :

chunksize won't have any impact on quality, since resampling occurs at the last minute when samples are captured;
The on_success and on_error callbacks bring unnecessary complexity :
- If subsequent calls fail with ALC_INVALID_DEVICE, then we know it's because the user has denied access;
- If the reported number of sample is zero and doesn't change, then we know the user hasn't yet granted or denied access;
- Otherwise, when it works, we know the user has granted access.

Also, the two integer queries ALC_CAPTURE_BEST_FREQUENCY_EMSCRIPTEN ALC_CAPTURE_BEST_FORMAT_EMSCRIPTEN are nearly useless because the former is, 90% of the time, 44100 or 48000, and the latter is actually always AL_FORMAT_MONO_FLOAT32.

So I'm going to close this - feel free to reopen if you think parts of these extensions are worth pursuing, or if there is any issue with the capture implementation.
Otherwise, I'll be looking forward to the pull request. :)

jpernst commented 7 years ago

I dunno, I still think such an extension has merit. I don't find the fact that polling the status could result in an infinite loop to be sufficient justification to abandon it. This is an emscripten specific extension, so anyone using it would need to be aware of emscripten specific semantics. Also, unless I'm mistaken, it's possible to block forever anyway if the user enters a tight loop waiting for capture samples that will never arrive until they yield back to the event loop.

yoanlcq commented 7 years ago

Agreed, but even, say, the hypothetical polling extension's problem is already solved by the very behavior of our implementation (which doesn't do anything nonstandard expect maybe for the "no sample is captured until user grants access" part).

The current impl provides enough information such that applications could infer the "status" value they're after ("Pending", "Granted", "Denied") by themselves, in a way that's reasonably not hackish (described in my previous comment).
I don't expect apps to particularly care that no sample is being captured even a short while after the device has been "opened" (they would just keep polling ALC_CAPTURE_SAMPLES naïvely), and also I expect them to be prepared for unexpected ALC_INVALID_DEVICE failures at any time (semantically equivalent to unplugging the physical device).

At worst, app authors would be surprised at first, then quickly understand how it works and how they can handle the situation.

Which reminds me that our capture implementation's behavior should be documented somewhere along with useful related information. Under the site folder probably - I'll look into it.

smallStonela commented 6 years ago

I call the "alcCaptureOpenDevice" of OpenAL API,it reports a error,that, (ALCopenslCapture_open: bufferQueue->Enqueue: Buffer insufficient). I looked at the OpenAL source code /ALc/backends/opensl.c . The function ALCopenslCapture_open(...),that run to below of ll_ringbuffer_get_write_vector(self->mRing, data) .It was reported a error .emmm,this Array runned seventh what reported it.I didn't know how i can solve thsi error.please help me. thanks.

yoanlcq commented 6 years ago

@smallStonela I may be wrong but this doesn't look like the problem is on our side; OpenSL is the Android backend, and I couldn't find any of the functions you mentioned in this repo.
Could you give more context, like on which kind of device (i.e native mobile app ? browser app on mobile ? etc) you've encountered these errors ?
Either way it's likely that this place isn't the one you're looking for. :)

jpernst / emscripten

Implementing the alcCapture* API ? #2

If we don't

If we do