kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.

Echo Cancellation with OpenAL #683

Open monaghanwashere opened 2 years ago

monaghanwashere commented 2 years ago

Hello,

Is there a suggested approach to incorporating Echo Cancellation with OpenAL? I have an Echo Cancellation SDK, but its functions need to be fed two buffers of audio data every frame, namely 1) the microphone input audio stream, and 2) the speaker output audio stream.

For the mic input stream, I'm guessing I can just use alcCaptureSamples. For the audio output stream, I'm not as certain what the best approach is. My first instinct is to open both a loopback device and a hardware device, grab rendered samples from the loopback, pass the rendered samples to the echo cancellation api, and then pass the returned echo-cancelled samples to the hardware device (since they still need to be played out).

However I'm concerned that this 1) will introduce a bunch of latency, but I guess more importantly, 2) is happening at too 'high' a level in the audio chain, and that it's probably best to grab the audio data from the driver level instead. Ideally I would want this solution to work cross platform (Windows, OS X, Android, iOS), and since my audio engine has a common OpenAL layer across all platforms at the moment, it would be great to be able to fit the echo cancellation part into OpenAL, as opposed to writing custom solutions for each layer (I think it might not even be possible on Android/iOS because I think they may not offer APIs to grab audio data at the driver level).

Any pointers would be greatly appreciated, thanks

kcat commented 2 years ago

I don't know about OSX, but Android, iOS, and Windows currently won't be able to do echo cancellation on their own. The main issue is that it would rely on being able to capture the output of playback, which is typically handled by opening a "monitor" or readback capture device. But the backends for those OSs don't expose the monitor capture devices for the app to open. Grabbing the output samples from OpenAL's output before they're given to the system isn't really possible, and it wouldn't be a good option if it were, because the system will likely do additional processing to it that may interfere with the echo cancellation processing (the output/source samples won't match the input/mic samples' echoes, and it may not properly recognize them as echoes).

You could use some OS-specific code just to handle the monitor capture (or maybe SDL as an alternative option for cross-platform capture, if you don't mind adding that, though I don't know if SDL handles monitor/readback capture devices either), but use OpenAL for the normal/mic capture and playback. Or maybe SDL for both captures, and OpenAL for playback. The monitor capture samples would be the output samples that are played on the speakers, while the normal/mic capture would be the input samples that may have ended up with echoes of earlier samples. Supply both of those to the echo cancellation API and you should get back the normal/mic capture samples with any echoes removed, then you can play/stream it with an OpenAL playback device.
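To make the two-input contract concrete, here is a minimal sketch of a canceller that takes the played (monitor) samples and the mic samples and returns the mic samples with the estimated echo subtracted. It uses a textbook NLMS adaptive filter as a stand-in; it is not the SDK mentioned above, and `Aec`, `aec_init`, `aec_process`, `aec_demo`, the tap count, and the frame size are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AEC_TAPS 32   /* adaptive filter length in samples (illustrative) */
#define FRAME    160  /* samples per frame (illustrative) */
#define TOTAL    8000 /* length of the synthetic signal: 50 frames */

/* Hypothetical stand-in for the state an echo cancellation SDK would keep. */
typedef struct {
    float w[AEC_TAPS]; /* current estimate of the speaker-to-mic echo path */
    float x[AEC_TAPS]; /* recent reference samples, newest first */
} Aec;

static void aec_init(Aec *a) {
    for (size_t i = 0; i < AEC_TAPS; i++) { a->w[i] = 0.0f; a->x[i] = 0.0f; }
}

/* One frame: `ref` is what was played, `mic` is what was captured.
 * `out` receives the mic samples minus the estimated echo. */
static void aec_process(Aec *a, const float *ref, const float *mic,
                        float *out, size_t n) {
    const float mu = 0.5f, eps = 1e-6f; /* step size, divide-by-zero guard */
    assert(a && ref && mic && out);
    for (size_t t = 0; t < n; t++) {
        /* Push the newest reference sample into the history. */
        for (size_t i = AEC_TAPS - 1; i > 0; i--) a->x[i] = a->x[i - 1];
        a->x[0] = ref[t];
        float est = 0.0f, energy = eps;
        for (size_t i = 0; i < AEC_TAPS; i++) {
            est += a->w[i] * a->x[i];
            energy += a->x[i] * a->x[i];
        }
        float err = mic[t] - est; /* residual after removing the estimate */
        out[t] = err;
        /* NLMS update: nudge the echo-path estimate toward the residual. */
        for (size_t i = 0; i < AEC_TAPS; i++)
            a->w[i] += mu * err * a->x[i] / energy;
    }
}

/* Small deterministic noise source in [-1, 1). */
static float lcg_noise(uint32_t *s) {
    *s = *s * 1664525u + 1013904223u;
    return (float)((*s >> 16) & 0xFFFFu) / 32768.0f - 1.0f;
}

/* Synthetic check: the mic hears only two delayed copies of the reference.
 * Returns residual energy divided by mic energy over the final frame. */
static float aec_demo(void) {
    static float ref[TOTAL], mic[TOTAL], out[TOTAL];
    uint32_t seed = 1u;
    for (size_t t = 0; t < TOTAL; t++) ref[t] = lcg_noise(&seed);
    for (size_t t = 0; t < TOTAL; t++)
        mic[t] = (t >= 5 ? 0.6f * ref[t - 5] : 0.0f)
               + (t >= 9 ? 0.3f * ref[t - 9] : 0.0f);
    Aec aec;
    aec_init(&aec);
    for (size_t off = 0; off < TOTAL; off += FRAME)
        aec_process(&aec, ref + off, mic + off, out + off, FRAME);
    float e_mic = 0.0f, e_out = 0.0f;
    for (size_t t = TOTAL - FRAME; t < TOTAL; t++) {
        e_mic += mic[t] * mic[t];
        e_out += out[t] * out[t];
    }
    return e_out / e_mic;
}
```

Calling `aec_demo()` from a `main` gives the fraction of echo energy left once the filter has adapted. In a real setup the reference would come from a monitor capture and the mic buffer from the normal capture device, and a production SDK would also handle near-end speech, which this sketch ignores.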

mirh commented 2 years ago

FWIW Windows has APIs (and APOs /s) to implement echo cancellation. Not sure then if they have anything to do with the purposes here.

monaghanwashere commented 2 years ago

@kcat I'm a little confused by a few points in your answer

But the backends for those OSs don't expose the monitor capture devices for the app to open

If this is true, how can SDL help?

Grabbing the output samples from OpenAL's output before they're given to the system isn't really possible

Really? Are you saying that the loopback rendered output is not the output from OpenAL? (I understand the part about the audio likely undergoing additional processing from the time OpenAL hands it to the OS, to the point it's output on the speakers; this question is specifically about the 'grabbing output samples from OpenAL' part not being possible)

kcat commented 2 years ago

If this is true, how can SDL help?

I mean OpenAL Soft's backends don't expose monitor capture devices. It's possible SDL might. I intend to add the ability eventually, but it's not there yet.

Really? Are you saying that the loopback rendered output is not the output from OpenAL? (I understand the part about the audio likely undergoing additional processing from the time OpenAL hands it to the OS, to the point it's output on the speakers; this question is specifically about the 'grabbing output samples from OpenAL' part not being possible)

Loopback devices are separate from playback devices. You could get the samples from a loopback device, but OpenAL won't play those (unless you use a second device to stream it, which is possible though not elegant, or use the system audio API directly). On a normal playback device, you can't get the samples that are given to the system.
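The two-device arrangement described here can be sketched as a per-frame loop. The sketch below deliberately avoids linking against OpenAL: the render and capture steps are function pointers with the same (device, buffer, sample count) shape as `alcRenderSamplesSOFT` from the ALC_SOFT_loopback extension and `alcCaptureSamples`, while `Pipeline`, `pipeline_tick`, `aec_fn`, `push_fn`, and every `stub_` function are hypothetical names invented for illustration. It assumes the usual arrangement in which the canceller returns cleaned mic samples and the rendered samples are played unchanged on the second device.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME 160 /* samples per frame, mono 16-bit (illustrative) */

/* Same shape as alcRenderSamplesSOFT and alcCaptureSamples. */
typedef void (*pull_fn)(void *device, void *buffer, int samples);
/* Hypothetical echo cancellation SDK entry point. */
typedef void (*aec_fn)(void *state, const short *ref, const short *mic,
                       short *clean, int samples);
/* Hypothetical sink, e.g. queueing a buffer on a streaming source
 * that belongs to the second (hardware) playback device. */
typedef void (*push_fn)(void *device, const short *buffer, int samples);

typedef struct {
    void *loopback_dev, *capture_dev, *playback_dev, *aec_state;
    pull_fn render;  /* mixes one frame on the loopback device */
    pull_fn capture; /* reads one frame of mic input */
    aec_fn  cancel;
    push_fn play;
} Pipeline;

/* One frame of work. With a real capture device, the caller would first
 * check that at least FRAME samples are available (ALC_CAPTURE_SAMPLES). */
static void pipeline_tick(const Pipeline *p, short *clean) {
    short ref[FRAME], mic[FRAME];
    assert(p && clean);
    p->render(p->loopback_dev, ref, FRAME);          /* 1. mix, not yet audible */
    p->capture(p->capture_dev, mic, FRAME);          /* 2. mic, echoes included */
    p->cancel(p->aec_state, ref, mic, clean, FRAME); /* 3. remove the echo */
    p->play(p->playback_dev, ref, FRAME);            /* 4. now make it audible */
}

/* Stubs so the sketch runs without audio hardware. */
static short g_played[FRAME];
static int g_played_count = 0;

static void stub_render(void *dev, void *buf, int n) {
    short *s = buf; (void)dev;
    for (int i = 0; i < n; i++) s[i] = (short)(2 * i);
}
static void stub_capture(void *dev, void *buf, int n) {
    short *s = buf; (void)dev;
    for (int i = 0; i < n; i++) s[i] = (short)i; /* "echo" at half gain */
}
static void stub_cancel(void *st, const short *ref, const short *mic,
                        short *clean, int n) {
    (void)st;
    for (int i = 0; i < n; i++) clean[i] = (short)(mic[i] - ref[i] / 2);
}
static void stub_play(void *dev, const short *buf, int n) {
    (void)dev;
    for (int i = 0; i < n; i++) g_played[i] = buf[i];
    g_played_count += n;
}

static Pipeline make_stub_pipeline(void) {
    Pipeline p = { NULL, NULL, NULL, NULL,
                   stub_render, stub_capture, stub_cancel, stub_play };
    return p;
}
```

The reference handed to the canceller in step 3 is taken before the system touches it in step 4, which is the weak point of this layout: the canceller never sees what actually reaches the speakers.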

monaghanwashere commented 2 years ago

Loopback devices are separate from playback devices. You could get the samples from a loopback device, but OpenAL won't play those (unless you use a second device to stream it, which is possible though not elegant, or use the system audio API directly).

Yes, exactly, I had described this very scenario in my original post:

My first instinct is to open both a loopback device and a hardware device, grab rendered samples from the loopback, pass the rendered samples to the echo cancellation api, and then pass the returned echo-cancelled samples to the hardware device (since they still need to be played out)

The 'passing of the echo-cancelled samples' to the 'hardware device' would have been a second device that I would have opened on OpenAL. Using this approach, is the only caveat you foresee, the fact that those samples would be missing whatever extra processing the system will likely do on it?

monaghanwashere commented 2 years ago

Following up here. Any foresight on the following?

Using this approach, is the only caveat you foresee, the fact that those samples would be missing whatever extra processing the system will likely do on it?

Or, asked another way, do you have any insight into what kind of processing the audio would undergo between the time the rendered loopback samples are passed to another OpenAL device for playback, and that audio being played back on the actual speakers?

kcat commented 2 years ago

Or, asked another way, do you have any insight into what kind of processing the audio would undergo between the time the rendered loopback samples are passed to another OpenAL device for playback, and that audio being played back on the actual speakers?

Likely at least some volume adjustment. There could also be equalizers, or virtualization (e.g. using HRTF to simulate surround sound on headphones), or even a system-level echo cancellation module. So what's captured as feedback from the microphone may not match up with what you played, causing the echo cancellation processor to not think it's the same sound being echoed back.
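As a rough illustration of the mismatch, the sketch below simulates the simplest case on that list, a system volume change applied after the application's samples were tapped. It uses a textbook NLMS filter as a stand-in canceller, and every name and constant in it is hypothetical. The canceller's reference stays the same while the echo reaching the mic suddenly drops in level, so the residual jumps until the filter re-adapts. A fixed gain is something a simple adaptive filter can re-learn, so this shows only the transient; processing that keeps changing, or that is not linear, would presumably be harder to track.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS   16   /* adaptive filter length (illustrative) */
#define TOTAL  6000 /* samples simulated */
#define SWITCH 4000 /* sample index at which the system volume changes */
#define WINDOW 200  /* samples measured on each side of the change */

/* Small deterministic noise source in [-1, 1). */
static float lcg_noise(uint32_t *s) {
    *s = *s * 1664525u + 1013904223u;
    return (float)((*s >> 16) & 0xFFFFu) / 32768.0f - 1.0f;
}

/* Writes the residual energy in the WINDOW samples just before and just
 * after the volume change into *before and *after. */
static void volume_change_demo(float *before, float *after) {
    static float ref[TOTAL];
    float w[TAPS] = {0}, x[TAPS] = {0};
    const float mu = 0.5f, eps = 1e-6f;
    uint32_t seed = 7u;
    assert(before && after);
    *before = 0.0f;
    *after = 0.0f;
    for (size_t t = 0; t < TOTAL; t++) ref[t] = lcg_noise(&seed);
    for (size_t t = 0; t < TOTAL; t++) {
        /* System-side gain that the application never sees. */
        float gain = (t < SWITCH) ? 0.8f : 0.2f;
        float mic = (t >= 3) ? gain * ref[t - 3] : 0.0f;
        for (size_t i = TAPS - 1; i > 0; i--) x[i] = x[i - 1];
        x[0] = ref[t];
        float est = 0.0f, energy = eps;
        for (size_t i = 0; i < TAPS; i++) {
            est += w[i] * x[i];
            energy += x[i] * x[i];
        }
        float err = mic - est;
        for (size_t i = 0; i < TAPS; i++) w[i] += mu * err * x[i] / energy;
        if (t >= SWITCH - WINDOW && t < SWITCH) *before += err * err;
        else if (t >= SWITCH && t < SWITCH + WINDOW) *after += err * err;
    }
}
```

Calling `volume_change_demo(&before, &after)` and comparing the two values shows the residual rising right after the gain change, even though nothing about the application's own output changed.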