kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.

Separate processing of low frequencies for setups where #subwoofers < #channels #463

Open · Hiradur opened 3 years ago

Hiradur commented 3 years ago

I recently read an article [1] which claims that for accurate stereo music reproduction, two subwoofers are necessary: if only one is used, the low-frequency content of both channels is mixed together, potentially causing destructive interference due to interchannel phase differences.

This made me wonder: would it make sense to offer an option, for users whose number of subwoofers is smaller than the number of output channels, to do separate mixing for the subwoofer channel(s)? What I have in mind is that the user sets the number of subwoofers they have, the crossover frequency for the subwoofer(s), and possibly the steepness of the crossover in the configuration file, and OpenAL Soft would then mix low frequencies separately, taking the number of subwoofer channels into account to avoid destructive interference.

I don't know how modern AVRs with two subwoofer outputs handle bass management with two attached subwoofers, e.g. whether they automatically map the signal from the left front channel to one subwoofer and the signal from the right front channel to the other. This could make complex configuration options necessary for surround systems with more than one subwoofer but fewer subwoofers than total channels. Although I think for PC the most common use cases would be 2.1, 5.1 and 7.1 systems.

[1] https://www.kenrockwell.com/audio/stereo-subwoofers.htm

kcat commented 3 years ago

Currently OpenAL Soft doesn't generate any subwoofer signals. Any subwoofer signal is provided directly by a 5.1, 6.1, or 7.1 source's LFE channel (or an AL_EFFECT_DEDICATED_LOW_FREQUENCY_EFFECT effect). A stereo source is mixed and played with the full frequency range, so any stereo bass in those two channels is passed through to the speaker system as-is, and according to what I've read on the subject, it should be left to the speaker system to generate extra subwoofer signals from the main speakers, however many there may be.

Regardless, the biggest issue with this would be that most sound systems have no concept of multiple LFE channels, so they can't be fed separately by programs. Even Dolby Atmos and Windows Sonic only recognize one LFE channel, so multiple subwoofers can only be properly driven by external hardware. At best, OpenAL Soft would be able to do a frequency analysis of the main channels, cut the low frequencies from them, and mix the low frequencies to the one LFE channel with aligned phase (to ensure no destructive interference). But even then, it seems to me that could be better served by a pulseaudio or wasapi plugin or something. It would also likely cause notable latency since the frequency analysis and phase-aligned mixing would depend on FFTs, which need a fair number of samples to work on.

Hiradur commented 3 years ago

I didn't intend for the LFE channel to be used for this. Perhaps I did not make my intention clear enough so I'll rephrase with an example.

Suppose you have a 2.1 system. Let's say the crossover frequency is 80 Hz and let's say the crossover filter has infinite steepness, i.e. there is a hard cut between the speakers and the subwoofer at exactly 80 Hz.

A user would enter this data into the configuration file and OpenAL Soft would only output two channels in the end. However, frequencies at and above 80 Hz would be processed like regular stereo 3D audio, and frequencies below 80 Hz would be processed differently. Since frequencies below 80 Hz are reproduced by only one subwoofer, OpenAL Soft would treat these frequencies as if they were output to a single (mono) channel.

To summarize:

  • OpenAL Soft's final output would be a 2.0 stereo signal
  • frequencies >= 80 Hz would be processed for stereo output
  • frequencies < 80 Hz would be processed as if the output was mono but mixed into the stereo signal

I think this approach would be more powerful than letting the playback gear handle the bass management. Or rather, the playback gear would still apply bass management to get the signal for the subwoofer but the mix it receives would be different. The purpose would be to avoid cancelling of frequencies < 80 Hz caused by differences in phase of the two stereo channels. OpenAL could take this into consideration for both 3D audio and EFX reverb (and possibly other) effects. I mention EFX reverb effects here because the aforementioned article claims that with 2 subwoofers the reproduction of real stereo recordings more accurately represents the concert hall the recording session took place in. I'm not sure if possible interchannel phase differences could be avoided by playback gear since the mix is already done.

Hopefully it's clearer what I meant this time. This is what I had in mind when I created the issue; I'm not sure if it's possible, practical or even useful.

Right now this idea is solely based upon the claims in the aforementioned article. I don't know if they are correct (i.e. scientifically proven) but I wouldn't be surprised if they were. Anyway, I hope you already have more knowledge about the topic than I do and can judge the proposal accordingly.

kcat commented 3 years ago

To summarize:

  • OpenAL Soft's final output would be a 2.0 stereo signal
  • frequencies >= 80 Hz would be processed for stereo output
  • frequencies < 80 Hz would be processed as if the output was mono but mixed into the stereo signal

I think this approach would be more powerful than letting the playback gear handle the bass management. Or rather, the playback gear would still apply bass management to get the signal for the subwoofer but the mix it receives would be different. The purpose would be to avoid cancelling of frequencies < 80 Hz caused by differences in phase of the two stereo channels.

The problem is ultimately with mono mixing. When two different sounds with mismatched phase on a given frequency band are added together, that frequency gets attenuated/cancelled. For instance, if one sound has samples

0.01 0.02 0.03 0.04 0.05 ...

and another sound has samples

-0.01 -0.02 -0.03 -0.04 -0.05 ...

When mixed to a mono signal, the result is a series of 0s: silence. Notably, this is physically accurate behavior; if you had two speakers in very close proximity to each other playing these two sounds, it would sound very quiet, whereas if one speaker was on your left and the other on your right, it would be more audible.
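
To make that concrete, here's a minimal numpy sketch (illustrative only, not anything from OpenAL Soft) of two phase-inverted sources cancelling in a plain mono downmix:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
a = np.sin(2 * np.pi * 60 * t)   # 60 Hz tone from one source
b = -a                           # the same tone, phase-inverted, from another

mono = a + b                     # plain mono downmix of the two sources
print(np.max(np.abs(mono)))      # prints 0.0: the 60 Hz content cancels
```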

If you want to keep the frequencies from attenuating when mixing to mono, it's going to be much more involved. You essentially need to ignore the phase in all but one sound, while still adding the magnitudes together. That would involve FFTs and associated latencies.

Hiradur commented 3 years ago

I'm not sure if we are talking about the same thing here. This may very well be a lack of understanding on my side, however. I only have a rough idea of how 3D audio works.

I'll try to illustrate what I mean once more with a more practical example.

Let's imagine we want 3D audio on a stereo 2.0 system. From my understanding, in order to create the illusion of sounds originating from outside the speakers, both amplitude panning and phase shifting tricks are used. I think the latter is important to create the illusion of sounds coming from either the sides or the rear. Now, let's expand the stereo system with a subwoofer to a 2.1 system with bass management. As in the earlier examples, the subwoofer plays all frequencies below 80 Hz and the speakers play all frequencies at and above 80 Hz. Furthermore, let's assume a sound is played in 3D from the rear right position, marked with an x in the following illustration. L and R are the positions of the front speakers, the subwoofer is not shown but could be placed anywhere, and @ is the position of the listener.

L........R
..........
....@.....
..........    
.........x

So here is what I imagine will happen: the part responsible for bass management will add the audio signals for the left and right channels together to get a mono signal. It will then apply a low-pass filter to get a signal suitable for the subwoofer. Since the sound is coming from the rear right, there must be some phase difference between the signals of the left and right channels, since we are talking about a 3D mix. When both channels get added together to form a mono signal, there will probably be some destructive interference caused by these interchannel phase differences, i.e. some frequencies of the original sound are missing or only partially present in the mono signal for the subwoofer. This is where I thought accounting for a subwoofer at the 3D audio mixing stage would help: doing the "3D" mix for the subwoofer frequencies in mono to prevent destructive interference in that range, rather than having the bass management derive a mono mix from a stereo mix later on.
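
As a rough sketch of the conventional bass-management chain described above (the function name and filter choices are illustrative; real AVRs differ in filter order and alignment):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bass_manage(left, right, fs=48000, fc=80.0):
    # Illustrative sketch: sum the mains to mono and low-pass at the
    # crossover for the subwoofer feed; high-pass the mains to match.
    lp = butter(4, fc, btype='lowpass', fs=fs, output='sos')
    sub = sosfilt(lp, 0.5 * (left + right))
    hp = butter(4, fc, btype='highpass', fs=fs, output='sos')
    return sosfilt(hp, left), sosfilt(hp, right), sub
```

If the two channels carry the same bass content out of phase, the 0.5 * (left + right) sum is already silent before the low-pass is even applied, which is exactly the cancellation in question.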

Ultimately, I thought this could be a workaround in software for a limitation caused by certain hardware setups. However, if the destructive interference in the frequency range of the subwoofer caused by deriving a mono mix from a stereo 3D mix is physically accurate behaviour then I'm ok with that.

kcat commented 3 years ago

From my understanding, in order to create the illusion of sounds originating from outside the speakers both amplitude panning and phase shifting tricks are used. I think the latter is important to create the illusion of sounds coming from either the sides or the rear.

Not with plain stereo 2.0 mixing. It just uses simple pan-pot/amplitude panning for left-right positioning, with a slight gain reduction for rear sounds (there aren't really enough speakers to do any distance-related encoding, short of using out-of-phase echoes to cancel comb-filter effects and generate a headphone-like response over the air, which can then apply HRTF; but that's not very practical).
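
For illustration, a generic constant-power pan law of the kind being described (this is not OpenAL Soft's actual code, and the -3 dB rear cut is a placeholder figure):

```python
import numpy as np

def pan_pot(mono, azimuth, rear=False):
    # azimuth in [-1, 1] maps full-left..full-right; cos/sin gains keep
    # the total power constant. Rear sources only get a small broadband
    # gain cut; no phase manipulation is involved anywhere.
    theta = (azimuth + 1.0) * np.pi / 4.0       # 0..pi/2
    gain = 10 ** (-3.0 / 20.0) if rear else 1.0
    return gain * mono * np.cos(theta), gain * mono * np.sin(theta)
```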

With 2-channel UHJ encoding, however, phase shifting is used for sounds coming from behind, so that would be subject to phase cancellation for rear sounds when played over a stereo+subwoofer system. Although UHJ is mostly intended for setups where a surround sound receiver is used, and the encoded signal is decoded back to surround sound without the phase shift.
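
For reference, a 2-channel UHJ encode from a horizontal B-format signal looks roughly like this, using the commonly published coefficients; the 90-degree shift is approximated here with a Hilbert transform, and the sign convention varies between references:

```python
import numpy as np
from scipy.signal import hilbert

def uhj_encode(W, X, Y):
    # j() is a wideband 90-degree phase shift; it's this term that makes
    # rear-panned content vulnerable to cancellation in a mono downmix.
    j = lambda s: np.imag(hilbert(s))
    S = 0.9397 * W + 0.1856 * X
    D = j(-0.3420 * W + 0.5099 * X) + 0.6555 * Y
    return 0.5 * (S + D), 0.5 * (S - D)
```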

With near-field compensation/emulation enabled over surround sound, amplitude differences in the lower frequencies are applied to the various speakers to improve the reconstructed wave front, however I don't believe this results in any significant loss (any reduction or inversion in some speakers would be matched by a boost in others, keeping the overall level consistent).

The biggest "threat" for low-frequency cancellation on a subwoofer is when 2+ sound sources are mixed together; a single panned source shouldn't be a problem.

Hiradur commented 3 years ago

Not with plain stereo 2.0 mixing. It just uses simple pan-pot/amplitude panning for left-right positioning, with a slight gain reduction for rear sounds

Interesting, this comes as a surprise to me, as I found localization to the sides and rear with stereo speakers to be quite effective in many cases, so I thought it would be more involved than simple amplitude panning. I assume the localization is enhanced by additional cues such as the movement of sounds and Doppler effects, because on music recordings I almost never localize any sounds to the sides or rear.

(there's not really enough speakers to do any distance-related encoding, short of using out of phase echoes to cancel comb-filter effects and generate a headphone-like response over the air, which can then apply HRTF; but that's not very practical).

Is this comparable to Ambiophonics [1]? I stumbled upon it a few years ago and always wondered if it could have benefits for 3D audio over stereo speakers. Admittedly, I never tried it myself.

Although, UHJ is mostly intended for where a surround sound receiver is used, and the encoded signal is decoded back to surround sound without the phase shift.

This is off-topic but would a surround receiver with a Dolby Pro Logic 2 decoder be sufficient or would one need a specialized Ambisonics UHJ decoder?

The biggest "threat" for low-frequency cancellation on a subwoofer is when 2+ sound sources are mixed together, a single panned source shouldn't be a problem.

Is separate processing of low frequencies, as I described before, unable to solve this problem, or is it unfeasible for other reasons?

[1] https://www.ambiophonics.org/

kcat commented 3 years ago

Interesting, this comes as a surprise to me, as I found localization to the sides and rear with stereo speakers to be quite effective in many cases, so I thought it would be more involved than simple amplitude panning. I assume the localization is enhanced by additional cues such as the movement of sounds and Doppler effects, because on music recordings I almost never localize any sounds to the sides or rear.

If you have UHJ or HRTF enabled over speakers, you might be hearing effects from that. Or, reverb and doppler effects can enhance the spatial perception of sound. Or, even though the rear sound attenuation is subtle, it would be enough for the brain to pick up on it. Maybe some combination thereof.

Is this comparable to Ambiophonics [1]? I stumbled upon it a few years ago and always wondered if it could have benefits for 3D audio over stereo speakers. Admittedly, I never tried it myself.

Other than the fact that it uses noise cancellation to reduce inter-ear comb effects, it doesn't seem to be. At least from what I can tell, ambiophonics uses an "ambiopole" (two speakers positioned closely together, designed to not have any crosstalk) to create a ~150-degree dry front stage from a stereo input, then a separate path runs the input signal through a high-quality real-time reverberator that feeds surround sound speakers placed about the room. Though maybe in a pinch, it could use virtual reflection points that feed into the ambiopole instead (sacrificing the side and rear reverberation, at the benefit of not needing extra speakers), but that's just a guess.

Either way, I don't think there would be much benefit for 3D audio like what OpenAL Soft deals with. Ambiophonics seems designed for a fixed environment and front sound stage, as it falls apart when you have sound sources coming from behind. I don't think a dynamic reverb environment with randomly placed and moving sound sources would work well over an ambiophonics system.

Though it might be an interesting experiment to play an ambiophonics mix through OpenAL Soft's ambisonics mixer, using EFX to generate the reverberation and the stereo_angles extension to create the wide front stage from the stereo input. Though it would probably need some special handling for the dry path mix to reduce L/R crosstalk.

This is off-topic but would a surround receiver with a Dolby Pro Logic 2 decoder be sufficient or would one need a specialized Ambisonics UHJ decoder?

IIRC, a Dolby surround sound decoder can work well enough to play a UHJ signal, though a specialized UHJ decoder would be more ideal. The reverse isn't true though; you wouldn't get satisfactory results with a UHJ decoder playing a Dolby signal. Or maybe it's the other way around, but I think that's right.

Is separate processing of low frequencies, as I described before, unable to solve this problem, or is it unfeasible for other reasons?

It would require a bit more work than acting as if low frequencies go through a mono mix. To avoid destructive phase interference, it would depend on accounting for phase in the input signal, which means separating the lower frequencies' magnitude from their phase, allowing the magnitudes to be added together without the phases interfering. That would require FFTs, which aren't the cheapest and introduce latency. It's doable, but not the most practical for real-time use given the complexity, cost, and latency.
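
A hypothetical sketch of that magnitude/phase separation (the function name is made up; a real-time version would need windowing and overlap-add, and the block size is exactly the latency penalty mentioned):

```python
import numpy as np

def magnitude_mono_mix(left, right):
    # Sum per-bin magnitudes so opposing phases can't cancel, then
    # reuse one channel's phase to get back to the time domain.
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    mag = np.abs(L) + np.abs(R)
    return np.fft.irfft(mag * np.exp(1j * np.angle(L)), n=len(left))
```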

Hiradur commented 3 years ago

If you have UHJ or HRTF enabled over speakers, you might be hearing effects from that.

I don't have any additional processing enabled for stereo 3D audio, so no HRTF or UHJ (they also show up as disabled in the log file).

Could inter-aural time differences be simulated with stereo 3D audio to enhance localization or is this not doable with stereo speakers?

IIRC, a Dolby surround sound decoder can work well enough to play a UHJ signal, though a specialized UHJ decoder would be more ideal.

Since Dolby Pro Logic decoders are probably far more widespread than UHJ decoders, would it be possible to encode the signal in Dolby Pro Logic instead? Or wouldn't this work well with the ambisonics rendering, or would it be held up by patents or lack of documentation? (I think the Darkplaces FOSS engine offers Dolby Pro Logic encoding.)

It's doable, but not the most practical for real-time use given the complexity, cost, and latency.

I see. Thanks for all the clarifications and explanations.

kcat commented 3 years ago

Could inter-aural time differences be simulated with stereo 3D audio to enhance localization or is this not doable with stereo speakers?

Using some aforementioned noise/echo cancellation (or an actual physical barrier in between the speakers in front of the listener, to block the left speaker sound from reaching the right ear and vice-versa) to remove the comb filter effects, you'd be able to control the timing and levels of a sound reaching each ear, to create the perception of a wider stereo image. But it'd only work with a properly-placed listener (the stereo field would collapse if your head moves too far).
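
As an illustration of what that control allows, assuming crosstalk has been fully cancelled so each speaker feeds exactly one ear (the 0.6 ms and 6 dB figures below are placeholder values, not anything OpenAL Soft uses):

```python
import numpy as np

def itd_ild(mono, fs=48000, itd=0.0006, ild_db=6.0):
    # With crosstalk removed, an interaural time difference (a short
    # delay on the far ear) and a level difference can be imposed
    # directly, much like over headphones.
    delay = int(round(itd * fs))
    far = np.zeros_like(mono)
    far[delay:] = mono[:len(mono) - delay]
    far = far * 10 ** (-ild_db / 20.0)
    return mono, far
```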

Since Dolby Pro Logic decoders are probably far more widespread than UHJ decoders, would it be possible to encode the signal in Dolby Pro Logic instead? Or wouldn't this work well with the ambisonics rendering, or would it be held up by patents or lack of documentation? (I think the Darkplaces FOSS engine offers Dolby Pro Logic encoding.)

Lack of documentation is the main thing... given how long UHJ and Dolby Pro Logic have been around, I doubt any potential patents would still be valid. But I'm also not sure it would offer much benefit for an ambisonics mix; at least from what I know, Pro Logic encodes distinct 4.0/5.0/5.1 speaker feeds to 2 channels (which can be decoded back to 4/5 channels), whereas UHJ encodes a continuous 2D ambisonic soundfield (which can be decoded back to 2D ambisonics, then decoded to whatever speaker setup the user has). Depending on how exactly Pro Logic en/decodes its signals, first decoding ambisonics->5.1 and then en/decoding that through Pro Logic may have comparable results to encoding ambisonics->UHJ and then decoding the UHJ with Pro Logic. It would need some testing.

Hiradur commented 3 years ago

I tried Ambisonics UHJ encoding in OpenAL Soft combined with Dolby Pro Logic 2 decoding (I selected Movie mode since the receiver doesn't offer Game mode) with a 4.0 surround system in 5.1 ITU layout. My impression from perhaps half an hour of gameplay: it worked well most of the time. The surround effect seemed to suffer a bit when lots of ambient noises were playing, but otherwise it was very good. Sound effects even moved smoothly between speakers.

Under these circumstances it might not be worth it to create a dedicated DPL 2 encoder for OpenAL Soft. FWIW, I found a German page showing block diagrams and circuit diagrams of a homemade Dolby Pro Logic 1 encoder [1], although the author notes that he is unsure if it is entirely correct and that he only simulated it in software. Unfortunately, it seems that a Dolby B encoder is part of the Dolby Pro Logic encoding process; a homemade circuit for a Dolby B encoder is shown further down the page. Again, the author notes it is based on incomplete information and contains some simplifications. I also found a homemade DPL 2 decoder [2].

[1] https://www2.ak.tu-berlin.de/~fhein/Alias/wwwlogic/Technik1/Encoder.html
[2] http://www.niell.org/analog/Analog.html