[Feature request] Stereo upmix virtualization

kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.

[Feature request] Stereo upmix virtualization #935

Open ThreeDeeJay opened 10 months ago

ThreeDeeJay commented 10 months ago

Even though it's no replacement for proper surround virtualization, stereo-to-surround upmix virtualization has really grown on me, especially after finding a matching HRTF and applying some tweaks to it. So it would be nice if OpenAL Soft upmixed stereo sources before virtualizing them, like HeSuVi does. From what I understand, the method is very simple and doesn't introduce artifacts or latency: it uses channel inversion to isolate the center (usually vocals/dialog) for FC, then removes it from the original stereo signal (so it works best with hard-panned and lossless tracks) to generate left and right channels. Those get panned to the side/rear channels, where there's usually less positional activity, so the front isn't so crowded.

Goofy mockup aside, here's the EqualizerAPO config (matrix.txt) that does the upmixing (should also work with real surround speakers):

Copy: L=0.5*L R=0.5*R C=0.2*L+0.2*R SUB=0.0 RL=0.3*L+-0.2*R RR=-0.2*L+0.3*R SL=0.45*L+-0.25*R SR=-0.25*L+0.45*R
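To make the matrix concrete, here is a minimal Python sketch that applies the same coefficients to a single stereo frame. The coefficients are copied from the Copy: line above; the names UPMIX and upmix_frame are made up for this illustration and are not part of EqualizerAPO or HeSuVi.

```python
# Coefficients copied from the Copy: line above; each output is a*L + b*R.
# UPMIX and upmix_frame are hypothetical names used only for this sketch.
UPMIX = {
    "L":   (0.50, 0.00),
    "R":   (0.00, 0.50),
    "C":   (0.20, 0.20),
    "SUB": (0.00, 0.00),
    "RL":  (0.30, -0.20),
    "RR":  (-0.20, 0.30),
    "SL":  (0.45, -0.25),
    "SR":  (-0.25, 0.45),
}


def upmix_frame(left, right):
    """Return one 7.1 frame (channel name -> sample) from one stereo frame."""
    return {name: a * left + b * right for name, (a, b) in UPMIX.items()}


if __name__ == "__main__":
    # Compare a hard-panned sound with one that is identical in both channels.
    for label, (l, r) in {"hard left": (1.0, 0.0), "centered": (1.0, 1.0)}.items():
        frame = upmix_frame(l, r)
        print(label, {k: round(v, 2) for k, v in frame.items()})
```

Feeding in a centered sound (identical in both channels) yields much smaller side and rear levels than a hard-panned one, because the negative cross terms partly cancel it. That is the channel-inversion trick described above.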

This obviously isn't perfect, though. There are some downsides like more diffuse positioning compared to virtual stereo and possibly artifacts if the HRIR isn't synced up, though I think makemhr might take care of it. In fact, I wish it could export raw IRs for surround channels to use in HeSuVi. But anyhow, I think the best way to implement this would be as a flag in alsoft.ini to set the stereo source behavior:

Unaltered (non-binaural stereo passthrough, which I think is how ALchemy behaves and for those who don't like HRTF applied to music)
Stereo virtualization (current default)
Stereo to upmix virtualization

kcat commented 10 months ago

OpenAL Soft has something like this already, though it's not an ini setting to apply to all stereo sounds. It's some source flags the app needs to set, as it alters the behavior of the sound.

AL_DIRECT_CHANNELS_SOFT from AL_SOFT_direct_channels/AL_SOFT_direct_channels_remix plays non-mono sound unaltered, except for some simple channel mixing in the latter case where an output channel is missing. This blocks AL_EXT_STEREO_ANGLES and AL_SOFT_source_spatialize from working on the given source, since they require being able to pan around the individual buffer channels.

AL_STEREO_MODE_SOFT set to AL_SUPER_STEREO_SOFT, from AL_SOFT_UHJ, provides a kind of upmix where it effectively wraps the stereo soundfield to the front of a B-Format source, along with AL_SUPER_STEREO_WIDTH_SOFT specifying how wide the stereo soundfield should be (from 0 being a point in front, to 0.7 putting the sides a bit behind). This also blocks AL_EXT_STEREO_ANGLES from working on the given source, but can use AL_ORIENTATION to rotate the soundfield like other B-Format sources.

Making a config option to have stereo sources behave like that by default would be tricky. For one, it would rely on the user being aware that it can cause misbehavior in apps that use AL_EXT_STEREO_ANGLES or AL_SOFT_source_spatialize, so shouldn't be something set globally for all apps. And secondly, with Super Stereo, the source behavior depends on other source and listener properties that the app otherwise wouldn't have to worry about (e.g. if the app leaves AL_SOURCE_RELATIVE off for a stereo sound, it normally wouldn't be affected, but with Super Stereo upmixing it will rotate based on the listener orientation). OpenAL Soft would have to be aware of whether the source is intended to be upmixed by the app, or forced by the config file, to make it behave differently.

ThreeDeeJay commented 9 months ago

Dang it, sorry, I must've missed the notification. Anyhow, these settings (particularly "plays non-mono sound unaltered") do seem like they'd be useful if they could be forced, but yeah, it's hard to predict side effects if forcing them. They'd definitely need to be app-specific, and tucked away in the documentation for those who read the disclaimers before enabling it, even if it can cause obvious unintended issues, like inverting axes via the INI.

For upmixing, the super stereo/UHJ approach seems quite interesting. Would you happen to have an audio recording? Though I'm a bit concerned simply wrapping a stereo signal into ambisonics wouldn't yield satisfactory results, because the upmix I have in mind requires splitting a (mainly for music) stereo signal into 3 (or more) channels and then virtualizing that (I'd imagine some sort of head-locked virtual surround would be the way to go to prevent degrading positional accuracy if using ambisonics, but it'd also be interesting if the upmixed channels could be turned into objects that aren't fixed to the listener's rotation). 🤔

kcat commented 9 months ago

For upmixing, the super stereo/UHJ approach seems quite interesting. Would you happen to have an audio recording?

Of anything specific? Do you want it as a 7.1, 5.1, or binaural mix?

Though I'm a bit concerned simply wrapping a stereo signal into ambisonics wouldn't yield satisfactory results, because the upmix I have in mind requires splitting a (mainly for music) stereo signal into 3 (or more) channels and then virtualizing that (I'd imagine some sort of head-locked virtual surround would be the way to go to prevent degrading positional accuracy if using ambisonics, but it'd also be interesting if the upmixed channels could be turned into objects that aren't fixed to the listener's rotation). 🤔

Super Stereo essentially changes a stereo sound into B-Format, which means it's not tied to a particular number of discrete channels and can be mixed for anything that has an appropriate decoder. The width can be narrowed to a single point or spread to about 252 degrees around the listener, along with being rotated any which way.

ThreeDeeJay commented 9 months ago

Of anything specific? Do you want it as a 7.1, 5.1, or binaural mix?

Binaural would be appreciated, with this SADIE MHR if it's not too inconvenient.

Super Stereo essentially changes a stereo sound into B-Format, which means it's not tied to a particular number of discrete channels and can be mixed for anything that has an appropriate decoder. The width can be narrowed to a single point or spread to about 252 degrees around the listener, along with being rotated any which way.

I see. I'm just not sure it being non-discrete would necessarily be an improvement, since I'd prefer pinpoint accuracy over soundfield envelopment. But then again, I'm just speculating so I'd love to hear how it sounds in practice. 🤔

Stereo.zip would be great for this test since I've already uploaded clips of other upmixers for comparison here.

kcat commented 9 months ago

SuperStereo.zip mixed using the aforementioned SADIE MHR and ambi2 ambisonic HRTF mode, with the Wave File Writer output. The default width of 0.593 was used, so it wraps around by about 213 degrees.
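As a quick sanity check on the numbers, both width figures quoted in this thread (0.593 giving about 213 degrees, and 0.7 giving about 252 degrees) are consistent with a simple width × 360 relationship. That linear mapping is an inference from those two data points, not something the thread or the AL_SOFT_UHJ extension text is quoted as stating, and approx_wrap_degrees is a hypothetical helper name.

```python
def approx_wrap_degrees(width):
    """Estimate the wrap angle in degrees for a Super Stereo width.

    Assumes a linear width * 360 mapping, inferred only from the two
    figures quoted in the thread (0.593 -> ~213, 0.7 -> ~252).
    """
    if width < 0.0:
        raise ValueError("width must be non-negative")
    return width * 360.0


if __name__ == "__main__":
    for width in (0.0, 0.593, 0.7):
        print(width, round(approx_wrap_degrees(width)))
```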

ThreeDeeJay commented 9 months ago

That's pretty good. Though at least to me, the front image sounds more diffuse compared to stereo/upmix virtualization (kinda expected since it's using ambisonics), and the left and right speakers sound well under 180 degrees apart. I think that's because the HRTF isn't using the enhancements I applied to improve the HeSuVi upmix.

If I understand correctly, upmixing makes some sounds play from different upmixed channels simultaneously which tends to make positioning diffuse, but it can be minimized by making HRIRs as clean and accurate as possible (synced, front stabilized, mirrored, etc.), otherwise it can sound bassy, muffled and positionally diffuse like this, which uses the exact same HRIR and upmix method as the recording linked above. These enhancements make little to no difference for virtualized stereo, virtual discrete surround and spatial sound so I think that's why it's rarely noticeable.

I know makemhr already has an option to mirror the HRTF, which I think helps, but perhaps it'd be better to mirror (and perhaps front-stabilize) only the upmixed audio on the fly. I think minimum-phase reconstruction also helps, since I think it does something similar to trimming, which means the only enhancement not implemented is HRIR syncing. I've yet to find a way to do this programmatically and accurately, so I just sync the peaks of all (center, front, side and rear) virtual speaker pairs manually in Audacity like this. So if you ever figure out how to implement this in makemhr, an option to export the processed HRIRs back to WAV for HeSuVi would be a godsend so I can process the entire HRTF database, which, alongside diffuse-field equalization, would make binaural samples a lot more consistent so we can just focus on the positional differences.
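For illustration, one naive way to automate the peak syncing could look like the Python sketch below. It rests on assumptions rather than on how HeSuVi or makemhr work: "synced" is taken to mean that each virtual speaker's earliest ear peak lands on the same sample index, the peak is simply the largest-magnitude sample, and both ears of a speaker are shifted together so the interaural delay is kept. The names peak_index and sync_speakers are hypothetical.

```python
def peak_index(ir):
    """Index of the largest-magnitude sample in one impulse response."""
    return max(range(len(ir)), key=lambda i: abs(ir[i]))


def sync_speakers(hrirs):
    """Trim leading samples so every speaker's earliest ear peak aligns.

    hrirs maps a speaker name to a (left_ear, right_ear) pair of sample
    lists. Both ears of a speaker are trimmed by the same amount, so the
    delay between the two ears of each pair is preserved.
    """
    onsets = {name: min(peak_index(l), peak_index(r))
              for name, (l, r) in hrirs.items()}
    target = min(onsets.values())
    synced = {}
    for name, (l, r) in hrirs.items():
        shift = onsets[name] - target
        # Pad the tail with zeros so each response keeps its original length.
        synced[name] = (l[shift:] + [0.0] * shift, r[shift:] + [0.0] * shift)
    return synced


if __name__ == "__main__":
    demo = {
        "FL": ([0.0, 0.0, 1.0, 0.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.6, 0.1]),
        "SL": ([0.0, 0.0, 0.0, 0.0, 1.0, 0.3], [0.0, 0.0, 0.0, 0.0, 0.0, 0.5]),
    }
    for name, (l, r) in sync_speakers(demo).items():
        print(name, peak_index(l), peak_index(r))
```

Picking the single largest sample is crude and only sample-accurate; real HRIRs would likely need onset detection or sub-sample alignment, which is presumably part of why doing this accurately is hard.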

On a side note, I managed to "bake" stereo upmix into the HeSuVi HRIR converted into Viper4Android convolver format so I can apply it to all audio (except binaural, of course), and it works particularly well with lossless and hard-panned mixes like chiptunes, where channel separation works best: https://youtu.be/NlX3na3O4IE

ThreeDeeJay commented 8 months ago

On a side note, I finally found a site that's kinda like imgsli, but for audio (plus cycling with the numpad), to make changes in audio more evident: https://share.unmix.app/JWwcfH5HDhZcEOElkeQs/embed The main server might not be up forever though, unless we self-host or storage can be offloaded or something.