kcat / openal-soft

OpenAL Soft is a software implementation of the OpenAL 3D audio API.

Muffled sound with Cubic resampler ever since using Gaussian filter. #985

Closed geneotech closed 7 months ago

geneotech commented 7 months ago

Hello! My game currently ships with 44.1 kHz sound effects and I'm testing on 48.0 kHz headphones - the quality was perfect until I recently upgraded to the latest OpenAL Soft version.

I narrowed it down to the commit that replaces Cubic Spline with Gaussian Filter for the Cubic resampler. For reference:

Last good commit ba12551b69eeea411ea46593fc1f0d2c87b9d65c :

https://github.com/kcat/openal-soft/assets/3588717/82241914-59ab-44b9-b6f3-85013afe5259

Breaking commit e6c2df10e91d6bdb888e32520a4f32587e8c7c13 :

https://github.com/kcat/openal-soft/assets/3588717/ad14ea07-c5fa-48b8-b28a-974f8deb2a99

It's subtle, but the second clip sounds a little muffled (as if heard through a wall). As it stands, even Linear/Point are way closer to the original than Cubic. These clips were recorded under identical conditions (same sound, device, volume settings, etc.). The only thing that differed is the commit, and it's reproducible every time. Also, for some reason this mostly happens on Windows (testing on Windows 11). I had this issue for a while on my Arch Linux, but it disappeared after upgrading the system today (it did update pipewire and libpipewire, but I'm still surprised since PulseAudio is the chosen backend; the resampler is still being run, as I checked by inserting junk values into GaussFilterArray::GetCoeff). Maybe the Gaussian filter is applied over all samples by accident? I'm also using clang 16.0.4 to build on Windows and 17.0.6 on Linux.

Original muzzle shot sound for reference: https://github.com/TeamHypersomnia/Hypersomnia/raw/master/hypersomnia/content/sfx/szturm_muzzle_1.ogg

kcat commented 7 months ago

Not sure why there would be a difference between Windows and Linux, unless a difference in system configuration is causing extra processing somewhere. It's probably worth noting that the WASAPI backend can fairly reliably detect headphones, which will cause OpenAL Soft to enable HRTF by default. The PulseAudio backend should as well, but I haven't seen the PipeWire backend be able to. If there's a difference in HRTF being enabled, that can make it more difficult to notice subtle high-frequency attenuation (though HRTF itself would generally be a detectable difference too).

As for the muffling issue, that's likely the result of how the filter works. When resampling audio, you essentially need a low-pass filter to cut frequencies above the Nyquist frequency (the maximum frequency that can be represented at a given sample rate, which is half the sample rate). This applies to both upsampling (low -> high sample rate) and downsampling (high -> low sample rate). Whichever Nyquist frequency is lower between the input and output, you need to remove all sound above that frequency from the input when resampling. Otherwise you get harmonic distortion in the output: extra high-frequency noise beyond the Nyquist frequency that didn't exist in the original sound, plus extra harmonics throughout the audible range from that noise folding back over the available frequencies as aliasing artifacts.
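To put numbers on the folding described above, here is a minimal sketch. It assumes the simplest model: a tone at `f` in a 44.1 kHz source leaves a spectral image at `44100 - f` if the resampling filter doesn't remove it, and anything above the output Nyquist frequency mirrors back below it. The function names are made up for illustration.

```python
def first_image(freq_hz, src_rate_hz):
    """Lowest spectral image of a tone sampled at src_rate_hz."""
    return src_rate_hz - freq_hz


def fold_frequency(freq_hz, out_rate_hz):
    """Frequency at which freq_hz is heard once sampled at out_rate_hz."""
    f = freq_hz % out_rate_hz
    # Anything above Nyquist (half the rate) mirrors back below it.
    return out_rate_hz - f if f > out_rate_hz / 2 else f


# A 20 kHz tone in a 44.1 kHz file, played on a 48 kHz device:
image = first_image(20000, 44100)       # 24100 Hz, above the 24 kHz output Nyquist
heard = fold_frequency(image, 48000)    # folds back to 23900 Hz
print(image, heard)
```

A 21 kHz tone would leave its image at 23.1 kHz, which sits between the source Nyquist (22.05 kHz) and the output Nyquist (24 kHz) without folding at all.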

The cubic filter is pretty limited since it only has 4 sample points per output sample, and the spline and gaussian filters have different filtering characteristics as they're designed for different things. For its part, the spline filter isn't really intended for audio. It retains higher frequencies in the signal, letting more of that distortion through. In contrast, the gaussian filter creates a relatively subtle low-pass filter, which is more useful with audio as it reduces the high-frequency noise and related distortion, but at the cost of slightly reducing the original higher frequencies of the sound.
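The difference between the two 4-point kernels can be sketched numerically. This is only an illustration under loud assumptions: Catmull-Rom stands in for "a cubic spline", and the Gaussian width `sigma=0.5` is an arbitrary pick. Neither is taken from OpenAL Soft's source, so the numbers show the trend, not the library's actual response.

```python
import cmath
import math

TAP_OFFSETS = (-1, 0, 1, 2)  # the four input samples around the output position


def catmull_rom_weights(mu):
    """Catmull-Rom spline weights for fractional position mu in [0, 1)."""
    mu2, mu3 = mu * mu, mu * mu * mu
    return [
        -0.5 * mu3 + mu2 - 0.5 * mu,
        1.5 * mu3 - 2.5 * mu2 + 1.0,
        -1.5 * mu3 + 2.0 * mu2 + 0.5 * mu,
        0.5 * mu3 - 0.5 * mu2,
    ]


def gaussian_weights(mu, sigma=0.5):
    """Normalized Gaussian weights; sigma is an assumed, illustrative width."""
    raw = [math.exp(-0.5 * ((k - mu) / sigma) ** 2) for k in TAP_OFFSETS]
    total = sum(raw)
    return [w / total for w in raw]


def gain(weights, freq_hz, rate_hz):
    """Magnitude response of the 4-tap kernel at freq_hz."""
    omega = 2.0 * math.pi * freq_hz / rate_hz
    return abs(sum(w * cmath.exp(-1j * omega * k)
                   for w, k in zip(weights, TAP_OFFSETS)))


for name, fn in (("spline", catmull_rom_weights), ("gaussian", gaussian_weights)):
    print(name, round(gain(fn(0.5), 15000, 44100), 3))
```

With these assumed parameters the Gaussian kernel passes less of a 15 kHz tone than the spline does, which is the "muffling" side of the trade-off. The spline's higher gain applies equally to the unwanted images.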

The other resamplers, (fast_)bsinc12 and 24, try to tighten up the filter response by using a larger filter. With 12- and 24-point filters, they more aggressively remove the high frequency noise and distortion, and avoid reducing as much of the original high frequencies. This of course comes at the cost of more CPU use, since it's doing more processing per sample.
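A rough way to see the CPU cost is to count multiply-accumulates per output sample, one per filter point. The sketch below assumes mono sources and ignores coefficient lookups, SIMD, and whatever the `fast_` variants do differently, so treat it as an order-of-magnitude estimate. The 100-source workload is a made-up example.

```python
# Filter points per output sample for each resampler.
TAPS = {"point": 1, "linear": 2, "cubic": 4, "bsinc12": 12, "bsinc24": 24}


def macs_per_second(resampler, output_rate_hz, num_sources):
    """Approximate multiply-accumulates per second spent on resampling."""
    return TAPS[resampler] * output_rate_hz * num_sources


for name in TAPS:
    print(name, macs_per_second(name, 48000, 100))
```

By this count bsinc24 does six times the per-sample work of a 4-point filter.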

To visualize, this is a normal whitenoise response at 48khz:

[image: Screenshot_20240405_013001]

This is the gaussian cubic filter playing 44.1khz whitenoise on 48khz output:

[image: Screenshot_20240405_012754]

Everything above 22khz is noise that shouldn't exist since a 44.1khz sound can't represent it. The filter manages to drop almost -20dB before reaching its nyquist frequency, beyond which is pure aliasing noise. You can't see it because of the whitenoise masking it, but there are also extra harmonics below 22khz, as a result of the aliasing noise reaching up to 24khz and then folding back over the available frequencies. The -20dB drop by the nyquist frequency helps soften the harmonic distortion, making it less noticeable. It's obviously not perfect, you can still hear it with normal sounds if you try, but it's cleaner than the amount of noise caused by the spline filter:

[image: Screenshot_20240405_024411]

which manages barely a -5dB drop before the nyquist frequency. The aliasing noise will fold back as much more audible harmonic distortion.

For completeness, this is the bsinc24 filter:

[image: Screenshot_20240405_012900]

You can see more of the sound stays unattenuated compared to the gaussian (dropoff starting at 16-17khz vs 8-9khz), with a deeper drop of -60dB where the aliasing noise starts, making the distortion significantly less audible as well.
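For a sense of scale, the dB figures quoted above can be converted to linear amplitude ratios with the standard relation ratio = 10^(dB/20). The helper name is just for illustration.

```python
def db_to_amplitude(db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


# -5 dB is roughly 0.56x amplitude, -20 dB is 0.1x, -60 dB is 0.001x.
for level in (-5, -20, -60):
    print(level, db_to_amplitude(level))
```

So the aliasing noise right at the cutoff is left at about half amplitude by the spline, a tenth by the gaussian, and a thousandth by bsinc24.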

All that said, I could theoretically add back the spline filter as an option alongside the gaussian. Although from a performance and quality perspective, it may be better to use point or linear if you're looking for something cheap that doesn't attenuate the higher frequencies as much and don't mind the harmonic distortion, or (fast_)bsinc24 if you're willing to spend the extra CPU power to keep more of the higher frequencies and also reduce the harmonic distortion.

geneotech commented 7 months ago

Hey, thank you for the prompt and detailed response! And thank you for creating such an amazing library, I've been happily shipping with it for nearly a decade now!

I confirmed HRTF is not a factor here as:

To debug, I tried to output to wave backend on both OSes. I'm now getting consistent results - the exact same discrepancy can be heard on Linux. Here are the completely unedited wave outputs for reference:

Commit e6c2df10e91d6bdb888e32520a4f32587e8c7c13 (bad): https://sndup.net/q3f2/

Commit ba12551b69eeea411ea46593fc1f0d2c87b9d65c (good): https://sndup.net/qf3d/

(The first sound here is a backpack wear sound, on which this difference is also apparent)

This is some amazing explanation of how it all works! I'm not well versed in sound physics, however from your description I now understand the Gaussian filter is supposed to be cutting the high frequencies. Still, it seems that the cubic spline at the very least is way closer to the original. Is Gaussian supposed to be muffling so much? Here's a test of these same two sound effects put next to each other in Audacity and exported to 48.0 kHz: http://sndup.net/v2wz/ For me it sounds exactly like the good commit (spline), and exactly like the sound effects if played back directly from the audio files. I would expect the library default to keep the noticeable frequencies intact, however I understand if you prioritize less harmonic distortion over avoiding muffling. Since cubic spline and gaussian are so different, it definitely makes sense to have them as separate resamplers, esp. if the spline has been battle-tested now. E.g. have Spline as a new enum below BSinc24. I would totally continue to use Spline :)

Also - I just found the reason this didn't happen for me on Linux (i.e. I had crisp playback on both the good and bad commit) was pulse. Pulse makes it right for some reason. Outputting to alsa again reproduces the bad behavior. I was wondering if it's because pulse has a fix-rate option, but with pulse the playback is for some reason crisp with both true and false values, on both good and bad commits.

kcat commented 7 months ago

> Still, it seems that the cubic spline at the very least is way closer to the original.

The spline filter keeps more of the high frequencies unattenuated, but that comes at the cost of more harmonic distortion. When it comes to a lot of DSP operations, you're very often faced with trade-offs like these. Less attenuation of the original sound will lead to more distortion with the sound, while suppressing the distortion will affect more of the original sound. Improving the quality overall will require more CPU use, while reducing CPU use will make the distortion trade-off more apparent.

> Is Gaussian supposed to be muffling so much?

It's about what I'd expect, given the size of the filter being only 4 samples. The amount of high frequency attenuation is rather subtle, relatively speaking, having a non-negligible effect on the distortion and not being too aggressive on the original sound. Possibly there's room to improve the coefficients for a higher cutoff frequency, but I wouldn't expect too much wiggle room without sacrificing quality.

The available resamplers are intended to give users the option to balance performance vs quality given how much CPU power is available and how important audio quality is. The current gaussian filter I feel fits well between linear and bsinc12 in both metrics. It does better at suppressing distortion than linear but not as much as bsinc12, while it is more efficient than bsinc12 but not as fast as linear. In comparison, the spline filter seems closer to linear in terms of quality, not having as much effect on the high frequencies or distortion, but is the same performance as gaussian despite that.

That's why I suggested the linear resampler if you wanted to keep more of the higher frequencies without being too concerned about harmonic distortion, as it seems to give a similar filter response while being more efficient to process than spline. So it really comes down to how different linear and spline actually are, and if the latter is needed given the former. If there's enough quality difference between linear and spline, it could be worth having spline and gaussian as separate options, but I'd need more convincing since my measurements only show a very minor difference in the highest frequencies.

geneotech commented 7 months ago

So - I followed your advice about checking out Linear. Indeed - it's a lot better than Gaussian for my aesthetic.

Edited out the rest of the comment - sorry, I just realized the test I performed in this post was using alsa only on the Linear and pulse on other resamplers, but pulse is likely doing some postprocessing - it turns out Linear is almost identical to Cubic in my experiments when both are output to alsa.

Testing on Windows, I find that the original Cubic is still marginally, although really marginally, better at preserving gunshot snappiness/brightness - for what it's worth, it's also nearly identical to how it sounds on the Web (where it uses Web Audio API instead of OpenAL) - however if there won't be a separate Spline option, Linear should work fine for me from now on.

Thanks for your patience!

kcat commented 7 months ago

It's possible pulse could be setting 44.1khz playback for some reason, causing OpenAL Soft to skip its resampler when playing a 44.1khz sound. PulseAudio (and PipeWire, WASAPI, etc) can then use its own higher quality resamplers to do 44.1->48khz since they don't have to worry about resampling possibly hundreds of sounds at the same time (or necessarily be concerned with on-the-fly rate changes). Audacity can use even higher quality resamplers since it doesn't need to work in real-time. But like I said, there's potentially room for at least some improvement with OpenAL Soft's resamplers.
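The "skip the resampler when the rates match" behaviour can be sketched with a toy linear resampler. This is not OpenAL Soft's code: `resample_linear` is a hypothetical helper that only shows how the read position advances by 44100/48000 = 147/160 input samples per output sample, and how matching rates bypass interpolation entirely.

```python
from fractions import Fraction


def resample_linear(samples, src_rate, dst_rate):
    """Toy linear-interpolation resampler (illustrative, not the library's)."""
    if src_rate == dst_rate:
        return list(samples)  # rates match: nothing to interpolate
    step = Fraction(src_rate, dst_rate)  # input samples per output sample
    out = []
    pos = Fraction(0)
    while pos <= len(samples) - 1:
        i = int(pos)              # index of the sample at or before pos
        frac = float(pos - i)     # fractional distance to the next sample
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
        pos += step
    return out


print(Fraction(44100, 48000))                    # 147/160
ramp = [float(n) for n in range(148)]
print(len(resample_linear(ramp, 44100, 48000)))  # 161 output samples
```

When the device itself runs at 44.1 kHz, the first branch is taken and the sound reaches the backend untouched, which would be consistent with the crisp playback seen through pulse.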

In either case, I added back the cubic spline resampler with commit fdd16434c663b68fbddd6fe2c97a9e0c66b1f15e. The previous (recent) cubic resampler was renamed to gaussian (cubic will be treated as gaussian and is still the default), and spline is a new option alongside it.

geneotech commented 7 months ago

Nice! 🎉🎉 God's work sir, just tested it, spline sounds again just like I loved it before. Thank you so much for your consideration!