kfrlib / kfr

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
https://www.kfrlib.com
GNU General Public License v2.0
1.62k stars 248 forks source link

kfr_dft_real_execute_inverse_f32 creates invalid results on macOS on Intel processors with -Ofast #183

Closed FigBug closed 6 months ago

FigBug commented 1 year ago

kfr_dft_real_execute_inverse_f32 creates invalid results on macOS intel processors if -Ofast is used.

Unfortunately due to a number of broken commits that do not compile, I can not determine exactly when this stopped working. According to git bisect, the first bad commit could between Fri Nov 4 17:47:07 2022 +0000 and Fri Nov 11 19:43:43 2022 +0000.

It only seems to affect certain block sizes.

I will provide a minimal program to reproduce: https://github.com/reFX/kfr_issue

If you compile with Xcode 14.3.1 and run on intel processor, with the -Ofast flag, the output is incorrect. With other optimization levels, output is correct. Output is always correct on ARM processors.

My test program does a series inverse FFts, correct output looks like this:

image

With -Ofast it looks like this:

image
dancazarin commented 1 year ago

There is a bug in Clang on macOS caused by -ffast-math (implied by -Ofast) flag. To workaround this, compile KFR DFT files with -ffp-contract=fast. Other files that use KFR may be compiled with any flags.

dancazarin commented 1 year ago

Did you get a chance to test this workaround for Clang bug?

MarcKamradt commented 1 year ago

We gave your suggestion a try but that did not work out. Just adding -ffp-contract=fast did not change anything. Disabling optimizations and then adding -ffp-contract=fast worked (as did non-optimized builds before) but performance was obviously really bad then. Is there anything else we need to do besides adding the compiler flag we might have missed, let me know. Otherwise we will probably stay on an older version for the time being.

dancazarin commented 1 year ago

-Ofast is basically -O3 coupled with -ffast-math Replace -Ofast with -O3 -ffp-contract=fast for those files that include KFR DFT. The performance will be the same as for the fully optimised builds but the bug will go away.

As I mentioned before, -Ofast flag causes an internal bug in Clang optimiser.

MarcKamradt commented 1 year ago

Unfortunately it neither works with -O3 nor -O2 nor -O1. Only -O0 seems to work without problems.

dancazarin commented 12 months ago

Could you provide the output of xcodebuild command? This will print compiler options used for each file. And please update the kfr_issue repo if you have made changes to compiler options. Xcode project in kfr_issue has explicit FAST_MATH setting turned on as well as -Ofast flag. Any of these may cause clang bug.

dancazarin commented 9 months ago

Does the error mentioned still occur for you?

reFX-Mike commented 7 months ago

No idea when it changed, but with the latest version of Xcode and KFRlib this works correctly now on Arm and Intel. So this issue can be closed.