Closed FigBug closed 6 months ago
There is a bug in Clang on macOS caused by -ffast-math
(implied by -Ofast
) flag.
To workaround this, compile KFR DFT files with -ffp-contract=fast
.
Other files that use KFR may be compiled with any flags.
Did you get a chance to test this workaround for Clang bug?
We gave your suggestion a try but that did not work out. Just adding -ffp-contract=fast
did not change anything. Disabling optimizations and then adding -ffp-contract=fast
worked (as did non-optimized builds before) but performance was obviously really bad then. Is there anything else we need to do besides adding the compiler flag we might have missed, let me know. Otherwise we will probably stay on an older version for the time being.
-Ofast
is basically -O3
coupled with -ffast-math
Replace -Ofast
with -O3 -ffp-contract=fast
for those files that include KFR DFT.
The performance will be the same as for the fully optimised builds but the bug will go away.
As I mentioned before, -Ofast
flag causes an internal bug in Clang optimiser.
Unfortunately it neither works with -O3 nor -O2 nor -O1. Only -O0 seems to work without problems.
Could you provide the output of xcodebuild
command? This will print compiler options used for each file.
And please update the kfr_issue
repo if you have made changes to compiler options.
Xcode project in kfr_issue
has explicit FAST_MATH
setting turned on as well as -Ofast
flag. Any of these may cause clang bug.
Does the error mentioned still occur for you?
No idea when it changed, but with the latest version of Xcode and KFRlib this works correctly now on Arm and Intel. So this issue can be closed.
kfr_dft_real_execute_inverse_f32
creates invalid results on macOS intel processors if -Ofast is used.Unfortunately due to a number of broken commits that do not compile, I can not determine exactly when this stopped working. According to git bisect, the first bad commit could between Fri Nov 4 17:47:07 2022 +0000 and Fri Nov 11 19:43:43 2022 +0000.
It only seems to affect certain block sizes.
I will provide a minimal program to reproduce: https://github.com/reFX/kfr_issue
If you compile with Xcode 14.3.1 and run on intel processor, with the -Ofast flag, the output is incorrect. With other optimization levels, output is correct. Output is always correct on ARM processors.
My test program does a series inverse FFts, correct output looks like this:
With -Ofast it looks like this: