Closed by FullyArticulate 11 months ago
You're running a non-optimized debug build. Optimized builds have the "optimized" flag after "KFR 5.1.0", like this:
KFR 5.1.0 optimized avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2
Please check that you're using Release mode (in CMake it is enabled by the -DCMAKE_BUILD_TYPE=Release flag).
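If you want to check for this automatically (for example in a benchmark script), here is a minimal sketch. It assumes the banner format shown in this thread; `is_optimized_banner` is a hypothetical helper name and not part of KFR.

```python
def is_optimized_banner(banner: str) -> bool:
    """Return True if a KFR version banner carries the "optimized" flag.

    Hypothetical helper: assumes the banner is whitespace-separated,
    as in the examples quoted in this thread.
    """
    # Compare whole tokens so a word like "non-optimized" cannot match.
    return "optimized" in banner.split()


# Banners copied from this thread.
release = "KFR 5.1.0 optimized avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2"
debug = "KFR 5.1.0 avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2"

print(is_optimized_banner(release))  # True
print(is_optimized_banner(debug))    # False
```

A missing flag is a good reason to distrust any timing numbers that follow the banner.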
That was it. Sorry for the error on my part. In case anyone is interested in the final results:
KFR 5.1.0 optimized avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2
[----RUN----] test_performance...
...
[PERFORMANCE] DFT float 8192... 112325.3 ops/second
So, roughly 133% of the speed of FFTW and 65% of the speed of IPP.
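For anyone who wants to redo the arithmetic, a quick sketch using only the ops/second figures posted in this thread (the variable and function names are just for illustration):

```python
# Throughput for an 8192-point float DFT, in ops/second, as posted in this thread.
kfr = 112325.3   # KFR 5.1.0, optimized build
fftw = 84346.0   # FFTW on the same system
ipp = 171704.0   # IPP on the same system


def percent_of(value: float, baseline: float) -> float:
    """Express value as a percentage of baseline."""
    return 100.0 * value / baseline


print(f"KFR vs FFTW: {percent_of(kfr, fftw):.0f}%")  # KFR vs FFTW: 133%
print(f"KFR vs IPP:  {percent_of(kfr, ipp):.0f}%")   # KFR vs IPP:  65%
```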
It feels like I'm missing something, but I don't understand what. On a modern Ubuntu 22 x86-64 system, I built and ran dft_test. For 8192 points, I get:
KFR 5.1.0 avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2
[PERFORMANCE] DFT float 16... 1144627.5 ops/second
[PERFORMANCE] DFT double 16... 1008756.7 ops/second
[PERFORMANCE] DFT float 32... 516523.2 ops/second
[PERFORMANCE] DFT double 32... 464478.6 ops/second
...
[PERFORMANCE] DFT float 8192... 769.4 ops/second
I've reproduced this data point in my own test code using KFR. However, on this same system, I'm getting:
FFTW - 84,346 ops/second
IPP - 171,704 ops/second
Your benchmark graphs would seem to indicate I should get roughly the speed of IPP at 8192 points, but I'm off by more than 200x. Any suggestions? Is this the expected result? Thanks!