kfrlib / kfr

Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
https://www.kfrlib.com
GNU General Public License v2.0

Slow DFT (much slower than FFTW or IPP) #197

Closed. FullyArticulate closed this issue 11 months ago.

FullyArticulate commented 11 months ago

It feels like I'm missing something, but I don't understand what. On a modern Ubuntu 22 x86-64 system, I built and ran dft_test. For sizes up to 8192 points, I get:

KFR 5.1.0 avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2
[PERFORMANCE] DFT float 16... 1144627.5 ops/second
[PERFORMANCE] DFT double 16... 1008756.7 ops/second
[PERFORMANCE] DFT float 32... 516523.2 ops/second
[PERFORMANCE] DFT double 32... 464478.6 ops/second
...
[PERFORMANCE] DFT float 8192... 769.4 ops/second

I've reproduced this datapoint in my own test code using KFR. However, on this same system, I'm getting:

FFTW: 84,346 ops/second
IPP: 171,704 ops/second

Your benchmark graphs suggest I should get roughly the speed of IPP at 8192 points, but I'm off by more than 200x (769.4 vs. 171,704 ops/second). Any suggestions? Is this the expected result? Thanks!

dancazarin commented 11 months ago

You're running a non-optimized debug build. Optimized builds show an "optimized" flag after "KFR 5.1.0", like this:

KFR 5.1.0 optimized avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2

Please check that you're building in Release mode (in CMake it is enabled with the -DCMAKE_BUILD_TYPE=Release flag).

FullyArticulate commented 11 months ago

That was it. Sorry for the error on my part. In case anyone is interested in the final results:

KFR 5.1.0 optimized avx2 64-bit (clang-14.0.0/linux) +in +ve running on avx2
[----RUN----] test_performance...
...
[PERFORMANCE] DFT float 8192... 112325.3 ops/second

So, roughly 133% of the speed of FFTW and 65% of the speed of IPP.