jurihock / stftPitchShift

STFT based real-time pitch and timbre shifting in C++ and Python
MIT License
115 stars 14 forks source link

Add specialized real-valued FFT implementation #7

Closed jurihock closed 2 years ago

jurihock commented 2 years ago

The factorization trick assumes a complex-valued time domain signal representation. To additionally halve the computational effort and to possibly avoid data caches, a further workaround for real-valued time domain signals is therefore needed:

  1. Real-valued fast Fourier transform algorithms + corrections
  2. Implementation of "Split-radix" FFT algorithms for complex, real, and real-symmetric data
  3. Experiences with real-valued FFT algorithms
  4. An Algorithm for the Machine Calculation of Complex Fourier Series
  5. FFT: Schnelle Fourier-Transformation (Abschnitt 10.10)
jurihock commented 2 years ago

Radix-2 [1] and Split-Radix [1,2] looks interesting, but the FORTRAN example in [1] is just unreadable for me. According to [3] the Split-Radix seems to be the fastest option. Sequential output memory layout in each case is currently the drawback, since no direct mapping to std::complex possible.

The implementation idea in [5] is a combination of [4] with additional "lightweight" post-processing and is easy to understand. But it requires a data buffer due to out-of-place computation, which is not a huge problem anyway.

jurihock commented 2 years ago

Implementing Fast Fourier Transform Algorithms of Real-Valued Sequences With the TMS320 DSP Platform provides just another explanation for the (un-)packing the 2N point FFT results. In contrast to [5] the reverse calculation is described too. Equations 12 and 14 seem to work fine. The alternative equations 15 and 16 seem to be buggy...

fakufaku/esp32-fft shows an additional reduction of the (un-)packing loop by N/2...

jurihock commented 2 years ago

Performance comparison

Total C++ program execution time to process the voice.wav file with old pocketfft and new RFFT enabled on M1 Mac:

options pocketfft RFFT
-p 1 0.220s 0.270s
-p 1 -q 1 0.310s 0.400s
jurihock commented 2 years ago

Quality comparison

Output files generated by Python script, which still uses pocketfft, and C++ with RFFT enabled are compared.

With -p 1 or -p 1 -q 1 options (e.g. no spectral processing, just STFT with liftering if enabled) the absolute amplitude difference of both output files is max. 0.055.

The RFFT produces no high frequency noise etc. The spectrum plot in Audacity is about the same.

With options -p 0.5, -p 1, -p 2 or -p 1,1.25,1.5 no audible difference.

The option combinations -p >=1 and -q 1 are fine.

The only problem is finally fixed in 428932a seems to be the option combinations -p <1 and -q 1, which is possible the same kind of problem as mentioned in #4. A further comparison with FFTW #6 is necessary. Also check the double precision #8.

jurihock commented 2 years ago

Since there are no more obvious issues, the pocketfft will be finally replaced with RFFT and removed from sources.