mreineck / pocketfft

Fork of https://gitlab.mpcdf.mpg.de/mtr/pocketfft to simplify external contributions
BSD 3-Clause "New" or "Revised" License
71 stars 34 forks source link

PocketFFT for C++

This is a heavily modified implementation of FFTPack [1,2], with the following advantages:

License

3-clause BSD (see LICENSE.md)

Some code details

Twiddle factor computation:

Efficient codelets are available for the factors:

Larger prime factors are handled by somewhat less efficient, generic routines.

For lengths with very large prime factors, Bluestein's algorithm is used, and instead of an FFT of length n, a convolution of length n2 >= 2*n-1 is performed, where n2 is chosen to be highly composite.

[1] Swarztrauber, P. 1982, Vectorizing the Fast Fourier Transforms (New York: Academic Press), 51

[2] https://www.netlib.org/fftpack/

[3] https://en.wikipedia.org/wiki/Chirp_Z-transform

Configuration options

Since this is a header-only library, it can only be configured via preprocessor macros.

POCKETFFT_CACHE_SIZE:\ if 0, disable all caching of FFT plans, else use an LRU cache with the requested size. If undefined, assume a cache size of 0.\ NOTE: caching is disabled by default because its benefits are only really noticeable for short 1D transforms. When using caching with transforms that have very large axis lengths, it may use up a lot of memory, so only switch this on if you know you really need it! Default: undefined

POCKETFFT_NO_VECTORS:\ if defined, disable all support for CPU vector instructions.\ Default: undefined

POCKETFFT_NO_MULTITHREADING:\ if defined, multi-threading will be disabled.\ Default: undefined

Programming interface

All symbols are encapsulated in the namespace pocketfft.

Arguments

General constraints on arguments

Detailed public interface

using shape_t = std::vector<std::size_t>;
using stride_t = std::vector<std::ptrdiff_t>;

constexpr bool FORWARD  = true,
               BACKWARD = false;

template<typename T> void c2c(const shape_t &shape, const stride_t &stride_in,
  const stride_t &stride_out, const shape_t &axes, bool forward,
  const complex<T> *data_in, complex<T> *data_out, T fct,
  size_t nthreads=1)

template<typename T> void r2c(const shape_t &shape_in,
  const stride_t &stride_in, const stride_t &stride_out, size_t axis,
  bool forward, const T *data_in, complex<T> *data_out, T fct,
  size_t nthreads=1)

/* This function first carries out an r2c transform along the last axis in axes,
   storing the result in data_out. Then, an in-place c2c transform
   is carried out in data_out along all other axes. */
template<typename T> void r2c(const shape_t &shape_in,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  bool forward, const T *data_in, complex<T> *data_out, T fct,
  size_t nthreads=1)

template<typename T> void c2r(const shape_t &shape_out,
  const stride_t &stride_in, const stride_t &stride_out, size_t axis,
  bool forward, const complex<T> *data_in, T *data_out, T fct,
  size_t nthreads=1)

/* This function first carries out a c2c transform along all axes except the
   last one, storing the result into a temporary array. Then, a c2r transform
   is carried out along the last axis, storing the result in data_out. */
template<typename T> void c2r(const shape_t &shape_out,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  bool forward, const complex<T> *data_in, T *data_out, T fct,
  size_t nthreads=1)

/* This function carries out a FFTPACK-style real-to-halfcomplex or
   halfcomplex-to-real transform (depending on the parameter `real2hermitian`)
   on all specified axes in the given order.
   NOTE: interpreting the result of this function can be complicated when
   transforming more than one axis! */
template<typename T> void r2r_fftpack(const shape_t &shape,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  bool real2hermitian, bool forward, const T *data_in, T *data_out, T fct,
  size_t nthreads=1)

/* For every requested axis, this function carries out a forward Fourier
   transform, and the real and imaginary parts of the result are added before
   the next axis is processed.
   This is analogous to FFTW's implementation of the Hartley transform. */
template<typename T> void r2r_separable_hartley(const shape_t &shape,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  const T *data_in, T *data_out, T fct, size_t nthreads=1);

/* This function carries out a full Fourier transform over the requested axes,
   and the sum of real and imaginary parts of the result is stored in the output
   array. For a single transformed axis, this is identical to
   `r2r_separable_hartley`, but when transforming multiple axes, the results
   are different.

   NOTE: This function allocates temporary working space with a size
   comparable to the input array. */
template<typename T> void r2r_genuine_hartley(const shape_t &shape,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  const T *data_in, T *data_out, T fct, size_t nthreads=1);

/* if ortho==true, the transform is made orthogonal by these additional steps
   in every 1D sub-transform:
   Type 1 : multiply first and last input value by sqrt(2)
            divide first and last output value by sqrt(2)
   Type 2 : divide first output value by sqrt(2)
   Type 3 : multiply first input value by sqrt(2)
   Type 4 : nothing */
template<typename T> void dct(const shape_t &shape,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  int type, const T *data_in, T *data_out, T fct, bool ortho,
  size_t nthreads=1);

/* if ortho==true, the transform is made orthogonal by these additional steps
   in every 1D sub-transform:
   Type 1 : nothing
   Type 2 : divide last output value by sqrt(2)
   Type 3 : multiply last input value by sqrt(2)
   Type 4 : nothing */
template<typename T> void dst(const shape_t &shape,
  const stride_t &stride_in, const stride_t &stride_out, const shape_t &axes,
  int type, const T *data_in, T *data_out, T fct, bool ortho,
  size_t nthreads=1);