mreineck / pocketfft

Fork of https://gitlab.mpcdf.mpg.de/mtr/pocketfft to simplify external contributions
BSD 3-Clause "New" or "Revised" License
71 stars 34 forks source link

in-place transformation not working if dimension >1 #4

Closed savowe closed 2 years ago

savowe commented 2 years ago

Hi, in-place trnasformations for d>1 do not work. I guess when done with the first axis, the calculation for the next axis already takes the new data. now, maybe this is expected behaviour. But in that case I would suggest to write this into the readme as I would consider it helpful.

Test-code:

#include <iostream>
#include "pocketfft_hdronly.h"

int main() {
    // Prepare 2x2 matrix as vectors
  std::vector<std::complex<double>> amp_in({1, 1, 1, 1});
  std::vector<std::complex<double>> amp_out({0, 0, 0, 0});
  const std::vector<ptrdiff_t> strides({16, 16});
  const std::vector<size_t> shape({2, 2});
  std::vector<size_t> axes{0, 1};

  std::cout << "Result from 2 dimensional c2c out of place:\n";
  pocketfft::c2c(shape, strides, strides, axes, pocketfft::FORWARD, &amp_in[0],
                 &amp_out[0], double(1.0));
  for (size_t i = 0; i < amp_in.size(); i++) {
    std::cout << amp_out[i] << std::endl;
  }

  std::cout << "Result from 2 dimensional c2c in place:\n";
  pocketfft::c2c(shape, strides, strides, axes, pocketfft::FORWARD, &amp_in[0],
                 &amp_in[0], double(1.0));
 for (size_t i = 0; i < amp_in.size(); i++) {
    std::cout << amp_in[i] << std::endl;
  }
  return 0;
}

result:

Result from 2 dimensional c2c out of place:
(4,0)
(0,0)
(0,0)
(0,0)
Result from 2 dimensional c2c in place:
(3,0)
(0,0)
(2,0)
(1,0)

The first result is correct.

mreineck commented 2 years ago

You should get identical results for both out-of-place and in-place transforms. As far as I can tell, the problem is with the strides that you provide. {16,16} can't be right, since a "normal" 2D array can't have the same stride along both axes. I think you should get the expected results by using a stride array of {32, 16} (i.e. stride is 2 complex values along axis 0 and 1 complex value along axis 1).

Generally if you transform a 2D C-contiguous array, the strides should be {shape[1]*sizeof(complex<double>), sizeof(complex<double>)}.

savowe commented 2 years ago

Indeed, you are right. the problem was my poor understanding of the stride. Thank you very much for the fest reply.

mreineck commented 2 years ago

I admit I'm not absolutely happy with the definition of the strides myself, but this choice was necessary to get the code integrated into scipy. Later versions of the code (https://github.com/mreineck/ducc) no longer count the bytes but rather the data elements, which is more intuitive.

savowe commented 2 years ago

Generally if you transform a 2D C-contiguous array, the strides should be {shape[1]*sizeof(complex<double>), sizeof(complex<double>)}.

And for anyone else finding this later on: in the 3D-column-major case it is {shape[2]*shape[1]*sizeof(complex<double>), shape[1]*sizeof(complex<double>) ,sizeof(complex<double>)}

I admit I'm not absolutely happy with the definition of the strides myself, but this choice was necessary to get the code integrated into scipy. Later versions of the code (https://github.com/mreineck/ducc) no longer count the bytes but rather the data elements, which is more intuitive.

I guess even more so if you are making bindings to python. I am still very early in my project hence I can still easily change between FFT libraries, Will there be more interesting additions to the fft.h header from ducc compared to pocketfft _cpp which might give me reason to change them? In the readme there are non-equidistant ffts mentioned, but they do not seem plemented right now, are they?

Anyway, thank you again for the work on these good libraries.

mreineck commented 2 years ago

Will there be more interesting additions to the fft.h header from ducc compared to pocketfft _cpp which might give me reason to change them?

Depends on the features you need. Mostly I'm tweaking performance and modernize the code internally to make it a bit easier to work on. If your main goal is to have a reliable (and fast, but not record-breaking) implementation that is easy to integrate into your project (since it's a single header) then pocketfft C++ is probably the best candidate. If performance is really important, you should definitely also look at FFTW or (if you have Intel hardware) MKL-FFT.

In the readme there are non-equidistant ffts mentioned, but they do not seem plemented right now, are they?

They do exist, but there are currently only interfaces for very specific subproblems (radio interferometry gridding and 4pi convolution on the sphere). I have not found a really nice way yet to present the fully generic interface, but if this a functionaity you need, I really recommend that you have a look at https://github.com/flatironinstitute/finufft. This package has heavily inspired the non-equidistant FFTs in ducc.