The FFTW-MPI interface does not support strides either. Do you mean that the strides should be the same on each MPI process? That would not work with unequal data distributions, so the strides would have to be defined separately for each process, which seems really complicated from the user's point of view. However, the main difficulty is that the global communication in PFFT is based on the FFTW parallel transpose algorithms, which do not support strides in the input arrays. This can be circumvented with a temporary copy array that has a contiguous memory layout, but that doubles the amount of memory (or triples it, if the following parallel transpose works out-of-place). I tried to implement all the transpositions in a way that they can be performed in-place if memory is restricted.
In summary, the idea sounds nice but I think implementation will be difficult.
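To make the memory trade-off concrete, here is a minimal sketch (my own illustration, not PFFT code) of the temporary-copy workaround described above: the strided local block is packed into a contiguous scratch buffer before it is handed to the stride-less parallel transpose. The function name, layout, and parameters are assumptions; the point is simply that the scratch buffer duplicates the local payload.

```c
#include <stdlib.h>
#include <string.h>

/* Pack a strided local block into a contiguous scratch buffer
 * (hypothetical helper, for illustration only).
 * rows   : number of local rows on this process (may differ per rank)
 * cols   : number of elements per row that take part in the transform
 * stride : distance (in elements) between consecutive rows in `src`
 * The contiguous `scratch` buffer is what a stride-less parallel
 * transpose can then operate on; it duplicates the local data,
 * which is the memory doubling mentioned above. */
double *pack_contiguous(const double *src, size_t rows, size_t cols,
                        size_t stride)
{
    double *scratch = malloc(rows * cols * sizeof *scratch);
    if (!scratch)
        return NULL;
    for (size_t i = 0; i < rows; ++i)
        memcpy(scratch + i * cols, src + i * stride, cols * sizeof *src);
    return scratch;
}
```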
Ah. I misunderstood the FFTW interface then. Let's close this.
The FFTW guru interface allows arbitrarily strided input and output arrays. PFFT does not.
This would be useful in a particle mesh code, where the local mesh contains a 'ghost region' that is shared with other processes but does not participate in the FFT.
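For reference, a minimal serial sketch of that use case with FFTW's guru interface: the interior of a ghost-padded local mesh is described purely through sizes and strides in `fftw_iodim`, so no copy is needed. The mesh sizes and ghost width below are made up for illustration; the point is that each dimension carries its own stride, which is the feature PFFT currently lacks.

```c
#include <fftw3.h>

int main(void)
{
    /* A local mesh of (n0 + 2*g) x (n1 + 2*g) complex values, where g is
     * the ghost-region width (illustrative sizes).  Only the interior
     * n0 x n1 block should be transformed. */
    const int n0 = 64, n1 = 64, g = 2;
    const int ld1 = n1 + 2 * g;                  /* padded row length */
    fftw_complex *mesh = fftw_alloc_complex((size_t)(n0 + 2 * g) * ld1);

    /* The guru interface takes a size and an input/output stride per
     * dimension (in units of fftw_complex for a complex DFT). */
    fftw_iodim dims[2] = {
        { n0, ld1, ld1 },   /* dimension 0: stride skips the padded rows */
        { n1, 1,   1   }    /* dimension 1: contiguous within a row      */
    };
    fftw_complex *interior = mesh + (size_t)g * ld1 + g;  /* skip ghosts */

    fftw_plan p = fftw_plan_guru_dft(2, dims, 0, NULL,
                                     interior, interior,
                                     FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);    /* in-place transform of the interior only */
    fftw_destroy_plan(p);
    fftw_free(mesh);
    return 0;
}
```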