spectralDNS / mpiFFT4py

Parallel FFT in 3D or 2D using MPI for Python. Slab or pencil decomposition possible in 3D. Note this repo is being deprecated in favour of mpi4py-fft (https://bitbucket.org/mpi4py/mpi4py-fft)
GNU Lesser General Public License v3.0

Slab decomposition with 3/2-rule dealiasing fails for number of processors = number of points in the x-direction #1

Open mikaem opened 8 years ago

mikaem commented 8 years ago

Slab decomposition with 3/2-rule dealiasing only works for up to N[0]/2 processors, where N[0] is the number of points in the x-direction. This is because N[0]*1.5 points cannot be shared evenly between N[0] processors.

Not sure this is worth fixing because a solution will be suboptimal no matter what.
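For concreteness, a quick sketch of the divisibility problem (the sizes below are just illustrative):

```python
# Illustrative only: with P = N[0] processes, the 3/2-padded first dimension
# cannot be split into slabs of equal size.
N0 = 16                    # points in the x-direction, and also the number of processes
padded = 3 * N0 // 2       # 24 points after 3/2-rule padding
print(padded % N0)         # 8 -> 24 points do not split evenly over 16 processes
```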

rainwoodman commented 8 years ago

I am not that familiar with the issue, but MPI FFTW won't fail in this case, will it? I thought FFTW would just leave some processes idling if an optimal domain decomposition cannot be found.

mikaem commented 8 years ago

Actually, I have no idea how FFTW would solve this problem, or whether it does. I know that if the first dimension of the original data cannot be shared evenly amongst processors, then FFTW will give more data to the lowest ranks. Whether or not this will work out of the box for zero-padding I do not know.

My first thought on how to solve this problem is to not use 3/2, but instead a factor of 2 for the zero-padding when the number of processors equals the size of the first dimension. That would work easily, but would of course be more expensive. Trying to implement uneven load-balancing seems to me like too much complexity for a suboptimal solution.
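A quick check of the factor-2 alternative (illustrative sizes; the cost is the larger padded grid):

```python
# Illustrative only: factor-2 padding always splits evenly over N[0] processes,
# at the cost of padding to 2*N[0] points instead of 1.5*N[0].
N0 = 16
print((2 * N0) % N0)       # 0 -> an even split of 2 points per process
```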

rainwoodman commented 8 years ago

Looks like we are talking about zero-padded FFTs? I am guessing I didn't quite understand the context...

Have you considered a 1-d pencil decomposition? (cf. https://github.com/mpip/pfft) It will surely help load balancing.

About dealiasing: I have recently run into this myself. Do you have a reference for this 3/2 dealiasing scheme? I don't think it is explained in the FFTW manual.

mikaem commented 8 years ago

Zero padded FFTs it is:-) What's a 1-d pencil decomposition?

The 3/2 dealiasing scheme is very straightforward, but I have no other reference than the generic descriptions given in, e.g., C. Canuto, M. Hussaini, A. Quarteroni and T. Zang, Spectral Methods: Fundamentals in Single Domains, Springer Verlag, 2006, or D. Kopriva, Implementing Spectral Methods for Partial Differential Equations (recommended).
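Since the question came up, here is a minimal 1-d sketch of the 3/2-rule (this is not code from mpiFFT4py, and the Nyquist mode is simply dropped for brevity):

```python
# Minimal 1-d illustration of 3/2-rule (zero-padded) dealiasing with numpy.
import numpy as np

N = 16                     # number of retained modes
M = 3 * N // 2             # padded size from the 3/2-rule

np.random.seed(0)
u_hat = np.fft.fft(np.random.rand(N))   # spectra of two real fields on N points
v_hat = np.fft.fft(np.random.rand(N))

def pad(a_hat, M):
    """Zero-pad a length-N spectrum to M modes (Nyquist mode dropped for simplicity)."""
    N = a_hat.shape[0]
    b_hat = np.zeros(M, dtype=complex)
    b_hat[:N//2] = a_hat[:N//2]                    # modes 0 .. N/2-1
    b_hat[-(N//2 - 1):] = a_hat[-(N//2 - 1):]      # modes -N/2+1 .. -1
    return b_hat

def truncate(b_hat, N):
    """Keep only the N lowest modes of a length-M spectrum."""
    a_hat = np.zeros(N, dtype=complex)
    a_hat[:N//2] = b_hat[:N//2]
    a_hat[-(N//2 - 1):] = b_hat[-(N//2 - 1):]
    return a_hat

# Dealiased product: pad, transform to the fine physical grid, multiply
# pointwise, transform back and truncate. The M/N and N/M factors account
# for numpy's unnormalized fft / normalized ifft convention.
u = np.fft.ifft(pad(u_hat, M)).real * (M / N)
v = np.fft.ifft(pad(v_hat, M)).real * (M / N)
w_hat = truncate(np.fft.fft(u * v), N) * (N / M)   # dealiased spectrum of u*v
```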

rainwoodman commented 8 years ago

Thanks for the references.

A 1-d pencil decomposition divides the mesh among Nx x Ny x 1 processes, where Nx x Ny = Np. The slab decomposition in FFTW divides it among Np x 1 x 1 processes, a special case of the 1-d pencil.

It helps load balance when Np >= Nmesh.
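For instance, the process grid could be set up with mpi4py roughly like this (illustrative mesh size, assumed to divide evenly):

```python
# Illustrative only: split an N x N x N mesh over an Nx x Ny x 1 process grid.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
Nx, Ny = MPI.Compute_dims(comm.Get_size(), 2)   # balanced 2-d grid with Nx*Ny = Np
cart = comm.Create_cart([Nx, Ny])
px, py = cart.Get_coords(cart.Get_rank())

N = 64                                           # global mesh size (assumed divisible)
local = np.zeros((N // Nx, N // Ny, N))          # this rank's pencil in real space
offset = (px * (N // Nx), py * (N // Ny), 0)     # position of the pencil in the global mesh
```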

The math tricks are worked out in, e.g., https://www-user.tu-chemnitz.de/~potts/workgroup/pippig/paper/PFFT_SIAM_88588.pdf

That is the code paper for pfft -- but pfft is written in C. Will it be much harder to implement than the slab version?


mikaem commented 8 years ago

Isn't that just a regular 2D-pencil decomposition? Perhaps we just have different names for it? The pencil decomposition is already implemented.

rainwoodman commented 8 years ago

Yes. I see.

I am reading through pencil.py -- is the array distributed or shared? It is not very clear by just reading the code.

mikaem commented 8 years ago

It's distributed. The first two dimensions in real space are distributed, whereas the complex (transformed) data are distributed in the x and z directions. There's also a version with the final complex data aligned in the x-direction, but it does not yet contain all the features, like zero-padding.
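For a P1 x P2 processor grid the local shapes then look roughly like this (illustrative sizes and names, not actual attributes):

```python
# Rough illustration only, for an N x N x N real mesh on a P1 x P2 processor grid.
N, P1, P2 = 64, 4, 2
real_local_shape    = (N // P1, N // P2, N)             # x and y distributed in real space
complex_local_shape = (N // P1, N, (N // 2 + 1) // P2)  # roughly; x and z distributed, y aligned
```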

rainwoodman commented 8 years ago

Thanks.

Is there a variable storing the offsets and strides of the local array w.r.t. the global array?

Like the partition attribute and the x,r,k,w attributes in:

https://github.com/rainwoodman/pmesh/blob/master/pmesh/particlemesh.py

mikaem commented 8 years ago

As I understand particlemesh.py, the attributes should all be easily available, even though they are not exposed as such at the moment. See get_local_wavenumbermesh in pencil.py for the wavenumbers kx, ky and kz in the x, y and z-directions. get_local_mesh gives you the mesh in real space. Here it would be easy to create just three vectors instead of the complete local 3D mesh, but I have not had any use for it yet. Also, the local slices into the global data structures are available through the functions real_local_slice and complex_local_slice. Sorry about the poor documentation.
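To illustrate the kind of information those helpers return, here is the slab case worked out by hand with plain numpy (this is not code from pencil.py; sizes and names are just for illustration):

```python
# Illustrative only: the local slice and local wavenumbers for a simple slab
# split of an N^3 mesh over P ranks.
import numpy as np

N, P, rank = 64, 4, 1                                 # illustrative sizes and rank
n0 = N // P                                           # points per rank in the x-direction

real_slice = (slice(rank*n0, (rank+1)*n0), slice(0, N), slice(0, N))
kx = np.fft.fftfreq(N, 1./N)[rank*n0:(rank+1)*n0]     # local wavenumbers in x
ky = np.fft.fftfreq(N, 1./N)                          # full wavenumbers in y
kz = np.fft.rfftfreq(N, 1./N)                         # real-transform wavenumbers in z
K = np.array(np.meshgrid(kx, ky, kz, indexing='ij'))  # local wavenumber mesh
```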

rainwoodman commented 8 years ago

Yes. I see them now.

Using numpy's broadcasting rules on vectors may save memory and computation; e.g., for Gaussian smoothing the number of calls to exp drops from N^3 to 3N, and the memory from 3*N^3 to 3*N.
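Something like this is what I mean (illustrative, not code from pencil.py):

```python
# Illustrative only: keep kx, ky, kz as broadcastable vectors instead of a full
# N^3 mesh, so a spectral Gaussian filter costs ~3N exp evaluations and ~3N memory.
import numpy as np

N, sigma = 64, 0.1
kx = np.fft.fftfreq(N, 1./N).reshape(N, 1, 1)
ky = np.fft.fftfreq(N, 1./N).reshape(1, N, 1)
kz = np.fft.rfftfreq(N, 1./N).reshape(1, 1, N//2 + 1)

u_hat = np.random.rand(N, N, N//2 + 1) + 0j    # stand-in for transformed data
# exp(-sigma^2 k^2 / 2) factorizes over the directions, so each factor is a
# vector and broadcasting expands it over the full array during the multiply.
u_hat *= np.exp(-0.5 * sigma**2 * kx**2)
u_hat *= np.exp(-0.5 * sigma**2 * ky**2)
u_hat *= np.exp(-0.5 * sigma**2 * kz**2)
```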

I mainly do Particle Mesh simulations, and a convenient, fast function to paint/read out particles to/from a periodic mesh would also be handy. Mine are ridiculously slow. Is this something you've thought about?

mikaem commented 8 years ago

Are you doing Particle in Cell type of simulations? I would be interested in that through this project: http://www.mn.uio.no/fysikk/english/research/projects/4dspace/.

I'm using HDF5 to store results, and it works very well in parallel for my spectralDNS solver. I know HDF5 has some routines for particles as well, but I have not tried them.
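For reference, a minimal sketch of a parallel HDF5 write with h5py's MPI driver (the file name and dataset layout are just examples, not spectralDNS's format, and it assumes h5py built against parallel HDF5):

```python
# Illustrative only: each rank writes its own slab of a global dataset.
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
N = 64                                                  # global mesh size (assumed divisible)
local = np.zeros((N // comm.size, N, N)) + comm.rank    # this rank's slab of the field

with h5py.File("results.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("U", (N, N, N), dtype=local.dtype)
    start = comm.rank * (N // comm.size)
    dset[start:start + local.shape[0]] = local          # each rank writes its own slab
```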