mpip / pfft

Parallel fast Fourier transforms
GNU General Public License v3.0

pfft 3D data decomposition #37

Open octupole opened 4 years ago

octupole commented 4 years ago

Hi, I have been testing the scaling of pfft on our cluster (a few thousand Broadwell nodes with 28 cores each). Although the scaling for my problem (a 3D grid of 128^3, quite a small grid indeed!) is satisfactory, I find that its overall performance is poor compared with a code such as fftwpp (from Bowen's group), which uses a 2D data decomposition. Indeed, up to 64 CPUs I find that pfft is consistently 10 times slower than fftwpp. I am not surprised that a 3D data decomposition performs worse than a 2D one because of the extra communication (as explained in the original paper). But the size of the performance loss baffles me, and I frankly think I might be doing something wrong somewhere in compiling pfft or in linking it to the system fftw3-mpi. Could you give me some clue on this?

Thank you in advance,
Max
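
To make the 2D versus 3D comparison above concrete, here is a minimal sketch of the per-process block sizes for a 128^3 grid on 64 processes. It is plain arithmetic and does not call pfft or fftwpp. The helper `local_block` and the mesh shapes 8x8 and 4x4x4 are assumptions chosen for illustration; they are not taken from the original post.

```python
from math import ceil

def local_block(grid, mesh):
    """Largest per-process block when `grid` is split over the process `mesh`.

    Hypothetical helper for illustration only. Grid dimensions beyond
    len(mesh) are left undistributed (each process holds them in full).
    """
    dims = list(mesh) + [1] * (len(grid) - len(mesh))
    return tuple(ceil(n / p) for n, p in zip(grid, dims))

grid = (128, 128, 128)

# 2D (pencil) decomposition: assumed 8 x 8 process mesh, 64 processes.
pencil = local_block(grid, (8, 8))

# 3D (block) decomposition: assumed 4 x 4 x 4 process mesh, 64 processes.
block = local_block(grid, (4, 4, 4))

print("2D mesh local block:", pencil)
print("3D mesh local block:", block)
```

Both layouts hold the same number of points per process, but in the 2D case one axis is entirely local, so the 1D transforms along it can start without any communication. In the 3D case no axis is complete on any process, so at least one redistribution is needed before the first transform. That matches the extra communication mentioned in the post, although whether it accounts for a factor of 10 cannot be concluded from this arithmetic alone.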