Closed. ahojukka5 closed this issue 2 months ago.
I just came back from vacation, and I will need some time to go over the details of your code.
However, the 1D case is not a valid use case for distributed FFTs within the heFFTe framework. The 3D and 2D cases can be reduced to batches of 1D FFTs: heFFTe reshapes the data so that each batch of 1D transforms can be done entirely within an MPI rank. A huge distributed 1D signal would require an entirely different algorithm and an enormous amount of communication. I am not aware of applications or codes that use distributed 1D FFTs, though I don't claim to know everything that is out there.
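To illustrate the reduction to batches of 1D transforms, here is a minimal sketch, assuming a naive O(n²) DFT as a stand-in for the backend FFT library. A 2D transform computed as a batch of row transforms followed by a batch of column transforms matches the direct 2D DFT. In a distributed setting, the reshape between the two batches is what lets each batch run inside a single rank. All function names here are hypothetical and not part of heFFTe.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive 1D DFT, standing in for a backend FFT call.
std::vector<cplx> dft1d(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> y(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            sum += x[j] * std::polar(1.0, -2.0 * pi * double(j * k) / double(n));
        y[k] = sum;
    }
    return y;
}

// 2D DFT of a row-major rows x cols array as two batches of 1D DFTs.
std::vector<cplx> dft2d_by_batches(std::vector<cplx> a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    // Batch 1: one 1D transform per row.
    for (std::size_t r = 0; r < rows; ++r) {
        std::vector<cplx> line(cols);
        for (std::size_t c = 0; c < cols; ++c) line[c] = a[r * cols + c];
        line = dft1d(line);
        for (std::size_t c = 0; c < cols; ++c) a[r * cols + c] = line[c];
    }
    // Batch 2: one 1D transform per column (the data a reshape would make rank-local).
    for (std::size_t c = 0; c < cols; ++c) {
        std::vector<cplx> line(rows);
        for (std::size_t r = 0; r < rows; ++r) line[r] = a[r * cols + c];
        line = dft1d(line);
        for (std::size_t r = 0; r < rows; ++r) a[r * cols + c] = line[r];
    }
    return a;
}

// Direct 2D DFT from the definition, for comparison.
std::vector<cplx> dft2d_direct(const std::vector<cplx>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    const double pi = std::acos(-1.0);
    std::vector<cplx> y(rows * cols);
    for (std::size_t k1 = 0; k1 < rows; ++k1)
        for (std::size_t k2 = 0; k2 < cols; ++k2) {
            cplx sum = 0.0;
            for (std::size_t r = 0; r < rows; ++r)
                for (std::size_t c = 0; c < cols; ++c) {
                    double angle = -2.0 * pi * (double(r * k1) / double(rows) +
                                                double(c * k2) / double(cols));
                    sum += a[r * cols + c] * std::polar(1.0, angle);
                }
            y[k1 * cols + k2] = sum;
        }
    return y;
}
```

The same idea extends to 3D with a third batch; only the number of reshapes changes.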
The only realistic way to handle a large 1D transform is to move the data onto one MPI rank (assuming it even fits in memory) and call the backend.
The `proc_setup_min_surface()` method came from a specific application for 3D FFTs, so it is possible that we missed the 2D case. I will look into it once I settle back into the office.
I acknowledge that the relevance of distributing 1D FFTs is questionable, as they are most likely very fast to compute on a single process. However, it might be relevant for academic examples and very simplified models that are first implemented in 1D just to confirm that things work before moving to 2D and 3D.
It is relatively easy to modify `proc_setup_min_surface` to work well in all dimensions. I suggest either making it work that way or throwing an error if the domain size is 1 in two of the dimensions. It should not silently produce an incorrect configuration, as that is something users do not expect.
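A minimal sketch of the first option combined with the error from the second, using a hypothetical `min_surface_grid` helper (not heFFTe's API). It assumes the domain is given as three index counts, enumerates every factorization of `num_procs` into three factors, skips any grid that places more ranks than indexes along a dimension, and scores the rest by the surface of the largest per-rank box. If nothing fits, it throws instead of returning an unusable grid.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not heFFTe's API. Chooses g with g[0] * g[1] * g[2] == num_procs
// that minimizes the surface of the largest per-rank box, among grids that never place
// more ranks than indexes along a dimension.
std::array<int, 3> min_surface_grid(std::array<int, 3> const& n, int num_procs) {
    if (num_procs < 1 || n[0] < 1 || n[1] < 1 || n[2] < 1)
        throw std::invalid_argument("domain sizes and num_procs must be positive");

    constexpr long long none = std::numeric_limits<long long>::max();
    std::array<int, 3> best{0, 0, 0};
    long long best_surface = none;

    for (int i = 1; i <= num_procs; ++i) {
        if (num_procs % i != 0) continue;
        int const rem = num_procs / i;
        for (int j = 1; j <= rem; ++j) {
            if (rem % j != 0) continue;
            std::array<int, 3> const g{i, j, rem / j};
            // More ranks than indexes along a dimension would leave some ranks empty.
            if (g[0] > n[0] || g[1] > n[1] || g[2] > n[2]) continue;
            // Extents of the largest per-rank box (ceiling division).
            long long const b0 = (n[0] + g[0] - 1) / g[0];
            long long const b1 = (n[1] + g[1] - 1) / g[1];
            long long const b2 = (n[2] + g[2] - 1) / g[2];
            // Score: half the surface area of that box.
            long long const surface = b0 * b1 + b1 * b2 + b2 * b0;
            if (surface < best_surface) {
                best_surface = surface;
                best = g;
            }
        }
    }
    if (best_surface == none)
        throw std::runtime_error("no processor grid fits this domain and num_procs");
    assert(best[0] * best[1] * best[2] == num_procs);
    return best;
}
```

For a 1 x 1 x 16 domain on 8 ranks the only admissible grid is 1 x 1 x 8. For 1 x 1 x 3 on 4 ranks there is none, so the sketch throws.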
I created a PR for this issue so that you can examine the code more easily.
I see that we baked the large-problem assumption into `proc_setup_min_surface()`, and we should fix that.
The PR is very clear, but while it fixes the issue, it does not ensure that the produced grid has the correct number of MPI ranks. Check out #52.
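The rank-count requirement can be checked independently of how the grid is chosen. A minimal sketch with a hypothetical `check_grid` helper (not part of heFFTe) that rejects a grid unless its product equals `num_procs` and no dimension has more ranks than indexes:

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical check, not part of heFFTe. Returns an empty string if `grid` can be
// used for a domain with `n` indexes per dimension on `num_procs` ranks, otherwise
// a message describing the first problem found.
std::string check_grid(std::array<int, 3> const& grid,
                       std::array<int, 3> const& n, int num_procs) {
    assert(num_procs > 0);
    long long ranks = 1;
    for (int d = 0; d < 3; ++d) {
        if (grid[d] < 1)
            return "grid dimension " + std::to_string(d) + " is not positive";
        ranks *= grid[d];
    }
    // Every MPI rank must appear in the grid exactly once.
    if (ranks != num_procs)
        return "grid uses " + std::to_string(ranks) + " of " +
               std::to_string(num_procs) + " ranks";
    // No dimension may be split into more pieces than it has indexes.
    for (int d = 0; d < 3; ++d)
        if (grid[d] > n[d])
            return "dimension " + std::to_string(d) + " has " + std::to_string(n[d]) +
                   " indexes but " + std::to_string(grid[d]) + " ranks";
    return "";
}
```

For example, if a fix simply clamped a 2 x 2 x 2 grid to a 1 x 1 x 16 domain, giving 1 x 1 x 2 (a hypothetical scenario, not a description of the PR), this check would report that only 2 of 8 ranks are used.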
Resolved in #51.
It looks like `proc_setup_min_surface` is missing some edge cases. Is this valid in all cases? For example, if one wants to do a 1D or 2D FFT, would the last dimension be 1?
Here we iterate up to `num_procs` or `num_procs/i`. Should we somehow take the size of the simulation domain into account? For example, with the following changes to `heffte_example_r2c`:
We get the grid 1 x 2 x 4:
Whereas doing it the other way around, the program crashes:
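A hypothetical reconstruction (not heFFTe's source) of why the domain size matters, assuming the loop described above scores each candidate grid by the surface of the per-rank box, with box extents computed as `n[d] / grid[d]` in integer arithmetic. When a dimension has fewer indexes than ranks, its extent becomes 0, which shrinks the score and makes the over-split grid look best. This sketch does not reproduce the 1 x 2 x 4 grid reported here, since the exact changes to `heffte_example_r2c` are not shown.

```cpp
#include <array>
#include <cassert>

// Hypothetical reconstruction for illustration, NOT heFFTe's source. Enumerates
// grids i x j x (num_procs / i / j), with i up to num_procs and j up to num_procs / i,
// and keeps the one with the smallest per-rank box surface.
std::array<int, 3> naive_min_surface(std::array<int, 3> const& n, int num_procs) {
    assert(num_procs > 0);
    // Integer division gives extent 0 when grid[d] > n[d], so over-splitting a
    // small dimension makes the score look better instead of invalid.
    auto surface = [&](std::array<int, 3> const& g) {
        long long b0 = n[0] / g[0], b1 = n[1] / g[1], b2 = n[2] / g[2];
        return b0 * b1 + b1 * b2 + b2 * b0;
    };
    std::array<int, 3> best{1, 1, num_procs};  // initial guess
    long long best_surface = surface(best);
    for (int i = 1; i <= num_procs; ++i) {
        if (num_procs % i != 0) continue;
        int const rem = num_procs / i;
        for (int j = 1; j <= rem; ++j) {
            if (rem % j != 0) continue;
            std::array<int, 3> const g{i, j, rem / j};
            long long const s = surface(g);
            if (s < best_surface) {
                best_surface = s;
                best = g;
            }
        }
    }
    return best;
}
```

For a hypothetical 1 x 1 x 16 domain on 8 ranks, this sketch returns 2 x 2 x 2, splitting both size-1 dimensions, even though the only grid that gives every rank data is 1 x 1 x 8.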
I tried to fix this with the following change:
However, although this modification seems to produce a good processor grid (at least for the 1D corner case), splitting the world still fails.