mabarnes / moment_kinetics


Make all processes use identical FFTW plans #203

Closed johnomotani closed 2 months ago

johnomotani commented 2 months ago

When using `flags=FFTW.MEASURE`, FFTW does some run-time tests to choose the fastest version of its algorithm. This can lead to different processes using slightly different algorithms. These should only differ by machine-precision-level rounding errors, but (surprisingly!) this may be important. This PR updates the setup so that the FFTW plans are generated on the root process and written out to an 'FFTW wisdom' file, which is then read by all other processes, ensuring that every process uses exactly the same algorithm.
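For illustration, here is a minimal sketch of the "measure on root, share wisdom" pattern using FFTW.jl and MPI.jl. The transform length, r2r kind and wisdom file name are made-up for the example, and this is not the actual moment_kinetics implementation:

```julia
using MPI
using FFTW

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n = 128                      # example transform length, arbitrary for this sketch
wisdom_file = "fftw_wisdom"  # hypothetical path on a shared filesystem

if rank == 0
    # Root does the (non-deterministic) run-time measurement once...
    plan = FFTW.plan_r2r(zeros(n), FFTW.REDFT00; flags=FFTW.MEASURE)
    # ...and saves the resulting wisdom so other processes can reuse it.
    FFTW.export_wisdom(wisdom_file)
end

# Make sure the wisdom file exists before anyone else creates a plan.
MPI.Barrier(comm)

if rank != 0
    # Importing the wisdom means plan creation reuses root's choice of
    # algorithm instead of re-measuring, so all processes end up with
    # identical plans.
    FFTW.import_wisdom(wisdom_file)
    plan = FFTW.plan_r2r(zeros(n), FFTW.REDFT00; flags=FFTW.MEASURE)
end
```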

While working on the kinetic electrons, I came across a very strange bug. A simulation would run correctly on a small number of processes (e.g. 8, with the z-direction split into 8 blocks using distributed memory, so each shared-memory block only has one process), but would fail on a larger number of processes (64 or 128, so each shared-memory block has 8 or 16 processes). While trying to debug this, I found that FFTW can give slightly different results between runs when using FFTW.MEASURE. To make it easier to compare results between runs, I switched to FFTW.ESTIMATE (which gives the same plan every time) and the bug went away!

I don't understand why this fixed the bug. The different FFTW algorithms should only differ by machine-precision-level rounding errors, so I don't see how this can cause a numerical instability. My only guesses are that either the slight inconsistency somehow does make a difference, or that multiple instances of FFTW conflict with each other during the run-time testing and subtly corrupt something, eventually leading (after several thousand pseudo-timesteps!) to a failure to converge. In any case, after making this 'fix' so that the run-time testing is done on only one process and the result is shared with all the others (so they all use exactly the same algorithm), that bug has gone away, so I think the change is useful.

I'm not 100% sure the problem is entirely fixed yet, because even with this fix the simulation I'm testing fails to converge, although at a much later time...

mrhardman commented 2 months ago

A much simpler solution could be to use the matrix formulation for the Chebyshev derivatives, which wouldn't require an FFT at all. At the moment this exists only for the Gauss-Lobatto elements (the Radau elements use the FFT to compute the matrix), but that could be fixed. You could also try using the Gauss-Legendre grid for your physics problem, to test whether the FFT is in any way connected to the loss of convergence of the simulation. If 1D interpolation is required for the Gauss-Legendre grid to be usable, I can provide that routine after a short discussion to determine where it is needed.
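For reference, the matrix formulation mentioned above can be illustrated with the standard Chebyshev-Gauss-Lobatto differentiation matrix (the textbook construction, e.g. Trefethen's `cheb`). This is a generic sketch, not the moment_kinetics routine:

```julia
# Standard Chebyshev-Gauss-Lobatto differentiation matrix on [-1, 1].
# D * f.(x) approximates f'(x) at the Gauss-Lobatto points, with no FFT involved.
function chebyshev_gauss_lobatto_derivative_matrix(N::Integer)
    x = [cos(pi * j / N) for j in 0:N]                    # Gauss-Lobatto points
    c = [j == 0 || j == N ? 2.0 : 1.0 for j in 0:N]       # endpoint weights
    c .*= [(-1.0)^j for j in 0:N]                         # alternating signs
    D = zeros(N + 1, N + 1)
    for i in 1:N+1, j in 1:N+1
        if i != j
            D[i, j] = (c[i] / c[j]) / (x[i] - x[j])
        end
    end
    # Diagonal entries via the "negative sum trick" for better accuracy.
    for i in 1:N+1
        D[i, i] = -sum(D[i, j] for j in 1:N+1 if j != i)
    end
    return x, D
end

# Example: differentiate sin(x) and compare with cos(x);
# the error should be near machine precision for modest N.
x, D = chebyshev_gauss_lobatto_derivative_matrix(16)
maximum(abs.(D * sin.(x) .- cos.(x)))
```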