mabarnes / moment_kinetics


Make all processes use identical FFTW plans #203

Closed johnomotani closed 2 months ago

johnomotani commented 2 months ago

When using `flags=FFTW.MEASURE`, FFTW does some run-time tests to choose the fastest version of its algorithm. This can lead to different processes using slightly different algorithms. These should only differ by machine-precision-level rounding errors, but (surprisingly!) this may be important. This PR updates the setup so that the FFTW plans are generated on the root process and written out to an 'FFTW wisdom' file, which is then read by all other processes, ensuring that every process uses exactly the same algorithm.
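For illustration, here is a minimal sketch of the "measure on root, share wisdom" pattern using FFTW.jl and MPI.jl. The transform length, r2r kind and wisdom file name are made-up for the example, and this is not the actual moment_kinetics implementation:

```julia
using MPI
using FFTW

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n = 128                      # example transform length, arbitrary for this sketch
wisdom_file = "fftw_wisdom"  # hypothetical path on a shared filesystem

if rank == 0
    # Root does the (non-deterministic) run-time measurement once...
    plan = FFTW.plan_r2r(zeros(n), FFTW.REDFT00; flags=FFTW.MEASURE)
    # ...and saves the resulting wisdom so other processes can reuse it.
    FFTW.export_wisdom(wisdom_file)
end

# Make sure the wisdom file exists before anyone else creates a plan.
MPI.Barrier(comm)

if rank != 0
    # Importing the wisdom means plan creation reuses root's choice of
    # algorithm instead of re-measuring, so all processes end up with
    # identical plans.
    FFTW.import_wisdom(wisdom_file)
    plan = FFTW.plan_r2r(zeros(n), FFTW.REDFT00; flags=FFTW.MEASURE)
end
```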

While working on the kinetic electrons, I came across a very strange bug. A simulation would run correctly on a small number of processes (e.g. 8, with the z-direction split into 8 blocks using distributed memory, so each shared-memory block only has one process), but would fail on a larger number of processes (64 or 128, so each shared-memory block has 8 or 16 processes). While trying to debug this, I found that FFTW can give slightly different results between runs when using FFTW.MEASURE. To make it easier to compare results between runs, I switched to FFTW.ESTIMATE (which gives the same plan every time) and the bug went away!

I don't understand why this fixed the bug. The different FFTW algorithms should only differ by machine-precision-level rounding errors, so I don't see how this can cause a numerical instability. My only guesses are that either the slight inconsistency somehow does make a difference, or that multiple instances of FFTW conflict with each other during the run-time testing and subtly corrupt something, eventually leading (after several thousand pseudo-timesteps!) to a failure to converge. In any case, after making this 'fix' so that the run-time testing is done on only one process and the result is shared with all the others (so they all use exactly the same algorithm), that bug has gone away, so I think the change is useful.

I'm not 100% sure the problem is entirely fixed yet, because even with this fix the simulation I'm testing fails to converge, although at a much later time...

mrhardman commented 2 months ago

A much simpler solution could be to use the matrix formulation for the Chebyshev derivatives, which wouldn't require an FFT at all. At the moment this exists only for the Gauss-Lobatto elements (the Radau elements use the FFT to compute the matrix), but that could be fixed. You could also try using the Gauss-Legendre grid for your physics problem, to test whether the FFT is in any way connected to the loss of convergence of the simulation. If 1D interpolation is required for the Gauss-Legendre grid to be usable, I can provide that routine after a short discussion to determine where it is needed.
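For reference, the matrix formulation mentioned above can be illustrated with the standard Chebyshev-Gauss-Lobatto differentiation matrix (the textbook construction, e.g. Trefethen's `cheb`). This is a generic sketch, not the moment_kinetics routine:

```julia
# Standard Chebyshev-Gauss-Lobatto differentiation matrix on [-1, 1].
# D * f.(x) approximates f'(x) at the Gauss-Lobatto points, with no FFT involved.
function chebyshev_gauss_lobatto_derivative_matrix(N::Integer)
    x = [cos(pi * j / N) for j in 0:N]                    # Gauss-Lobatto points
    c = [j == 0 || j == N ? 2.0 : 1.0 for j in 0:N]       # endpoint weights
    c .*= [(-1.0)^j for j in 0:N]                         # alternating signs
    D = zeros(N + 1, N + 1)
    for i in 1:N+1, j in 1:N+1
        if i != j
            D[i, j] = (c[i] / c[j]) / (x[i] - x[j])
        end
    end
    # Diagonal entries via the "negative sum trick" for better accuracy.
    for i in 1:N+1
        D[i, i] = -sum(D[i, j] for j in 1:N+1 if j != i)
    end
    return x, D
end

# Example: differentiate sin(x) and compare with cos(x);
# the error should be near machine precision for modest N.
x, D = chebyshev_gauss_lobatto_derivative_matrix(16)
maximum(abs.(D * sin.(x) .- cos.(x)))
```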