We get MPI errors if we run in parallel with the latest version of all the dependencies (as of 19/11/2023).
I think the problem is a slightly complicated mix-up that will probably be fixed in the Julia packages relatively soon:
The HDF5_jll.jl package (which bundles a version of HDF5 with Julia) has started supporting MPI (I think just if MPI.jl is being used, but that is always the case for us). However, it always links to the Julia-installed MPI (not the 'system-provided' one, which is what we usually use). Linking to two different MPI libraries causes errors. This is a bug in the Julia packages (not in moment_kinetics), see https://github.com/JuliaIO/HDF5.jl/issues/1079, https://github.com/JuliaPackaging/Yggdrasil/issues/6893.
For HDF5.jl the bug isn't actually a problem, because we tell HDF5 to use a 'system-provided' HDF5 library, which is (at least should be!) linked with the correct MPI library. This means that HDF5.jl doesn't actually use HDF5_jll.jl, so we wouldn't be affected by the bug, except that...
The NetCDF wrapper NCDatasets.jl doesn't support MPI, and does link to the HDF5 provided by HDF5_jll.jl. Previously this wasn't (or at least didn't seem to be) a problem - apparently linking two versions of libhdf5.so that are used in different places is OK - but now that HDF5_jll.jl links to MPI, it means linking two versions of the MPI library, which causes errors.
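One way to check whether this is what is happening on a given machine is to list the shared libraries that the Julia process has actually loaded. The snippet below is only a diagnostic sketch, not something from moment_kinetics: the name `suspect_libs` and the filename pattern are my own choices, and the library names on your system may differ. It uses `Libdl.dllist()` from the Julia standard library.

```julia
using Libdl

# Load the packages under suspicion first, e.g.
#     using MPI, HDF5, NCDatasets
# (left as a comment so that the snippet also runs without them installed).

# Paths of every shared library currently loaded into this Julia process.
loaded = Libdl.dllist()

# Keep only MPI, HDF5 and NetCDF libraries. The pattern is an assumption
# about typical file names and may need adjusting for your MPI flavour.
suspect_libs = filter(
    path -> occursin(r"libmpi|libhdf5|libnetcdf", basename(path)),
    loaded,
)

foreach(println, suspect_libs)
```

If the output shows two different MPI libraries (for example one under the Julia artifacts directory and one from the system installation), that would be consistent with the explanation above.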
Possible workarounds:
1. Wait for the Julia packages to be fixed, then the problem should go away.
2. Pin HDF5_jll to a slightly older, working version (i.e. version 1.12.x) at least until the Julia packages are fixed.
3. Get rid of the NetCDF file I/O.
4. Tell NCDatasets to use a system-provided libnetcdf.so, so it doesn't link to the HDF5_jll.jl version of HDF5. On systems where we have to compile HDF5 for ourselves, this would be annoying as we would have to compile NetCDF as well, and link it to the local version of HDF5.
I think option 2 is the best and easiest solution, while we wait for a fix to https://github.com/JuliaPackaging/Yggdrasil/issues/6893. I think it's possible to tell Julia to pin a package to a certain version, rather than everyone having to do it by hand (and even if we did it 'by hand' the CI jobs would have to do the same thing, which would probably be more work than pinning a package). I'll try to make a PR...
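For anyone who needs a stopgap before that PR exists, pinning by hand might look roughly like the following. This is a sketch under assumptions, not the fix that will go into the repo: it assumes the project environment for moment_kinetics is already active, and it adds HDF5_jll as a direct dependency of that environment so that it can be pinned. `Pkg.add` and `Pkg.pin` are standard Pkg.jl functions.

```julia
using Pkg

# Assumption: the environment to be fixed is already active,
# e.g. Julia was started with `julia --project`.

# Install the newest available 1.12.x release of HDF5_jll.
# (This makes HDF5_jll a direct dependency of the active environment.)
Pkg.add(name="HDF5_jll", version="1.12")

# Pin it at the installed version so later `Pkg.update()` calls
# do not move it to a newer release.
Pkg.pin("HDF5_jll")
```

Once the upstream packages are fixed, `Pkg.free("HDF5_jll")` should release the pin again.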