Storing and restoring the multigrid setup

This is with reference to my code, https://github.com/sunpho84/nissa, which is used in a number of ETM collaboration projects.

The current framework for storing & restoring the multigrid setup requires QIO, and requires initializing MPI through the QMP interface. I haven't actually been able to test it (I stopped when I realized I would have to modify the MPI staging), but I've also been told that this I/O interface is very slow.

It would be very useful to have an efficient way to store & restore the setup.

I suggest it would be sufficient to have an interface that gives the user the raw data copied to the host side, and leaves to the user the task of either dumping the data to disk for future reuse, or keeping it in memory to allow swapping among different setups in the same run. There are a few use cases that one can immediately envisage:

- multiple short jobs using the same setup across different runs (useful to split a long run into shorter ones: more resilient, typically higher priority in schedulers, more flexible, etc.)
- multiple boundary conditions used in the same run, which need different setups and cannot be grouped due to other constraints
- experimenting with the null vectors & eigenvectors
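
To make the "keep them in memory and swap" use case concrete, here is a minimal sketch of what the user side could look like. Everything in it is hypothetical: `HostSetup` and `SetupCache` are illustrative names, not QUDA types, and the layout (one flat buffer of null vectors per level, single precision) is only an assumption about what a raw host-side copy might contain.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side copy of one multigrid setup (not a QUDA type):
// for each level, a flat buffer holding all the null vectors of that level.
struct HostSetup {
  std::vector<std::vector<std::complex<float>>> levels;

  // Total payload size in bytes, e.g. to decide between RAM and disk
  std::size_t bytes() const {
    std::size_t n = 0;
    for (const auto& l : levels) n += l.size() * sizeof(std::complex<float>);
    return n;
  }
};

// Keeps several setups in host memory, keyed by a user-chosen tag
// (for instance one tag per boundary condition).
class SetupCache {
 public:
  void store(const std::string& tag, HostSetup setup) {
    cache_[tag] = std::move(setup);
  }

  bool has(const std::string& tag) const { return cache_.count(tag) != 0; }

  const HostSetup& get(const std::string& tag) const {
    const auto it = cache_.find(tag);
    assert(it != cache_.end() && "unknown setup tag");
    return it->second;
  }

 private:
  std::map<std::string, HostSetup> cache_;
};
```

With something like this, the library would only need to fill and consume the raw buffers; whether they are written to disk or kept in a cache like the one above stays entirely on the application side.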

I've experimented with this for quite some time, and I have a half-baked, almost-working version here: https://github.com/sunpho84/nissa/blob/a2d6edc2a0c70ba5b723343fe31f4eda54f2aa4e/src/base/quda_bridge.cpp#L283 It relies on "robbing" private pointers across the data layout (ugh!), and still fails to properly reconstruct the deflated coarser-grid solver.

Plus, the non-deflated version does not work as efficiently as a setup generated from scratch. I must be missing some update step or similar.

One crucial point that I had missed at the beginning is that loading a new gauge configuration destroys a good deal of the internal setup (operators, solvers, etc.), so one is forced to recreate most of it (unless I'm missing something). I believe this complicates the use of the preserve flag of the deflated coarse-grid solver.

I ran into a similar issue in my application. I cannot use dumpMultigridQuda with the QUDA_MPI=ON option, and I would rather not introduce QMP.

Replacing read_spinor_field and write_spinor_field with a non-QMP implementation might be a straightforward way to enable dumpMultigridQuda with MPI. But it's not easy to decide which file format to use for saving these fields. HDF5 might be one choice, but then QUDA would have to build libhdf5 from source, just like the Eigen package, which makes things a bit more complicated. Another choice is a private binary format with a header describing the lattice properties. This is easy to implement with MPI I/O, but other applications would have to write extra code to handle the new format (although I don't think other apps need the multigrid setup fields). I think enabling these I/O functions with MPI would benefit both testing (heatbath_test, for example) and applications.
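
To give an idea of how small the private binary format could be, here is a rough sketch of a header plus the offset arithmetic each rank would need. None of this is an existing QUDA format: `SetupHeader`, `make_header`, `volume`, `site_offset` and the `"MGSETUP"` magic string are made-up names, and the layout (vectors stored one after another, sites in lexicographic order with `x[0]` running fastest, native endianness) is just an assumption. The MPI calls themselves are left out; the offset returned by `site_offset` is what a rank could pass to something like `MPI_File_write_at` for the start of a contiguous run of its local sites.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size file header (not an existing QUDA format)
struct SetupHeader {
  char magic[8];                 // format tag, NUL-terminated
  std::uint32_t version;         // format version
  std::uint32_t n_vec;           // number of vectors stored in the file
  std::uint32_t dims[4];         // global lattice extents
  std::uint32_t site_dof;        // complex components per site (e.g. spin*color)
  std::uint32_t bytes_per_real;  // 4 for single, 8 for double precision
};
static_assert(sizeof(SetupHeader) == 40, "header must be free of padding");

inline SetupHeader make_header(std::uint32_t n_vec,
                               const std::array<std::uint32_t, 4>& dims,
                               std::uint32_t site_dof,
                               std::uint32_t bytes_per_real) {
  SetupHeader h{};
  std::memcpy(h.magic, "MGSETUP", 8);  // 7 characters plus the terminating NUL
  h.version = 1;
  h.n_vec = n_vec;
  for (int mu = 0; mu < 4; ++mu) h.dims[mu] = dims[mu];
  h.site_dof = site_dof;
  h.bytes_per_real = bytes_per_real;
  return h;
}

// Global lattice volume in sites
inline std::uint64_t volume(const SetupHeader& h) {
  std::uint64_t v = 1;
  for (int mu = 0; mu < 4; ++mu) v *= h.dims[mu];
  return v;
}

// Byte offset in the file of global site x of vector vec
inline std::uint64_t site_offset(const SetupHeader& h, std::uint32_t vec,
                                 const std::array<std::uint32_t, 4>& x) {
  assert(vec < h.n_vec);
  std::uint64_t site = 0;
  for (int mu = 3; mu >= 0; --mu) {  // lexicographic index, x[0] fastest
    assert(x[mu] < h.dims[mu]);
    site = site * h.dims[mu] + x[mu];
  }
  // Each complex component takes two reals
  const std::uint64_t site_bytes =
      std::uint64_t{2} * h.site_dof * h.bytes_per_real;
  return sizeof(SetupHeader) + (vec * volume(h) + site) * site_bytes;
}
```

Since every rank can compute its own offsets from the header alone, no communication is needed beyond one rank writing the header, which is what makes this option attractive compared to pulling in an extra library.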

@maddyscientist I could make a PR if you think it's appropriate to introduce another file format to QUDA. Do you have a preference regarding the format?

Storing and restoring the multigrid setup #1482