lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Storing and restoring the multigrid setup #1482

Open sunpho84 opened 1 month ago

sunpho84 commented 1 month ago

This is with reference to my code, https://github.com/sunpho84/nissa, which is used in a number of ETM collaboration projects.

The current framework for storing & restoring the multigrid setup requires QIO, and initializing MPI through the QMP interface. I've not been able to actually test it (I stopped when I realized I had to modify the MPI staging), but I've also been told that this I/O interface is very slow.

It would be very useful to have an efficient way to store & restore the setup.

I suggest it would be sufficient to have an interface that provides the user with the raw data copied to the host side, and leaves to the user the task of either dumping the data to disk for future reuse, or keeping it in memory to allow swapping among different setups in the same run. There are a few use cases that one can immediately envisage:

I've experimented with this for quite some time, and I have a half-baked, almost-working version at https://github.com/sunpho84/nissa/blob/a2d6edc2a0c70ba5b723343fe31f4eda54f2aa4e/src/base/quda_bridge.cpp#L283 which relies on "robbing" private pointers across the data layout (ugh!), and still fails to properly reconstruct the deflated coarser grid solver.

Plus, the non-deflated version does not work as efficiently as a freshly generated setup - I must be missing some update step or similar.

One crucial point that I had missed at the beginning is that loading a new gauge configuration destroys a great deal of the internal setup (operators, solver, etc.), so one is forced to recreate most of it (if I'm not missing something), which I believe complicates the use of the preserve flag of the deflated coarse grid solver.
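To make the "raw data on the host side" suggestion concrete, here is a minimal sketch of what the user-side bookkeeping could look like once such an interface exists. Everything here is hypothetical: `SetupSnapshot`, `SetupCache` and the per-level byte buffers are illustrative names, not QUDA types, and the step that would actually copy the null vectors from the device is deliberately left out because no such call exists yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side snapshot of one multigrid setup:
// one raw byte buffer per multigrid level. Not a QUDA type.
struct SetupSnapshot {
  std::vector<std::vector<char>> levels; // levels[i] = raw data of level i
};

// Keeps several snapshots in host memory so that a single run
// can swap between setups without regenerating them.
class SetupCache {
public:
  void store(const std::string &tag, SetupSnapshot snap) { cache_[tag] = std::move(snap); }

  bool contains(const std::string &tag) const { return cache_.count(tag) != 0; }

  const SetupSnapshot &restore(const std::string &tag) const {
    auto it = cache_.find(tag);
    if (it == cache_.end()) throw std::out_of_range("no setup stored under tag " + tag);
    return it->second;
  }

  // Total host memory held by all cached setups, in bytes.
  std::size_t bytes() const {
    std::size_t total = 0;
    for (const auto &entry : cache_)
      for (const auto &level : entry.second.levels) total += level.size();
    return total;
  }

private:
  std::map<std::string, SetupSnapshot> cache_;
};

// Usage: keep two setups resident and switch between them.
inline void demo() {
  SetupCache cache;
  cache.store("conf_A", SetupSnapshot{{std::vector<char>(8), std::vector<char>(4)}});
  cache.store("conf_B", SetupSnapshot{{std::vector<char>(16)}});
  assert(cache.restore("conf_A").levels.size() == 2);
  assert(cache.bytes() == 28);
}
```

The point of the design is that the library only hands over opaque buffers, while the decision to dump them to disk or keep them in memory stays with the caller.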

SaltyChiang commented 1 month ago

I ran into a similar issue with my application. I cannot use dumpMultigridQuda with the QUDA_MPI=ON option, and I don't want to introduce QMP for various reasons.

Replacing read_spinor_field and write_spinor_field with a non-QMP implementation might be a straightforward way to enable dumpMultigridQuda with MPI. But it's not easy to decide which file format we should use to save these fields. HDF5 might be one choice, but QUDA would then have to build libhdf5 from source, just as it does with the Eigen package, which makes things a bit more complicated. Another choice is a private binary format with a header that records the lattice properties. This is easy to implement with MPI I/O, but other applications would have to write extra code to handle the new format (although I don't think other apps need the multigrid setup fields). I think enabling these I/O functions with MPI would benefit both testing (heatbath_test, for example) and applications.
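As an illustration of the "binary with a header" option, here is a minimal single-process sketch. The header layout, the `QMGSETUP` magic string and the `write_field`/`read_field` helpers are all assumptions made up for this example, not an existing QUDA format or API. Data is written in host byte order, and the header is padded so that it has no implicit gaps on common platforms. A parallel version would presumably write the same header from one rank and then have each rank write its local slab at a computed offset with MPI I/O (for instance `MPI_File_write_at_all`).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-size file header; not an existing QUDA format.
struct FieldHeader {
  char magic[8];                // file signature, filled in by write_field
  std::uint32_t version;        // format version
  std::uint32_t dims[4];        // global lattice extent X, Y, Z, T
  std::uint32_t n_spin;
  std::uint32_t n_color;
  std::uint32_t n_vec;          // number of vectors stored in the payload
  std::uint32_t bytes_per_real; // 4 (single) or 8 (double)
  std::uint32_t reserved;       // explicit padding, keeps the layout gap-free
  std::uint64_t payload_bytes;  // filled in by write_field
};
static_assert(sizeof(FieldHeader) == 56, "unexpected padding in FieldHeader");

constexpr char kMagic[8] = {'Q', 'M', 'G', 'S', 'E', 'T', 'U', 'P'};

// Writes header followed by the raw payload (host byte order).
inline void write_field(const std::string &path, FieldHeader h, const std::vector<char> &payload) {
  assert(h.bytes_per_real == 4 || h.bytes_per_real == 8);
  std::memcpy(h.magic, kMagic, sizeof(kMagic));
  h.payload_bytes = payload.size();
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) throw std::runtime_error("cannot open " + path);
  out.write(reinterpret_cast<const char *>(&h), sizeof(h));
  out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
  if (!out) throw std::runtime_error("write failed: " + path);
}

// Reads and validates the header, then returns the raw payload.
inline std::vector<char> read_field(const std::string &path, FieldHeader &h) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  in.read(reinterpret_cast<char *>(&h), sizeof(h));
  if (!in || std::memcmp(h.magic, kMagic, sizeof(kMagic)) != 0)
    throw std::runtime_error("bad header in " + path);
  std::vector<char> payload(h.payload_bytes);
  in.read(payload.data(), static_cast<std::streamsize>(payload.size()));
  if (!in) throw std::runtime_error("truncated payload in " + path);
  return payload;
}
```

Because the header carries the lattice dimensions and precision, a reader can check that a stored setup matches the current run before loading it.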

@maddyscientist I could make a PR if you believe it's suitable to introduce another file format to QUDA. Do you have any thoughts on the choice of format?