Overrelaxed Coulomb gauge fixing convergence criteria need to be made more robust

Right now, there are a few issues (in my opinion) with the Coulomb gauge fixing convergence criteria, both in QUDA and analogously in MILC. They showed up while testing the offload of gauge fixing from MILC to QUDA. Over the course of testing, I noticed that the values of pion/kaon propagators from wall sources agreed perfectly for some timeslices (up to the gauge fixing tolerance, which I cranked to 1e-12 for testing) and imperfectly for others. The set of timeslices that agreed or disagreed was the same across different propagators (different flavors of pion/kaon).

This could be traced to a few issues:

1. The gauge-fixing "action" that gets minimized is an extensive quantity, so convergence is sensitive to the 3-d volume of the lattice.
2. The convergence test looks at the magnitude of the change from one iteration to the next, not at how far the action has been minimized relative to the initial gauge-fixing action.
3. Convergence to Coulomb gauge is measured on the action summed over all timeslices, as opposed to on each timeslice independently.

The last point is the most concerning. A good analogy is a batched multi-rhs CG solver: "naively" hitting the tolerance across all the systems doesn't mean you've actually hit the tolerance for each solve independently. The same situation can arise here.
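To make the analogy concrete, here is a minimal sketch in plain Python. It is not the QUDA or MILC code: the function names, the use of a relative (rather than absolute) change, and the numbers are all assumptions made up for illustration. It just shows how a test on the summed functional can pass while one timeslice is still moving.

```python
def summed_converged(prev, curr, tol):
    """Test the iteration-to-iteration change of the functional summed over timeslices."""
    total_prev, total_curr = sum(prev), sum(curr)
    return abs(total_curr - total_prev) <= tol * abs(total_curr)


def per_timeslice_converged(prev, curr, tol):
    """Return one flag per timeslice, each judged on its own relative change."""
    return [abs(c - p) <= tol * abs(c) for p, c in zip(prev, curr)]


# Hypothetical per-timeslice functional values at two successive iterations.
prev = [0.85, 0.85, 0.85, 0.85]
curr = [0.85, 0.85, 0.85, 0.8501]  # only the last timeslice is still changing
tol = 1e-4

print(summed_converged(prev, curr, tol))         # True
print(per_timeslice_converged(prev, curr, tol))  # [True, True, True, False]
```

The change on the last timeslice is diluted by the three timeslices that have stopped moving, so the summed test reports convergence at a tolerance that the timeslice itself has not reached. The dilution grows with the number of timeslices.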

Since the algorithms don't exactly agree between MILC and QUDA, and may have different sensitivities to round-off, we can't necessarily expect agreement between the two due to the issues noted above. It's also hard to say, for a given output, which one is "right" or "more right". A more rigorous convergence condition, or at least tracking convergence independently across each timeslice, is necessary to address this issue.
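As a rough sketch of what per-timeslice tracking could look like, the loop below keeps iterating until every timeslice meets the tolerance on its own and records how many sweeps each one needed. Everything here is hypothetical: `relax_step` is a stand-in for one relaxation sweep restricted to a single timeslice, and the toy model (each timeslice approaching its limit geometrically at its own rate) is only there to make the example runnable. It relies on the Coulomb gauge condition involving only spatial links, so each timeslice can be treated as a separate minimization.

```python
def fix_until_all_converged(relax_step, values, tol, max_iter=10000):
    """Iterate until each timeslice satisfies the tolerance independently.

    relax_step(t, value) -> new value is a placeholder for one sweep on timeslice t.
    Returns (final values, sweeps used per timeslice, all_converged flag).
    """
    values = list(values)
    iters = [0] * len(values)
    active = set(range(len(values)))  # timeslices that have not converged yet
    for _ in range(max_iter):
        if not active:
            break
        for t in sorted(active):  # sorted() copies, so discarding below is safe
            new = relax_step(t, values[t])
            iters[t] += 1
            if abs(new - values[t]) <= tol * abs(new):
                active.discard(t)
            values[t] = new
    return values, iters, not active


# Toy model: timeslice t approaches 1.0 geometrically with rate rates[t].
rates = [0.5, 0.9]


def toy_relax(t, v):
    return 1.0 - rates[t] * (1.0 - v)


values, iters, ok = fix_until_all_converged(toy_relax, [0.0, 0.0], tol=1e-10)
print(ok, iters)  # the slowly converging timeslice needs many more sweeps
```

In the toy model the slow timeslice needs several times as many sweeps as the fast one. A single test on the sum would stop both at the same sweep, which is one way the output could end up better gauge fixed on some timeslices than on others.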

lattice/quda issue #1320