paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0
153 stars 106 forks source link

Stout smearing routine is incorrect when OrthogDir is set #350

Open RJHudspith opened 3 years ago

RJHudspith commented 3 years ago

Hi there,

I was cross-checking the output of Grid, a QDP++ program (practically a copy of the chroma implentation), and GLU when applying stout link smearing and I noticed that the Grid implementation did not agree with GLU or my QDP++ code for spatial-only smearing (all-directional works fine), i.e. when orthogdim is set to be 3 in the constructor.

The reason for this can be found on lines 92 to 102 of https://github.com/paboyle/Grid/blob/develop/Grid/qcd/smearing/StoutSmearing.h. It is clear that the temporary Umu is only set in the else branch but its value is used for both branches, if orthogdim were to be set to zero this temporary wouldn't even be set and would lead to (I guess) undefined behaviour. This code at the moment is effectively copying the z-pointing links into the time-pointing ones, I doubt this is intentional.

The fix is simple, replace lines 88 to 103 with

// C contains the staples multiplied by some rho
u_smr = U ; // set the smeared field to the current gauge field
SmearBase->smear(C, U);

for (int mu = 0; mu < Nd; mu++) {
  if( mu == OrthogDim ) continue ;
  // u_smr = exp(iQ_mu)*U_mu apart from Orthogdim
  Umu = peekLorentz(U, mu);
  tmp = peekLorentz(C, mu);
  iq_mu = Ta( tmp * adj(Umu));  
  exponentiate_iQ(tmp, iq_mu);
  pokeLorentz(u_smr, tmp * Umu, mu);
}

The code sets the smeared links to the previous U and only the non OrthogDim directions are smeared.

I tested the output of this fix for the spatial and temporal plaquettes and links against those in GLU under spatial smearing on a non-trivial gauge configuration and the agreement is now excellent.

Cheers,

Jamie

felixerben commented 3 years ago

Thanks for the issue - I had written the code for that and I agree it is a bug. I submitted a bugfix pull request, slightly different code to yours but should produce identical results. Thanks!

paboyle commented 3 years ago

Thanks Jamie, I merged Felix's patch to develop. I'd really appreciate a statement from you both on testing of this. Thanks for the help.

paboyle commented 3 years ago

Perhaps you could insert a check at the end of the routine that if OrthogDim >=0 and <= Nd then the gauge link in this polarisation is unmodified?

I think we need to check that the if OrthogDim <0 then the routine is unmodified. Perhaps this is easiest to run one of the smeared link HMC tests, but I probably need to get the expected plaquette for that.

Might make sense for us to update the HMC tests to each print what the expected plaquette is. I can take on this second step since I'm working in that area now.

felixerben commented 3 years ago

I did a quick check using Hadrons - the module MGauge::StoutSmearing prints the plaquette. For the case where orthogDim is not specified (i.e. where it defaults to orthogDim=-1) the plaquette is identical when running the old and the new code.

RJHudspith commented 3 years ago

Hi Peter and Felix,

I can confirm that this patch does fix the issue.

For spatial-only smearing on the same gauge configuration (quenched SU(3), \beta=6.0, V=8^3x16) I have the outputs from GLU (using 4 threads, SSE2 vectorisation, intel i5 processor) [SMEAR] {Iteration} 11 {Link trace} -0.001299141370769 [SMEAR] {Plaquette} 0.826557545052572 {Spatial} 0.993565748363840 {Temporal} 0.659549341741304 [TIMER] elapsed :: 5.668902e-02 s

And the same information from Grid (again using 4 threads and AVXFMA vectorisation) after 10 iterations of Stout smearing (the output is from a code that calls Grid as a library) Full Plaquette 8.265575450525731e-01 Spatial Plaquette 9.935657483638399e-01 Temporal Plaquette 6.595493417413062e-01 Average Link -1.299141370768864e-03 ** Link smearing complete took 5.44513e-01 (s)

By the way, for the spatial-only smearing the version I proposed in the original message is about 10% faster (** Link smearing complete took 4.97722e-01 (s)) than the current bug-fix version. This I assume is because in my version I wasn't multiplying the temporal links by 1 every iteration and doing fewer peek/pokes.

I don't really understand the rationale for the preference of the current (fixed) code as it is harder to read (at least for me) and provably slower than the one I proposed, especially considering this unusual branching and business of setting tmp to 1 was probably the whole reason why this code was wrong in the first place.

Anyway, I would consider Grid to now be good agreement with GLU and this issue resolved. I also find a similar level of agreement for the all-directional smearing.

Jamie

felixerben commented 3 years ago

Hi Jamie, thank you very much for checking the code, and for finding the bug in the first place.

There is no big reason why the fix is different to yours - I only noticed you had suggested a fix after submitting my pull request. I am completely impartial as to which version should be used in Grid - I don't think the Stout smearing will ever be performance critical, but if you have tested yours to be faster, we could use that instead.

paboyle commented 3 years ago

Can either of you submit a pull request and I will approve it.

felixerben commented 3 years ago

I'm preparing this now. Just compiling on my laptop to make sure I didn't introduce some silly bug by wrongly copy&pasting.

felixerben commented 3 years ago

https://github.com/paboyle/Grid/pull/356

done.

paboyle commented 3 years ago

merged