lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

likely Dslash correctness regression for non-clover twisted mass #792

Closed: kostrzewa closed this 5 years ago

kostrzewa commented 5 years ago

It seems that there is a correctness regression for the non-clover twisted mass case. Below I show output from some inversions on a 16c40 quenched lattice (Wilson plaquette gauge action, beta=5.85). The inversions use mixed-precision (double-single) CG.

All calls are made from within tmLQCD. The QUDA head commit is 50ea7fdb2919e7e628a29744a4a640470a55b53e, and we're running on K20m GPUs here, but I've observed the same behaviour on our P100s.

The thing to look for below is the "squared residue = xxx" line, which is the residual check performed with tmLQCD's native operators.
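The "squared residue" check re-applies the operator to the returned solution, independently of the solver. In the logs, <r,r> equals (|r|/|b|)^2 (e.g. 3.149718e-10 squared is about 9.9207e-20), so the source appears to be unit-normalised, and the requested 3.162278e-10 corresponds to a squared residual of 1e-19. Below is a minimal sketch of such an independent check, assuming a small dense Hermitian positive definite matrix stands in for the tmLQCD operator; `residual_check` is a hypothetical helper, not a tmLQCD or QUDA function.

```python
import numpy as np

def residual_check(apply_op, x, b):
    """Recompute the residual r = b - M x independently of the solver.

    Returns the squared residual <r,r> and the relative residual |r|/|b|.
    """
    r = b - apply_op(x)
    sq = float(np.vdot(r, r).real)
    rel = float(np.sqrt(sq / np.vdot(b, b).real))
    return sq, rel

# Toy stand-in for the normal operator: a small Hermitian positive definite matrix.
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = A.conj().T @ A + n * np.eye(n)

# Unit-norm source, so <r,r> equals (|r|/|b|)^2 as in the logs.
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b /= np.linalg.norm(b)

x = np.linalg.solve(M, b)  # stands in for the solution returned by the solver
sq, rel = residual_check(lambda v: M @ v, x, b)

tol = 3.162278e-10  # requested relative residual from the logs, sqrt(1e-19)
print(sq, rel, rel <= tol)
```

A squared residue such as 1.231274e-15 would fail this kind of check by several orders of magnitude, even though the solver itself reported convergence.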

Wilson, Wilson clover and twisted clover work fine

plain Wilson, kappa=0.162

# QUDA: CG: 6234 iterations, <r,r> = 1.064893e-19, |r|/|b| = 3.263270e-10
# QUDA: CG: 6235 iterations, <r,r> = 1.027272e-19, |r|/|b| = 3.205109e-10
# QUDA: CG: 6236 iterations, <r,r> = 9.920721e-20, |r|/|b| = 3.149718e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 6236 iterations, L2 relative residual: iterated = 3.149718e-10, true = 3.149718e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.48394
# QUDA: Reconstructed: CUDA solution = 2.40181, CPU copy = 2.40181
# QUDA: Saving 411 sets of cached parameters to /qbigwork/bartek/quda_resources/kepler_6b51b98212e37b90b55d3b2a6842f0a742eecb4b_gdr0_p2p3/tunecache.tsv
# QUDA: Done: 6236 iter / 16.4108 secs = 176.438 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003460 secs
# Inversion done in 6236 iterations, squared residue = 9.920721e-20!
# Inversion done in 1.68e+01 sec. 

Wilson clover, kappa=0.15, csw=1.0

# QUDA: CG: 6501 iterations, <r,r> = 9.424656e-20, |r|/|b| = 3.152556e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 6501 iterations, L2 relative residual: iterated = 3.152556e-10, true = 3.152556e-10 (requested = 3.162278e-10)
# QUDA: Solution = 3.43149
# QUDA: Reconstructed: CUDA solution = 4.38886, CPU copy = 4.38886
# QUDA: Done: 6501 iter / 8.76415 secs = 466.998 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003848 secs
# Inversion done in 6501 iterations, squared residue = 9.014389e-20!
# Inversion done in 9.68e+00 sec. 

Twisted clover, kappa=0.15, csw=1.0, mu=0.0005

# QUDA: CG: 6623 iterations, <r,r> = 9.389954e-20, |r|/|b| = 3.146747e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 6623 iterations, L2 relative residual: iterated = 3.146747e-10, true = 3.146747e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.44158
# QUDA: Reconstructed: CUDA solution = 2.40921, CPU copy = 2.40921
# QUDA: Done: 6623 iter / 9.59286 secs = 471.338 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003817 secs
# Inversion done in 6623 iterations, squared residue = 9.038224e-20!
# Inversion done in 1.05e+01 sec. 

Twisted clover, kappa=0.15, csw=1.0, mu=0.005

# QUDA: CG: 3525 iterations, <r,r> = 9.393719e-20, |r|/|b| = 3.147381e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 3525 iterations, L2 relative residual: iterated = 3.147381e-10, true = 3.147381e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.11855
# QUDA: Reconstructed: CUDA solution = 1.76314, CPU copy = 1.76314
# QUDA: Done: 3525 iter / 5.11977 secs = 470.319 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003532 secs
# Inversion done in 3525 iterations, squared residue = 9.016906e-20!
# Inversion done in 6.05e+00 sec. 

Twisted clover, kappa=0.15, csw=1.0, mu=0.05

# QUDA: CG: 425 iterations, <r,r> = 9.212106e-20, |r|/|b| = 3.117143e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 425 iterations, L2 relative residual: iterated = 3.117143e-10, true = 3.117143e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.04786
# QUDA: Reconstructed: CUDA solution = 1.62231, CPU copy = 1.62231
# QUDA: Done: 425 iter / 0.630835 secs = 464.487 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003852 secs
# Inversion done in 425 iterations, squared residue = 8.810617e-20!
# Inversion done in 1.54e+00 sec. 

Plain twisted mass, however, does not

Twisted mass, kappa=0.163279 (kappa_c), mu=0.0005

# QUDA: CG: 6474 iterations, <r,r> = 9.797976e-20, |r|/|b| = 3.130172e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 6474 iterations, L2 relative residual: iterated = 3.130172e-10, true = 3.130172e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.55777
# QUDA: Reconstructed: CUDA solution = 2.55144, CPU copy = 2.55144
# QUDA: Done: 6474 iter / 6.97965 secs = 448.918 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003826 secs
# Inversion done in 6474 iterations, squared residue = 1.231274e-15!
# Inversion done in 7.25e+00 sec. 

Twisted mass, kappa=0.163279 (kappa_c), mu=0.005

# QUDA: CG: 3494 iterations, <r,r> = 9.950805e-20, |r|/|b| = 3.154490e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 3494 iterations, L2 relative residual: iterated = 3.154490e-10, true = 3.154490e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.28373
# QUDA: Reconstructed: CUDA solution = 2.0031, CPU copy = 2.0031
# QUDA: Done: 3494 iter / 3.77257 secs = 448.499 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003861 secs
# Inversion done in 3494 iterations, squared residue = 8.500292e-12!
# Inversion done in 3.93e+00 sec. 

Twisted mass, kappa=0.163279 (kappa_c), mu=0.05

# QUDA: CG: 445 iterations, <r,r> = 9.569045e-20, |r|/|b| = 3.093387e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 445 iterations, L2 relative residual: iterated = 3.093387e-10, true = 3.093387e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.15774
# QUDA: Reconstructed: CUDA solution = 1.75165, CPU copy = 1.75165
# QUDA: Done: 445 iter / 0.483458 secs = 449.529 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003882 secs
# Inversion done in 445 iterations, squared residue = 6.757993e-08!
# Inversion done in 6.33e-01 sec. 

Comparison to a correct run before the new Dslash

For comparison, below are the same twisted mass inversions using 8e9b6a32ca1b8e4397e1f18f0800e76a9a9416c6 as the QUDA head commit (Feb 15).

Twisted mass, kappa=0.163279 (kappa_c), mu=0.0005

# QUDA: CG: 6470 iterations, <r,r> = 9.876873e-20, |r|/|b| = 3.142749e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 6470 iterations, L2 relative residual: iterated = 3.142749e-10, true = 3.142749e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.55777
# QUDA: Reconstructed: CUDA solution = 2.55144, CPU copy = 2.55144
# QUDA: Saving 236 sets of cached parameters to /qbigwork/bartek/quda_resources/kepler_405d5bf1ac9cdbccbc11ac957e07d822065ac36e_gdr0_p2p3/tunecache.tsv
# QUDA: Done: 6470 iter / 9.99503 secs = 313.288 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003511 secs
# Inversion done in 6470 iterations, squared residue = 9.876873e-20!
# Inversion done in 1.05e+01 sec. 

Twisted mass, kappa=0.163279 (kappa_c), mu=0.005

# QUDA: CG: 3495 iterations, <r,r> = 9.645039e-20, |r|/|b| = 3.105646e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 3495 iterations, L2 relative residual: iterated = 3.105646e-10, true = 3.105646e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.28373
# QUDA: Reconstructed: CUDA solution = 2.0031, CPU copy = 2.0031
# QUDA: Done: 3495 iter / 3.73878 secs = 452.672 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003751 secs
# Inversion done in 3495 iterations, squared residue = 9.645039e-20!
# Inversion done in 3.89e+00 sec. 

Twisted mass, kappa=0.163279 (kappa_c), mu=0.05

# QUDA: CG: 445 iterations, <r,r> = 9.564262e-20, |r|/|b| = 3.092614e-10
# QUDA: CG: Reliable updates = 4
# QUDA: CG: Convergence at 445 iterations, L2 relative residual: iterated = 3.092614e-10, true = 3.092614e-10 (requested = 3.162278e-10)
# QUDA: Solution = 2.15774
# QUDA: Reconstructed: CUDA solution = 1.75134, CPU copy = 1.75134
# QUDA: Done: 445 iter / 0.476031 secs = 456.468 Gflops
# QUDA: time spent in reorder_spinor_fromQuda: 0.003757 secs
# Inversion done in 445 iterations, squared residue = 9.564262e-20!
# Inversion done in 6.32e-01 sec. 

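The QUDA-side output is essentially identical between the two commits (same iteration counts to within one or two, same solution norms), and only the plain twisted mass case fails, with a discrepancy that grows with mu (squared residue 1.2e-15, 8.5e-12 and 6.8e-08 for mu = 0.0005, 0.005 and 0.05). Perhaps the twisted mass term is applied in the wrong gamma basis, but only in the plain twisted mass operator?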
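The gamma-basis hypothesis can be illustrated on a single spinor. The sketch below uses two textbook representations of gamma5, one diagonal (chiral) and one off-diagonal (non-relativistic), related by a unitary change of basis. These matrices are illustrative assumptions, not necessarily the conventions used internally by tmLQCD or QUDA, and `twist` / `twist_error` are hypothetical helpers. Applying the twist i*mu*gamma5 with the other basis's gamma5 gives an error proportional to mu that vanishes at mu = 0, which would be consistent with Wilson passing while the twisted mass residue grows with mu.

```python
import numpy as np

I2 = np.eye(2)
Z2 = np.zeros((2, 2))

# Illustrative gamma5 representations: diagonal (chiral) and off-diagonal (non-relativistic).
g5_chiral = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)
g5_nonrel = np.block([[Z2, I2], [I2, Z2]]).astype(complex)

# Unitary change of basis satisfying U @ g5_chiral @ U^dagger == g5_nonrel.
U = np.block([[I2, -I2], [I2, I2]]).astype(complex) / np.sqrt(2)

def twist(g5, psi, mu):
    """Apply the twisted mass term i*mu*gamma5 to a single 4-component spinor."""
    return 1j * mu * (g5 @ psi)

rng = np.random.default_rng(2)
psi_chiral = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi_nonrel = U @ psi_chiral  # the same spinor expressed in the non-relativistic basis

def twist_error(mu):
    """Norm of the error from using the chiral-basis gamma5 on a non-relativistic-basis spinor."""
    correct = twist(g5_nonrel, psi_nonrel, mu)
    wrong = twist(g5_chiral, psi_nonrel, mu)
    return float(np.linalg.norm(correct - wrong))

for mu in (0.0005, 0.005, 0.05):
    print(mu, twist_error(mu))
```

In this toy the error is exactly linear in mu for a fixed spinor; the residues in the logs grow faster than that, so the sketch only shows the qualitative pattern, not a diagnosis.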

maddyscientist commented 5 years ago

Fixed with commit 625278d5fe510e3d0530942032a2e0d42c3e9ca1.

kostrzewa commented 5 years ago

I can confirm that this is addressed by #794, thanks!

maddyscientist commented 5 years ago

Thanks for confirming this @kostrzewa. The issue came up due to a hole in the solver testing: by default, we did not test solving the full system with e/o preconditioning. I will make sure this hole is plugged, so that this case is included by default in the regression testing going forward.
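The testing gap described above, solving the full system with e/o preconditioning, can be illustrated with a toy 2x2 block system. This is a minimal sketch, not QUDA's test harness: `schur_solve`, `full_residual`, the block sizes and the planted wrong-sign inverse are all hypothetical, and the thread does not describe what the actual bug fixed in 625278d was. It shows why a check on the preconditioned (odd-odd) system alone can pass while the reconstructed full-system solution is wrong. In this toy, the planted error vanishes at mu = 0 and grows with mu.

```python
import numpy as np

def schur_solve(A_ee, D_eo, D_oe, A_oo, b_e, b_o, A_ee_inv=None):
    """Solve [[A_ee, D_eo], [D_oe, A_oo]] x = b via the odd-odd Schur complement."""
    if A_ee_inv is None:
        A_ee_inv = np.linalg.inv(A_ee)
    S = A_oo - D_oe @ A_ee_inv @ D_eo      # preconditioned (odd-odd) operator
    b_hat = b_o - D_oe @ A_ee_inv @ b_e    # prepared odd source
    x_o = np.linalg.solve(S, b_hat)        # stands in for the CG solve
    x_e = A_ee_inv @ (b_e - D_eo @ x_o)    # reconstruct the even sites
    return x_e, x_o, float(np.linalg.norm(S @ x_o - b_hat))

def full_residual(M, x, b):
    """Relative residual of the full (unpreconditioned) system."""
    return float(np.linalg.norm(b - M @ x) / np.linalg.norm(b))

rng = np.random.default_rng(1)
n, mu = 6, 0.05
g5 = np.diag([1, 1, 1, -1, -1, -1]).astype(complex)  # toy stand-in for gamma5
A_ee = np.eye(n) + 1j * mu * g5                       # site-diagonal, twisted-mass-like block
A_oo = A_ee.copy()
D_eo = 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
D_oe = 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
M = np.block([[A_ee, D_eo], [D_oe, A_oo]])

b_e = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b_o = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = np.concatenate([b_e, b_o])

# Correct pipeline: both the preconditioned and the full residual are tiny.
x_e, x_o, r_pre = schur_solve(A_ee, D_eo, D_oe, A_oo, b_e, b_o)
good = full_residual(M, np.concatenate([x_e, x_o]), b)

# Planted bug: the diagonal-block inverse uses the wrong sign of mu. The odd-odd
# solve still converges, so a test on the preconditioned system alone passes.
bad_inv = np.linalg.inv(np.eye(n) - 1j * mu * g5)
x_e2, x_o2, r_pre2 = schur_solve(A_ee, D_eo, D_oe, A_oo, b_e, b_o, A_ee_inv=bad_inv)
bad = full_residual(M, np.concatenate([x_e2, x_o2]), b)

print(good, r_pre2, bad)
```

Only the final check against the full operator `M` exposes the planted bug, which is the kind of check a default "full system with e/o preconditioning" regression test would add.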