Closed mathiaswagner closed 9 years ago
Just tried with the tifr-reduc branch at 6e6a6813ad7013e8a97c2df43c07a31a2e3bee07, i.e. before the 0.7 merge, and I get the same error for the hisq example.
Is this still a problem? Can we close this?
Carleton reproduced this issue with MILC 7.7.12 and the current version of quda-0.7:
I compiled with maximum verbosity. Here are some more details. The multishift CG first improves |r|/|b|, then the residual blows up.
Carleton
MultiShift CG: 0 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: 1 iterations, <r,r> = 4.146271e+05, |r|/|b| = 4.563777e-01
MultiShift CG: Shift 1 converged after 2 iterations
MultiShift CG: Shift 2 converged after 2 iterations
MultiShift CG: Shift 3 converged after 2 iterations
MultiShift CG: Shift 4 converged after 2 iterations
MultiShift CG: Shift 5 converged after 2 iterations
MultiShift CG: Shift 6 converged after 2 iterations
MultiShift CG: Shift 7 converged after 2 iterations
MultiShift CG: 2 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: Shift 1 converged after 3 iterations
MultiShift CG: Shift 2 converged after 3 iterations
MultiShift CG: Shift 3 converged after 3 iterations
MultiShift CG: 3 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: Shift 1 converged after 4 iterations
MultiShift CG: Shift 2 converged after 4 iterations
MultiShift CG: 4 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: Shift 1 converged after 5 iterations
MultiShift CG: 5 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: 6 iterations, <r,r> = 1.990714e+06, |r|/|b| = 1.000000e+00
MultiShift CG: 7 iterations, <r,r> = 4.146271e+05, |r|/|b| = 4.563777e-01
MultiShift CG: 8 iterations, <r,r> = 4.016667e+33, |r|/|b| = 4.491883e+13
MultiShift CG: 9 iterations, <r,r> = nan, |r|/|b| = nan
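In case it helps anyone triaging similar logs, here is a rough Python sketch that pulls the per-iteration residuals out of output like the above and reports the first iteration where |r|/|b| goes non-finite or jumps far above the best value seen so far. It is not part of QUDA or MILC; the function names `parse_residuals` and `first_divergence` and the `blowup_factor` threshold are made up for illustration, and the regex only assumes the line layout quoted in this comment.

```python
import math
import re

# Matches the per-iteration lines quoted above. The label before the first "="
# is treated as optional, since it can get lost when the log is pasted.
ITER_RE = re.compile(
    r"MultiShift CG:\s*(\d+) iterations,\s*(?:<r,r>)?\s*=\s*([^\s,]+),"
    r"\s*\|r\|/\|b\|\s*=\s*([^\s,]+)"
)


def parse_residuals(log_text):
    """Return a list of (iteration, r2, relative_residual) tuples."""
    rows = []
    for m in ITER_RE.finditer(log_text):
        rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def first_divergence(rows, blowup_factor=1e3):
    """Return the first iteration whose |r|/|b| is NaN/inf or exceeds
    blowup_factor times the smallest value seen so far (None if there is none).

    blowup_factor is an arbitrary illustrative threshold, not a QUDA setting.
    """
    best = math.inf
    for iteration, _r2, rel in rows:
        if not math.isfinite(rel):
            return iteration
        if rel > blowup_factor * best:
            return iteration
        best = min(best, rel)
    return None
```

Calling `first_divergence(parse_residuals(log_text))` on the output above should point at the iteration where the residual jumps by many orders of magnitude, just before it turns into nan.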
This issue has now been fixed as of 2a96da3f2da60b6f076aba78fb63a53f7c9a84bf.
I tried to run 'su3_leapfrog' and 'su3_rhmc_hisq' using quda 0.7 (136a4ca8d74f6f87f17a286a17a29e5fc6d0130c) and milc from lattice/milc.
It seems the multishift inverter does not converge.
For the hisq test from milc (ks_imp_rhmc/su3_rhmc_hisq.1.sample-in) the solver diverges.
For asqtad (su3_leapfrog) I think the problem is similar; at least that is my first guess.
Here is a short clip from the output: