lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

calling updateMultigridQuda with `setup_maxiter_refresh[level] != 0` leads to subsequent solve erroring #1170

Closed kostrzewa closed 3 years ago

kostrzewa commented 3 years ago

I'm currently in the process of getting MG setup evolution set up correctly in tmLQCD and have hit upon an issue which causes the next solve after a setup refresh to fail. Doing updateMultigridQuda without refresh (just updating parameters) instead works like a charm. Also, resetting the MG setup mid-trajectory works as expected and does not lead to any errors in the solves.

I've implemented functionality to track the state of the QUDA gauge field along the trajectory and to keep the state of the MG setup synchronised with it. I also keep track of the parameters that were used to generate the setup and call updateMultigridQuda when required and whenever parameters change (with setup_maxiter_refresh[level] = 0).

As discussed via e-mail, I had neglected to also set a non-zero setup_maxiter_refresh[level] at appropriate intervals to actually evolve the null vectors, which forced me to reset the setup fully every 0.2 molecular dynamics time units or so.
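
The bookkeeping described above distinguishes three actions: a parameter-only update (refresh iterations set to zero), a refresh that also evolves the null vectors, and a full reset. A minimal sketch of such a decision function follows. It is not the tmLQCD implementation: all names (`sketch_mg_state`, `sketch_check_mg_state`, the two interval parameters) are hypothetical, and it assumes that the gauge id advances with molecular dynamics time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action codes; these are not the tmLQCD enum values. */
typedef enum {
  SKETCH_MG_OK = 0,   /* setup matches the gauge field, nothing to do */
  SKETCH_MG_UPDATE,   /* parameter / coarse-operator update only (refresh iterations = 0) */
  SKETCH_MG_REFRESH,  /* update and additionally evolve the null vectors */
  SKETCH_MG_RESET     /* discard the setup and regenerate it from scratch */
} sketch_mg_action;

/* Hypothetical record of what the current MG setup was built for. */
typedef struct {
  bool   initialised;      /* has a setup been generated at all? */
  double gauge_id;         /* gauge id at the last update of any kind */
  double refresh_gauge_id; /* gauge id at the last null-vector refresh or reset */
  double reset_gauge_id;   /* gauge id at the last full reset */
} sketch_mg_state;

static double sketch_abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Decide what to do before the next solve on the gauge field `gauge_id`.
 * Assumption: gauge_id grows with MD time, so differences measure elapsed time. */
sketch_mg_action sketch_check_mg_state(const sketch_mg_state *s, double gauge_id,
                                       bool params_changed,
                                       double refresh_interval, double reset_interval)
{
  assert(s != NULL);
  if (!s->initialised) return SKETCH_MG_RESET;
  if (sketch_abs_diff(gauge_id, s->reset_gauge_id) >= reset_interval) return SKETCH_MG_RESET;
  if (sketch_abs_diff(gauge_id, s->refresh_gauge_id) >= refresh_interval) return SKETCH_MG_REFRESH;
  if (params_changed || gauge_id != s->gauge_id) return SKETCH_MG_UPDATE;
  return SKETCH_MG_OK;
}
```

After acting on the returned code, the caller would store the current gauge id in the matching field(s) of the state record, so that the same query returns `SKETCH_MG_OK` until the gauge field or the parameters change again.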

I've now implemented the refresh:

    } else if ( check_quda_mg_setup_state(&quda_mg_setup_state, &quda_gauge_state, &quda_input) == TM_QUDA_MG_SETUP_REFRESH ) {
      tm_debug_printf(0,0,"# TM_QUDA: Refreshing MG Preconditioner Setup for gauge %f\n", quda_gauge_state.gauge_id);
      double atime = gettime();
      for(int level = 0; level < (quda_input.mg_n_level-1); level++){
        quda_mg_param.setup_maxiter_refresh[level] = quda_input.mg_setup_maxiter_refresh[level];
      }
      // update the parameters AND refresh the setup
      updateMultigridQuda(quda_mg_preconditioner, &quda_mg_param);
      set_quda_mg_setup_state(&quda_mg_setup_state, &quda_gauge_state);
      // reset refresh iterations to zero such that the next call
      // to updateMultigridQuda only updates parameters and coarse
      // operator(s)
      for(int level = 0; level < (quda_input.mg_n_level-1); level++){
        quda_mg_param.setup_maxiter_refresh[level] = 0;
      }
      tm_debug_printf(0,1,"# TM_QUDA: MG Preconditioner Setup Refresh took %.3f seconds\n", gettime()-atime);
    }

but it unfortunately causes the subsequent solve to error out (this is with the 1.1.x branch):

    # TM_QUDA: Refreshing MG Preconditioner Setup for gauge 0.041667
    # TM_QUDA: MG Preconditioner Setup Refresh took 0.719 seconds
    # TM_QUDA: time spent in reorder_spinor_eo_toQuda: 0.002584 secs
    ERROR: Unsupported preconditioner 15
     (rank 0, host cassiopeia, /home/bartek/code/quda-1.1.x/lib/inv_gcr_quda.cpp:192 in GCR())
           last kernel called was (name=N4quda4blas5Norm2IddEE,volume=8x16x16x32,aux=vol=65536,stride=65536,precision=8,order=2,Ns=4,Nc=3,TwistFlavour=1,nParity=1)

I first observed this in the feature/ndeg-twisted-clover branch of PR https://github.com/lattice/quda/pull/1121 (we rely on the tm_rho parameter introduced there for our mass preconditioning with twisted-clover fermions), but I can also reproduce the issue with the 1.1.x branch when running non-clover twisted mass HMC with the MG solver.

With the GK or ndeg-twisted-clover branch, QUDA_MG_INVERTER == 14, but otherwise the error is identical.
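
The number in the error message is presumably the integer value of the solver-type enumerator. In C, enumerators without explicit values are numbered consecutively, so a branch with one more or one fewer enumerator ahead of the MG entry reports a different number for the same name. Whether that is the reason for 14 versus 15 here is an assumption. The enums below are hypothetical and not copied from QUDA.

```c
#include <assert.h>

/* Hypothetical solver-type enum as it might look on one branch. */
typedef enum {
  BRANCH_A_CG,     /* 0 */
  BRANCH_A_GCR,    /* 1 */
  BRANCH_A_EXTRA,  /* 2: an enumerator that exists only on this branch */
  BRANCH_A_MG      /* 3 */
} branch_a_solver;

/* The same enum on another branch, without the extra enumerator. */
typedef enum {
  BRANCH_B_CG,     /* 0 */
  BRANCH_B_GCR,    /* 1 */
  BRANCH_B_MG      /* 2 */
} branch_b_solver;

/* The "same" solver type prints as a different integer on each branch. */
static_assert(BRANCH_A_MG == 3, "MG entry is shifted by the extra enumerator");
static_assert(BRANCH_B_MG == 2, "MG entry without the extra enumerator");
```

If that is what happens, the differing numbers are cosmetic and both branches hit the same check in the GCR solver.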

I'm a bit surprised that a call to generateNullVectors with refresh == true causes this behaviour. I will post some verbose output as a comment in a second.
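
For reference, the set / call / reset pattern of the refresh branch above can be isolated in a small helper, so that non-zero refresh counts never leak into a later parameter-only update. This is only a sketch: `sketch_mg_param` is a mock that stands in for the real MG parameter struct, the callback stands in for updateMultigridQuda, and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_MG_LEVEL 4 /* hypothetical stand-in for the library's level limit */

/* Mock holding only the fields this pattern touches. */
typedef struct {
  int n_level;
  int setup_maxiter_refresh[SKETCH_MAX_MG_LEVEL];
} sketch_mg_param;

/* Callback standing in for the real update call on the preconditioner. */
typedef void (*sketch_update_fn)(void *mg_instance, sketch_mg_param *param);

/* Set the refresh counts, run one update, then zero the counts again. */
void sketch_refresh_mg(void *mg_instance, sketch_mg_param *param,
                       const int *refresh_iters, sketch_update_fn update)
{
  assert(param != NULL && refresh_iters != NULL && update != NULL);
  assert(param->n_level >= 1 && param->n_level <= SKETCH_MAX_MG_LEVEL);

  /* Same loop bound as in the snippet above: levels 0 .. n_level-2. */
  for (int level = 0; level < param->n_level - 1; level++)
    param->setup_maxiter_refresh[level] = refresh_iters[level];

  update(mg_instance, param); /* sees the non-zero refresh counts */

  /* Leave the struct in "parameter-only update" mode for the next call. */
  for (int level = 0; level < param->n_level - 1; level++)
    param->setup_maxiter_refresh[level] = 0;
}

/* Example callback that records the level-0 refresh count it was given. */
int sketch_seen_level0 = -1;
void sketch_record_update(void *mg_instance, sketch_mg_param *param)
{
  (void)mg_instance;
  sketch_seen_level0 = param->setup_maxiter_refresh[0];
}
```

Calling `sketch_refresh_mg(NULL, &p, iters, sketch_record_update)` with `p.n_level = 2` and `iters[0] = 20` lets the callback observe 20 while `p.setup_maxiter_refresh[0]` is back at 0 afterwards. It does not address the error itself, only the bookkeeping around the call.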

kostrzewa commented 3 years ago

Here's what happens at high verbosity. It would be extremely helpful if you could help me identify what I'm doing incorrectly (if you have the time, that is). For brevity I'm just doing a 2-level solve.

# TM_QUDA: Refreshing MG Preconditioner Setup for gauge 0.041667
MG level 0 (GPU): Resetting level 0
MG level 0 (GPU): Creating a CG solver
MG level 0 (GPU): Running vectors setup on level 0 iter 1 of 1
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.415772e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.473850e-04, |r|/|b| = 3.189048e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 7.606632e-05, |r|/|b| = 1.492285e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.555931e-05, |r|/|b| = 8.650279e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.077483e-05, |r|/|b| = 5.616436e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 5.517213e-06, |r|/|b| = 4.018976e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.218146e-06, |r|/|b| = 3.069435e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.071170e-06, |r|/|b| = 2.462427e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.453459e-06, |r|/|b| = 2.062799e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.089899e-06, |r|/|b| = 1.786276e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 8.685712e-07, |r|/|b| = 1.594624e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 7.322855e-07, |r|/|b| = 1.464184e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 6.442580e-07, |r|/|b| = 1.373363e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 5.850992e-07, |r|/|b| = 1.308791e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 5.456134e-07, |r|/|b| = 1.263857e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 5.187789e-07, |r|/|b| = 1.232386e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 5.026046e-07, |r|/|b| = 1.213022e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 4.910286e-07, |r|/|b| = 1.198972e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 4.840505e-07, |r|/|b| = 1.190422e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 4.804423e-07, |r|/|b| = 1.185977e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 4.782896e-07, |r|/|b| = 1.183317e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 1.183317e-02, true = 1.183318e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.95676
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.415853e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.486970e-04, |r|/|b| = 3.195027e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 7.662786e-05, |r|/|b| = 1.497765e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.600925e-05, |r|/|b| = 8.725981e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.121451e-05, |r|/|b| = 5.729815e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 5.962889e-06, |r|/|b| = 4.178099e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.689192e-06, |r|/|b| = 3.286367e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.553792e-06, |r|/|b| = 2.734281e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.952130e-06, |r|/|b| = 2.390588e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.595112e-06, |r|/|b| = 2.160956e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.363965e-06, |r|/|b| = 1.998260e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.202337e-06, |r|/|b| = 1.876132e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.080153e-06, |r|/|b| = 1.778251e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 9.812346e-07, |r|/|b| = 1.694872e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 8.971789e-07, |r|/|b| = 1.620653e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 8.278756e-07, |r|/|b| = 1.556800e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 7.722303e-07, |r|/|b| = 1.503571e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 7.240953e-07, |r|/|b| = 1.455956e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 6.808996e-07, |r|/|b| = 1.411861e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.427873e-07, |r|/|b| = 1.371779e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 6.120593e-07, |r|/|b| = 1.338589e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 1.338589e-02, true = 1.338590e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.953546
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.411352e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.480088e-04, |r|/|b| = 3.193977e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 7.629460e-05, |r|/|b| = 1.495490e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.585998e-05, |r|/|b| = 8.706645e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.109138e-05, |r|/|b| = 5.702030e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 5.805135e-06, |r|/|b| = 4.125180e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.509121e-06, |r|/|b| = 3.207273e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.357897e-06, |r|/|b| = 2.629052e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.750515e-06, |r|/|b| = 2.265268e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.410325e-06, |r|/|b| = 2.033276e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.213139e-06, |r|/|b| = 1.885784e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.085980e-06, |r|/|b| = 1.784216e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 9.991394e-07, |r|/|b| = 1.711393e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 9.364946e-07, |r|/|b| = 1.656873e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 8.848718e-07, |r|/|b| = 1.610560e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 8.405157e-07, |r|/|b| = 1.569675e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 7.969712e-07, |r|/|b| = 1.528474e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 7.594812e-07, |r|/|b| = 1.492091e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 7.282033e-07, |r|/|b| = 1.461043e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.982887e-07, |r|/|b| = 1.430719e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 6.765730e-07, |r|/|b| = 1.408296e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 1.408296e-02, true = 1.408300e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.945804
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.418140e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.483823e-04, |r|/|b| = 3.192516e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 7.677604e-05, |r|/|b| = 1.498711e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.630578e-05, |r|/|b| = 8.772648e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.152067e-05, |r|/|b| = 5.805557e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 6.241642e-06, |r|/|b| = 4.273213e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.954818e-06, |r|/|b| = 3.401483e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.820891e-06, |r|/|b| = 2.872753e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.211021e-06, |r|/|b| = 2.543323e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.849546e-06, |r|/|b| = 2.326149e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.607949e-06, |r|/|b| = 2.168908e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.419443e-06, |r|/|b| = 2.037812e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.256325e-06, |r|/|b| = 1.917150e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.105689e-06, |r|/|b| = 1.798546e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 9.715971e-07, |r|/|b| = 1.685964e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 8.537822e-07, |r|/|b| = 1.580442e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 7.601416e-07, |r|/|b| = 1.491256e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 6.937803e-07, |r|/|b| = 1.424676e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 6.499072e-07, |r|/|b| = 1.378894e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.252750e-07, |r|/|b| = 1.352510e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 6.170309e-07, |r|/|b| = 1.343564e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 1.343564e-02, true = 1.343566e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.943973
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.431469e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.547606e-04, |r|/|b| = 3.215346e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.117440e-05, |r|/|b| = 1.538047e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.064015e-05, |r|/|b| = 9.449426e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.620615e-05, |r|/|b| = 6.872266e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 1.158547e-05, |r|/|b| = 5.810543e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.008284e-05, |r|/|b| = 5.420650e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 9.655041e-06, |r|/|b| = 5.304409e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 9.410759e-06, |r|/|b| = 5.236876e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 8.799453e-06, |r|/|b| = 5.063931e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 7.706362e-06, |r|/|b| = 4.738978e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 6.461675e-06, |r|/|b| = 4.339429e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 5.385764e-06, |r|/|b| = 3.961719e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 4.597116e-06, |r|/|b| = 3.660182e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 4.051004e-06, |r|/|b| = 3.435907e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 3.679989e-06, |r|/|b| = 3.274788e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 3.422080e-06, |r|/|b| = 3.157948e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 3.240665e-06, |r|/|b| = 3.073103e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 3.093589e-06, |r|/|b| = 3.002557e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 2.960500e-06, |r|/|b| = 2.937261e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 2.828412e-06, |r|/|b| = 2.870988e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 2.870988e-02, true = 2.870992e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.864173
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.441649e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.574750e-04, |r|/|b| = 3.222846e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.095990e-05, |r|/|b| = 1.533740e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.903819e-05, |r|/|b| = 9.185472e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.378710e-05, |r|/|b| = 6.329264e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 8.442372e-06, |r|/|b| = 4.952779e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 6.249146e-06, |r|/|b| = 4.261152e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 5.281081e-06, |r|/|b| = 3.917221e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 4.877421e-06, |r|/|b| = 3.764539e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 4.682087e-06, |r|/|b| = 3.688387e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 4.542513e-06, |r|/|b| = 3.632995e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 4.380975e-06, |r|/|b| = 3.567813e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 4.142892e-06, |r|/|b| = 3.469513e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 3.814293e-06, |r|/|b| = 3.329076e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 3.444547e-06, |r|/|b| = 3.163609e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 3.085979e-06, |r|/|b| = 2.994423e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 2.774346e-06, |r|/|b| = 2.839207e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 2.536806e-06, |r|/|b| = 2.714940e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 2.372717e-06, |r|/|b| = 2.625667e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 2.275398e-06, |r|/|b| = 2.571256e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 2.220770e-06, |r|/|b| = 2.540204e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 2.540204e-02, true = 2.540206e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.874018
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.426953e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.524872e-04, |r|/|b| = 3.207138e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 7.887836e-05, |r|/|b| = 1.517137e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.795180e-05, |r|/|b| = 9.031311e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.300951e-05, |r|/|b| = 6.161357e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 7.714160e-06, |r|/|b| = 4.744498e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 5.469082e-06, |r|/|b| = 3.994875e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 4.423053e-06, |r|/|b| = 3.592585e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 3.947591e-06, |r|/|b| = 3.394002e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 3.745893e-06, |r|/|b| = 3.306158e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 3.681274e-06, |r|/|b| = 3.277517e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 3.711079e-06, |r|/|b| = 3.290759e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 3.825042e-06, |r|/|b| = 3.340904e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 3.953741e-06, |r|/|b| = 3.396644e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 4.054561e-06, |r|/|b| = 3.439678e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 4.055423e-06, |r|/|b| = 3.440044e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 3.912085e-06, |r|/|b| = 3.378704e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 3.655480e-06, |r|/|b| = 3.266015e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 3.348481e-06, |r|/|b| = 3.125863e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 3.058620e-06, |r|/|b| = 2.987505e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 2.820891e-06, |r|/|b| = 2.869056e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 2.869056e-02, true = 2.869062e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.83524
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.468251e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.684180e-04, |r|/|b| = 3.259231e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 9.084602e-05, |r|/|b| = 1.618444e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.862323e-05, |r|/|b| = 1.055283e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 2.330357e-05, |r|/|b| = 8.197018e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 1.821976e-05, |r|/|b| = 7.247964e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.606278e-05, |r|/|b| = 6.805422e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.472696e-05, |r|/|b| = 6.516303e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.352144e-05, |r|/|b| = 6.243903e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.205833e-05, |r|/|b| = 5.896419e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.056967e-05, |r|/|b| = 5.520461e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 9.314964e-06, |r|/|b| = 5.182452e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 8.290683e-06, |r|/|b| = 4.889223e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 7.475559e-06, |r|/|b| = 4.642656e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 6.764431e-06, |r|/|b| = 4.416318e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 6.119077e-06, |r|/|b| = 4.200371e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 5.560193e-06, |r|/|b| = 4.003959e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 5.132166e-06, |r|/|b| = 3.846760e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 4.835700e-06, |r|/|b| = 3.734000e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 4.655675e-06, |r|/|b| = 3.663836e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 4.495180e-06, |r|/|b| = 3.600131e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 3.600131e-02, true = 3.600130e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.806834
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.435635e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.549010e-04, |r|/|b| = 3.214031e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.094229e-05, |r|/|b| = 1.534915e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 2.979914e-05, |r|/|b| = 9.313187e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.466109e-05, |r|/|b| = 6.532502e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 9.327396e-06, |r|/|b| = 5.210467e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 7.049009e-06, |r|/|b| = 4.529607e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 5.841067e-06, |r|/|b| = 4.123278e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 5.020746e-06, |r|/|b| = 3.822791e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 4.351074e-06, |r|/|b| = 3.558727e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 3.798551e-06, |r|/|b| = 3.325106e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 3.393211e-06, |r|/|b| = 3.142692e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 3.158355e-06, |r|/|b| = 3.031984e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 3.040242e-06, |r|/|b| = 2.974751e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 3.008993e-06, |r|/|b| = 2.959423e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 3.013799e-06, |r|/|b| = 2.961785e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 3.027376e-06, |r|/|b| = 2.968449e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 3.035586e-06, |r|/|b| = 2.972472e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 3.055048e-06, |r|/|b| = 2.981985e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 3.059853e-06, |r|/|b| = 2.984329e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 3.029884e-06, |r|/|b| = 2.969679e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 2.969679e-02, true = 2.969687e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.856089
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.465583e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.676297e-04, |r|/|b| = 3.256995e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.993747e-05, |r|/|b| = 1.610950e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.855081e-05, |r|/|b| = 1.054699e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 2.427993e-05, |r|/|b| = 8.370194e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.015186e-05, |r|/|b| = 7.625519e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.828296e-05, |r|/|b| = 7.263318e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.648314e-05, |r|/|b| = 6.896549e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.469087e-05, |r|/|b| = 6.510819e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.307212e-05, |r|/|b| = 6.141646e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.178030e-05, |r|/|b| = 5.830289e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.073336e-05, |r|/|b| = 5.565186e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 9.643279e-06, |r|/|b| = 5.275021e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 8.529039e-06, |r|/|b| = 4.960916e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 7.568382e-06, |r|/|b| = 4.673189e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 6.859863e-06, |r|/|b| = 4.449073e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 6.406043e-06, |r|/|b| = 4.299389e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 6.140566e-06, |r|/|b| = 4.209359e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 5.967320e-06, |r|/|b| = 4.149555e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 5.820692e-06, |r|/|b| = 4.098257e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 5.653468e-06, |r|/|b| = 4.038958e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.038958e-02, true = 4.038945e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.76854
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.480424e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.654630e-04, |r|/|b| = 3.240452e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.729353e-05, |r|/|b| = 1.583707e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.489969e-05, |r|/|b| = 1.001370e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.983248e-05, |r|/|b| = 7.548704e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 1.477790e-05, |r|/|b| = 6.516139e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.251416e-05, |r|/|b| = 5.996319e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.102989e-05, |r|/|b| = 5.629496e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 9.766813e-06, |r|/|b| = 5.297370e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 8.798359e-06, |r|/|b| = 5.027878e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 8.206809e-06, |r|/|b| = 4.855915e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 7.811110e-06, |r|/|b| = 4.737403e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 7.556441e-06, |r|/|b| = 4.659535e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 7.353121e-06, |r|/|b| = 4.596421e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 7.168840e-06, |r|/|b| = 4.538458e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 6.920279e-06, |r|/|b| = 4.459085e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 6.655154e-06, |r|/|b| = 4.372834e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 6.432198e-06, |r|/|b| = 4.298962e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 6.255092e-06, |r|/|b| = 4.239365e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.090345e-06, |r|/|b| = 4.183164e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 5.973589e-06, |r|/|b| = 4.142873e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.142873e-02, true = 4.142875e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.75059
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.488732e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.673737e-04, |r|/|b| = 3.245041e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.685215e-05, |r|/|b| = 1.577816e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.425324e-05, |r|/|b| = 9.908707e-02
MG level 0 (GPU): CG:     4 iterations, <r,r> = 1.870830e-05, |r|/|b| = 7.322903e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 1.312032e-05, |r|/|b| = 6.132512e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.068263e-05, |r|/|b| = 5.533567e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 9.552283e-06, |r|/|b| = 5.232627e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 9.080882e-06, |r|/|b| = 5.101879e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 8.884703e-06, |r|/|b| = 5.046469e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 8.806064e-06, |r|/|b| = 5.024086e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 8.727180e-06, |r|/|b| = 5.001533e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 8.633664e-06, |r|/|b| = 4.974664e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 8.452427e-06, |r|/|b| = 4.922173e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 8.210143e-06, |r|/|b| = 4.851115e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 7.921131e-06, |r|/|b| = 4.764966e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 7.566595e-06, |r|/|b| = 4.657109e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 7.165689e-06, |r|/|b| = 4.532055e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 6.795487e-06, |r|/|b| = 4.413432e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.505187e-06, |r|/|b| = 4.318134e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 6.241751e-06, |r|/|b| = 4.229796e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.229796e-02, true = 4.229802e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.731077
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.477460e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.688538e-04, |r|/|b| = 3.256837e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 8.921183e-05, |r|/|b| = 1.601696e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 3.659460e-05, |r|/|b| = 1.025835e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 2.119482e-05, |r|/|b| = 7.806993e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 1.606979e-05, |r|/|b| = 6.797889e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.375158e-05, |r|/|b| = 6.288473e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.204877e-05, |r|/|b| = 5.886272e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.040248e-05, |r|/|b| = 5.469372e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 9.131219e-06, |r|/|b| = 5.124285e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 8.413518e-06, |r|/|b| = 4.918784e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 8.074270e-06, |r|/|b| = 4.818596e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 7.955023e-06, |r|/|b| = 4.782882e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 7.894514e-06, |r|/|b| = 4.764657e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 7.805537e-06, |r|/|b| = 4.737730e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 7.619704e-06, |r|/|b| = 4.680993e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 7.389647e-06, |r|/|b| = 4.609786e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 7.136442e-06, |r|/|b| = 4.530121e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 6.904958e-06, |r|/|b| = 4.456044e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 6.671060e-06, |r|/|b| = 4.379922e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 6.433484e-06, |r|/|b| = 4.301224e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.301224e-02, true = 4.301223e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.744694
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.517301e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.788056e-04, |r|/|b| = 3.281734e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 9.577182e-05, |r|/|b| = 1.650115e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 4.175074e-05, |r|/|b| = 1.089500e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 2.584033e-05, |r|/|b| = 8.571251e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.089584e-05, |r|/|b| = 7.707706e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.892263e-05, |r|/|b| = 7.334760e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.738880e-05, |r|/|b| = 7.031209e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.539929e-05, |r|/|b| = 6.616762e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.297634e-05, |r|/|b| = 6.073952e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.065324e-05, |r|/|b| = 5.503463e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 8.803584e-06, |r|/|b| = 5.002936e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 7.672758e-06, |r|/|b| = 4.670581e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 7.100915e-06, |r|/|b| = 4.493165e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 6.834871e-06, |r|/|b| = 4.408191e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 6.619883e-06, |r|/|b| = 4.338308e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 6.403139e-06, |r|/|b| = 4.266696e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 6.136748e-06, |r|/|b| = 4.176999e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 5.837459e-06, |r|/|b| = 4.073870e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 5.552057e-06, |r|/|b| = 3.973033e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 5.337313e-06, |r|/|b| = 3.895440e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 3.895440e-02, true = 3.895427e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.782228
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.512275e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.845182e-04, |r|/|b| = 3.308752e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 9.998256e-05, |r|/|b| = 1.687205e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 4.417075e-05, |r|/|b| = 1.121433e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 2.697089e-05, |r|/|b| = 8.763012e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.101919e-05, |r|/|b| = 7.735952e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 1.859250e-05, |r|/|b| = 7.275698e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.732469e-05, |r|/|b| = 7.023256e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.606778e-05, |r|/|b| = 6.763689e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.447033e-05, |r|/|b| = 6.418668e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.274598e-05, |r|/|b| = 6.024102e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.136563e-05, |r|/|b| = 5.688562e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.046826e-05, |r|/|b| = 5.459376e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 9.894267e-06, |r|/|b| = 5.307593e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 9.414305e-06, |r|/|b| = 5.177259e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 8.941602e-06, |r|/|b| = 5.045607e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 8.527582e-06, |r|/|b| = 4.927410e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 8.241247e-06, |r|/|b| = 4.843979e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 8.053755e-06, |r|/|b| = 4.788560e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 7.869499e-06, |r|/|b| = 4.733466e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 7.681976e-06, |r|/|b| = 4.676729e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.676729e-02, true = 4.676743e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.710388
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.518015e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.876900e-04, |r|/|b| = 3.319659e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.056359e-04, |r|/|b| = 1.732834e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 5.279989e-05, |r|/|b| = 1.225089e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 3.612400e-05, |r|/|b| = 1.013326e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.756550e-05, |r|/|b| = 8.851850e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 2.170835e-05, |r|/|b| = 7.855332e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 1.829710e-05, |r|/|b| = 7.211777e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.670225e-05, |r|/|b| = 6.890308e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.606671e-05, |r|/|b| = 6.757944e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.553030e-05, |r|/|b| = 6.644175e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.482810e-05, |r|/|b| = 6.492230e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.394458e-05, |r|/|b| = 6.295843e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.288615e-05, |r|/|b| = 6.052193e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.179568e-05, |r|/|b| = 5.790453e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.081418e-05, |r|/|b| = 5.544315e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.008847e-05, |r|/|b| = 5.355052e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 9.615533e-06, |r|/|b| = 5.228027e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 9.308357e-06, |r|/|b| = 5.143843e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 9.036890e-06, |r|/|b| = 5.068281e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 8.774710e-06, |r|/|b| = 4.994219e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.994219e-02, true = 4.994230e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.68543
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.584700e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 4.006185e-04, |r|/|b| = 3.343021e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.112484e-04, |r|/|b| = 1.761653e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 5.694861e-05, |r|/|b| = 1.260420e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 3.944964e-05, |r|/|b| = 1.049047e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 3.112924e-05, |r|/|b| = 9.318754e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 2.545962e-05, |r|/|b| = 8.427516e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.149017e-05, |r|/|b| = 7.742719e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.833892e-05, |r|/|b| = 7.152544e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.613554e-05, |r|/|b| = 6.709117e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.468180e-05, |r|/|b| = 6.399754e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.347177e-05, |r|/|b| = 6.130360e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.246630e-05, |r|/|b| = 5.897154e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.172565e-05, |r|/|b| = 5.719291e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.124890e-05, |r|/|b| = 5.601814e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.083001e-05, |r|/|b| = 5.496522e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.039039e-05, |r|/|b| = 5.383808e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 9.947140e-06, |r|/|b| = 5.267721e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 9.545958e-06, |r|/|b| = 5.160400e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 9.186550e-06, |r|/|b| = 5.062323e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 8.808518e-06, |r|/|b| = 4.957070e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.957070e-02, true = 4.957068e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.688792
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.529686e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.865527e-04, |r|/|b| = 3.309301e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.043769e-04, |r|/|b| = 1.719627e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 5.293033e-05, |r|/|b| = 1.224572e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 3.957046e-05, |r|/|b| = 1.058809e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 3.637549e-05, |r|/|b| = 1.015164e-01
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.299860e-05, |r|/|b| = 9.668958e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.677938e-05, |r|/|b| = 8.710282e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.085346e-05, |r|/|b| = 7.686366e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.670061e-05, |r|/|b| = 6.878569e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.393420e-05, |r|/|b| = 6.283085e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.213966e-05, |r|/|b| = 5.864556e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.102744e-05, |r|/|b| = 5.589454e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.024783e-05, |r|/|b| = 5.388252e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 9.575458e-06, |r|/|b| = 5.208489e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 8.989132e-06, |r|/|b| = 5.046506e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 8.512460e-06, |r|/|b| = 4.910882e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 8.130221e-06, |r|/|b| = 4.799358e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 7.810721e-06, |r|/|b| = 4.704110e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 7.526341e-06, |r|/|b| = 4.617680e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 7.298276e-06, |r|/|b| = 4.547179e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.547179e-02, true = 4.547183e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.722373
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.587570e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.940723e-04, |r|/|b| = 3.314269e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.025480e-04, |r|/|b| = 1.690688e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 4.698173e-05, |r|/|b| = 1.144364e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 3.149483e-05, |r|/|b| = 9.369565e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.605299e-05, |r|/|b| = 8.521746e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 2.396021e-05, |r|/|b| = 8.172315e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.291678e-05, |r|/|b| = 7.992388e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.093928e-05, |r|/|b| = 7.639777e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.869873e-05, |r|/|b| = 7.219479e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.704930e-05, |r|/|b| = 6.893712e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.567308e-05, |r|/|b| = 6.609628e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.423345e-05, |r|/|b| = 6.298757e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.281194e-05, |r|/|b| = 5.975953e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.155794e-05, |r|/|b| = 5.675969e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.077665e-05, |r|/|b| = 5.480771e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.028510e-05, |r|/|b| = 5.354316e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 9.928522e-06, |r|/|b| = 5.260682e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 9.616188e-06, |r|/|b| = 5.177275e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 9.275912e-06, |r|/|b| = 5.084849e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 8.912752e-06, |r|/|b| = 4.984317e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.984317e-02, true = 4.984323e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.685352
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.574495e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.946435e-04, |r|/|b| = 3.322730e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.085678e-04, |r|/|b| = 1.742782e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 5.582540e-05, |r|/|b| = 1.249708e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 4.143518e-05, |r|/|b| = 1.076657e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 3.681794e-05, |r|/|b| = 1.014898e-01
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.174267e-05, |r|/|b| = 9.423547e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.578800e-05, |r|/|b| = 8.493790e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.081193e-05, |r|/|b| = 7.630428e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.732898e-05, |r|/|b| = 6.962723e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.526141e-05, |r|/|b| = 6.534164e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.375235e-05, |r|/|b| = 6.202705e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.255929e-05, |r|/|b| = 5.927550e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.158736e-05, |r|/|b| = 5.693574e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.083458e-05, |r|/|b| = 5.505525e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.025217e-05, |r|/|b| = 5.355507e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 9.855063e-06, |r|/|b| = 5.250762e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 9.526239e-06, |r|/|b| = 5.162421e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 9.150829e-06, |r|/|b| = 5.059678e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 8.731332e-06, |r|/|b| = 4.942343e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 8.323457e-06, |r|/|b| = 4.825525e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 4.825525e-02, true = 4.825543e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.702056
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.591768e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 4.097869e-04, |r|/|b| = 3.377730e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.187555e-04, |r|/|b| = 1.818330e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 6.554084e-05, |r|/|b| = 1.350834e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 5.191457e-05, |r|/|b| = 1.202238e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 4.617676e-05, |r|/|b| = 1.133855e-01
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.870816e-05, |r|/|b| = 1.038119e-01
MG level 0 (GPU): CG:     7 iterations, <r,r> = 3.036909e-05, |r|/|b| = 9.195212e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.375923e-05, |r|/|b| = 8.133211e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.996600e-05, |r|/|b| = 7.455751e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.827672e-05, |r|/|b| = 7.133373e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.736862e-05, |r|/|b| = 6.953901e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.638613e-05, |r|/|b| = 6.754357e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.532811e-05, |r|/|b| = 6.532662e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.440165e-05, |r|/|b| = 6.332162e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.371661e-05, |r|/|b| = 6.179726e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.317252e-05, |r|/|b| = 6.055921e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 1.257077e-05, |r|/|b| = 5.915982e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 1.195297e-05, |r|/|b| = 5.768776e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 1.128519e-05, |r|/|b| = 5.605319e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 1.065832e-05, |r|/|b| = 5.447412e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 5.447412e-02, true = 5.447415e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.657687
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.593717e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 3.942544e-04, |r|/|b| = 3.312198e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.053333e-04, |r|/|b| = 1.712028e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 4.900409e-05, |r|/|b| = 1.167735e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 3.146773e-05, |r|/|b| = 9.357520e-02
MG level 0 (GPU): CG:     5 iterations, <r,r> = 2.523219e-05, |r|/|b| = 8.379258e-02
MG level 0 (GPU): CG:     6 iterations, <r,r> = 2.243847e-05, |r|/|b| = 7.901775e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.083929e-05, |r|/|b| = 7.614993e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 1.998497e-05, |r|/|b| = 7.457270e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 1.925213e-05, |r|/|b| = 7.319264e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 1.846933e-05, |r|/|b| = 7.168919e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 1.750685e-05, |r|/|b| = 6.979625e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.636069e-05, |r|/|b| = 6.747282e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.503235e-05, |r|/|b| = 6.467576e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.382438e-05, |r|/|b| = 6.202273e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.284383e-05, |r|/|b| = 5.978267e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.207088e-05, |r|/|b| = 5.795589e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 1.151789e-05, |r|/|b| = 5.661279e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 1.114286e-05, |r|/|b| = 5.568349e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 1.081687e-05, |r|/|b| = 5.486291e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 1.053543e-05, |r|/|b| = 5.414449e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 5.414449e-02, true = 5.414462e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.623874
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.585488e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 4.077162e-04, |r|/|b| = 3.372134e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.210009e-04, |r|/|b| = 1.837046e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 7.051054e-05, |r|/|b| = 1.402339e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 6.073137e-05, |r|/|b| = 1.301465e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 6.126392e-05, |r|/|b| = 1.307158e-01
MG level 0 (GPU): CG:     6 iterations, <r,r> = 5.719324e-05, |r|/|b| = 1.262985e-01
MG level 0 (GPU): CG:     7 iterations, <r,r> = 4.933963e-05, |r|/|b| = 1.173070e-01
MG level 0 (GPU): CG:     8 iterations, <r,r> = 4.294556e-05, |r|/|b| = 1.094422e-01
MG level 0 (GPU): CG:     9 iterations, <r,r> = 3.888897e-05, |r|/|b| = 1.041452e-01
MG level 0 (GPU): CG:    10 iterations, <r,r> = 3.657001e-05, |r|/|b| = 1.009923e-01
MG level 0 (GPU): CG:    11 iterations, <r,r> = 3.476677e-05, |r|/|b| = 9.847093e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 3.095858e-05, |r|/|b| = 9.292153e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 2.566267e-05, |r|/|b| = 8.460125e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 2.051155e-05, |r|/|b| = 7.563540e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.621546e-05, |r|/|b| = 6.724973e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.325582e-05, |r|/|b| = 6.080359e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 1.201021e-05, |r|/|b| = 5.787634e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 1.156064e-05, |r|/|b| = 5.678279e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 1.101045e-05, |r|/|b| = 5.541514e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 1.042454e-05, |r|/|b| = 5.392056e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 5.392056e-02, true = 5.392057e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.603785
MG level 0 (GPU): Initial guess = 1
MG level 0 (GPU): Initial rhs = 0
MG level 0 (GPU): CG:     0 iterations, <r,r> = 3.694041e-03, |r|/|b| = 1.000000e+00
MG level 0 (GPU): CG:     1 iterations, <r,r> = 4.334606e-04, |r|/|b| = 3.425500e-01
MG level 0 (GPU): CG:     2 iterations, <r,r> = 1.299636e-04, |r|/|b| = 1.875685e-01
MG level 0 (GPU): CG:     3 iterations, <r,r> = 6.993460e-05, |r|/|b| = 1.375926e-01
MG level 0 (GPU): CG:     4 iterations, <r,r> = 4.899461e-05, |r|/|b| = 1.151657e-01
MG level 0 (GPU): CG:     5 iterations, <r,r> = 4.002070e-05, |r|/|b| = 1.040858e-01
MG level 0 (GPU): CG:     6 iterations, <r,r> = 3.139747e-05, |r|/|b| = 9.219269e-02
MG level 0 (GPU): CG:     7 iterations, <r,r> = 2.484584e-05, |r|/|b| = 8.201174e-02
MG level 0 (GPU): CG:     8 iterations, <r,r> = 2.212766e-05, |r|/|b| = 7.739572e-02
MG level 0 (GPU): CG:     9 iterations, <r,r> = 2.154837e-05, |r|/|b| = 7.637589e-02
MG level 0 (GPU): CG:    10 iterations, <r,r> = 2.142315e-05, |r|/|b| = 7.615367e-02
MG level 0 (GPU): CG:    11 iterations, <r,r> = 2.080274e-05, |r|/|b| = 7.504286e-02
MG level 0 (GPU): CG:    12 iterations, <r,r> = 1.955884e-05, |r|/|b| = 7.276468e-02
MG level 0 (GPU): CG:    13 iterations, <r,r> = 1.794708e-05, |r|/|b| = 6.970214e-02
MG level 0 (GPU): CG:    14 iterations, <r,r> = 1.641661e-05, |r|/|b| = 6.666393e-02
MG level 0 (GPU): CG:    15 iterations, <r,r> = 1.484377e-05, |r|/|b| = 6.339007e-02
MG level 0 (GPU): CG:    16 iterations, <r,r> = 1.325152e-05, |r|/|b| = 5.989382e-02
MG level 0 (GPU): CG:    17 iterations, <r,r> = 1.195558e-05, |r|/|b| = 5.688982e-02
MG level 0 (GPU): CG:    18 iterations, <r,r> = 1.105592e-05, |r|/|b| = 5.470746e-02
MG level 0 (GPU): CG:    19 iterations, <r,r> = 1.042294e-05, |r|/|b| = 5.311831e-02
MG level 0 (GPU): CG:    20 iterations, <r,r> = 9.922062e-06, |r|/|b| = 5.182629e-02
MG level 0 (GPU): WARNING: Exceeded maximum iterations 20
MG level 0 (GPU): CG: Reliable updates = 1
MG level 0 (GPU): CG: Convergence at 20 iterations, L2 relative residual: iterated = 5.182629e-02, true = 5.182625e-02 (requested = 5.000000e-07)
MG level 0 (GPU): Solution = 0.647158
MG level 0 (GPU): Transfer: block orthogonalizing
MG level 0 (GPU): Block Orthogonalizing 1024 blocks of 36864 length and width 24 repeating 1 times
MG level 0 (GPU): Creating coarse Dirac operator
MG level 0 (GPU): Computing Y field......
MG level 0 (GPU): Doing bi-directional link coarsening
MG level 0 (GPU): Running link coarsening on the GPU
MG level 0 (GPU): V2 = 2.457600e+04
MG level 0 (GPU): Computing TMAV
MG level 0 (GPU): AV2 = 2.457592e+04
MG level 0 (GPU): Computing forward 0 UV and VUV
MG level 0 (GPU): 0 U_max = 2.998592e+00 v_max = 1.000000e+00 uv_max = 2.998592e+00
MG level 0 (GPU): UV2[0] = 2.457608e+04
MG level 0 (GPU): Y2[4] (atomic) = 2.270624e+03
MG level 0 (GPU): Y2[4] = 2.270624e+03
MG level 0 (GPU): Computing forward 1 UV and VUV
MG level 0 (GPU): 1 U_max = 2.999494e+00 v_max = 1.000000e+00 uv_max = 2.999494e+00
MG level 0 (GPU): UV2[1] = 2.457607e+04
MG level 0 (GPU): Y2[5] (atomic) = 2.260017e+03
MG level 0 (GPU): Y2[5] = 2.260017e+03
MG level 0 (GPU): Computing forward 2 UV and VUV
MG level 0 (GPU): 2 U_max = 2.998860e+00 v_max = 1.000000e+00 uv_max = 2.998860e+00
MG level 0 (GPU): UV2[2] = 2.457604e+04
MG level 0 (GPU): Y2[6] (atomic) = 2.263143e+03
MG level 0 (GPU): Y2[6] = 2.263143e+03
MG level 0 (GPU): Computing forward 3 UV and VUV
MG level 0 (GPU): 3 U_max = 2.998606e+00 v_max = 1.000000e+00 uv_max = 2.998606e+00
MG level 0 (GPU): UV2[3] = 2.457605e+04
MG level 0 (GPU): Y2[7] (atomic) = 2.305734e+03
MG level 0 (GPU): Y2[7] = 2.305734e+03
MG level 0 (GPU): Computing backward 0 UV and VUV
MG level 0 (GPU): 0 U_max = 2.998592e+00 av_max = 9.999980e-01 uv_max = 2.998586e+00
MG level 0 (GPU): UAV2[0] = 2.457597e+04
MG level 0 (GPU): Y2[0] (atomic) = 2.270624e+03
MG level 0 (GPU): Y2[0] = 2.270623e+03
MG level 0 (GPU): Computing backward 1 UV and VUV
MG level 0 (GPU): 1 U_max = 2.999494e+00 av_max = 9.999980e-01 uv_max = 2.999488e+00
MG level 0 (GPU): UAV2[1] = 2.457597e+04
MG level 0 (GPU): Y2[1] (atomic) = 2.260018e+03
MG level 0 (GPU): Y2[1] = 2.260018e+03
MG level 0 (GPU): Computing backward 2 UV and VUV
MG level 0 (GPU): 2 U_max = 2.998860e+00 av_max = 9.999980e-01 uv_max = 2.998854e+00
MG level 0 (GPU): UAV2[2] = 2.457596e+04
MG level 0 (GPU): Y2[2] (atomic) = 2.263142e+03
MG level 0 (GPU): Y2[2] = 2.263142e+03
MG level 0 (GPU): Computing backward 3 UV and VUV
MG level 0 (GPU): 3 U_max = 2.998606e+00 av_max = 9.999980e-01 uv_max = 2.998600e+00
MG level 0 (GPU): UAV2[3] = 2.457597e+04
MG level 0 (GPU): Y2[3] (atomic) = 2.305736e+03
MG level 0 (GPU): Y2[3] = 2.305736e+03
MG level 0 (GPU): X2 = 9.615564e+03
MG level 0 (GPU): Summing diagonal contribution to coarse clover
MG level 0 (GPU): Adding mu = -2.800000e-02
MG level 0 (GPU): X2 = 4.526001e+03
MG level 0 (GPU): ....done computing Y field
MG level 0 (GPU): About to build the preconditioned coarse clover
MG level 0 (GPU): Finished building the preconditioned coarse clover
MG level 0 (GPU): About to create the preconditioned coarse op
MG level 0 (GPU): Computing Yhat field......
MG level 0 (GPU): BatchInvertMatrix (native - cuBLAS): Nc = 48, batch = 512
MG level 0 (GPU): Batched matrix inversion completed in 0.002404 seconds with GFLOPS = 188.051381
MG level 0 (GPU): Xinv = 2.128323e+05
MG level 0 (GPU): Yhat Max = 1.828813e+00
MG level 0 (GPU): Yhat[0] = 2.368256e+04 (1.617381e+00 2.261308e+00 = 4.292010e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[1] = 2.348206e+04 (1.603123e+00 2.007145e+00 = 3.809603e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[2] = 2.359746e+04 (1.862923e+00 2.059302e+00 = 3.908600e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[3] = 2.416556e+04 (1.501386e+00 1.819264e+00 = 3.453002e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[4] = 2.363839e+04 (1.734981e+00 2.260934e+00 = 4.291302e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[5] = 2.360464e+04 (1.748247e+00 2.006933e+00 = 3.809201e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[6] = 2.360325e+04 (1.754624e+00 2.059589e+00 = 3.909143e-01 x 5.268645e+00)
MG level 0 (GPU): Yhat[7] = 2.423520e+04 (1.516479e+00 1.819086e+00 = 3.452664e-01 x 5.268645e+00)
MG level 0 (GPU): ....done computing Yhat field
MG level 0 (GPU): Finished creating the preconditioned coarse op
MG level 0 (GPU): Coarse Dirac operator done
MG level 0 (GPU): Creating smoother
MG level 0 (GPU): Creating a CA-GCR solver
MG level 0 (GPU): Smoother done
MG level 0 (GPU): Checking 0 = (1 - P P^\dagger) v_k for 24 vectors
MG level 0 (GPU): Vector 0: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999988e-01 (1 - P P^\dagger) v_k = 3.007756e-07, L2 relative deviation = 5.484301e-04
MG level 0 (GPU): Vector 1: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000001e+00 (1 - P P^\dagger) v_k = 1.963684e-07, L2 relative deviation = 4.431347e-04
MG level 0 (GPU): Vector 2: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 1.503158e-07, L2 relative deviation = 3.877059e-04
MG level 0 (GPU): Vector 3: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000001e+00 (1 - P P^\dagger) v_k = 1.314137e-07, L2 relative deviation = 3.625102e-04
MG level 0 (GPU): Vector 4: norms v_k = 9.999999e-01 P^\dagger v_k = 9.999993e-01 (1 - P P^\dagger) v_k = 1.262644e-07, L2 relative deviation = 3.553371e-04
MG level 0 (GPU): Vector 5: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999996e-01 (1 - P P^\dagger) v_k = 9.321982e-08, L2 relative deviation = 3.053192e-04
MG level 0 (GPU): Vector 6: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999994e-01 (1 - P P^\dagger) v_k = 9.631330e-08, L2 relative deviation = 3.103438e-04
MG level 0 (GPU): Vector 7: norms v_k = 9.999999e-01 P^\dagger v_k = 9.999995e-01 (1 - P P^\dagger) v_k = 7.629828e-08, L2 relative deviation = 2.762215e-04
MG level 0 (GPU): Vector 8: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 6.899130e-08, L2 relative deviation = 2.626619e-04
MG level 0 (GPU): Vector 9: norms v_k = 9.999999e-01 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 7.036373e-08, L2 relative deviation = 2.652616e-04
MG level 0 (GPU): Vector 10: norms v_k = 9.999999e-01 P^\dagger v_k = 9.999998e-01 (1 - P P^\dagger) v_k = 6.594827e-08, L2 relative deviation = 2.568040e-04
MG level 0 (GPU): Vector 11: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999999e-01 (1 - P P^\dagger) v_k = 6.142733e-08, L2 relative deviation = 2.478454e-04
MG level 0 (GPU): Vector 12: norms v_k = 9.999999e-01 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 5.114581e-08, L2 relative deviation = 2.261544e-04
MG level 0 (GPU): Vector 13: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999998e-01 (1 - P P^\dagger) v_k = 5.016216e-08, L2 relative deviation = 2.239691e-04
MG level 0 (GPU): Vector 14: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999997e-01 (1 - P P^\dagger) v_k = 5.058488e-08, L2 relative deviation = 2.249108e-04
MG level 0 (GPU): Vector 15: norms v_k = 9.999999e-01 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 4.973770e-08, L2 relative deviation = 2.230195e-04
MG level 0 (GPU): Vector 16: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 4.737835e-08, L2 relative deviation = 2.176657e-04
MG level 0 (GPU): Vector 17: norms v_k = 9.999999e-01 P^\dagger v_k = 9.999997e-01 (1 - P P^\dagger) v_k = 4.461550e-08, L2 relative deviation = 2.112238e-04
MG level 0 (GPU): Vector 18: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999998e-01 (1 - P P^\dagger) v_k = 4.324543e-08, L2 relative deviation = 2.079553e-04
MG level 0 (GPU): Vector 19: norms v_k = 9.999999e-01 P^\dagger v_k = 9.999997e-01 (1 - P P^\dagger) v_k = 4.114970e-08, L2 relative deviation = 2.028539e-04
MG level 0 (GPU): Vector 20: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 4.104327e-08, L2 relative deviation = 2.025914e-04
MG level 0 (GPU): Vector 21: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 4.132739e-08, L2 relative deviation = 2.032914e-04
MG level 0 (GPU): Vector 22: norms v_k = 1.000000e+00 P^\dagger v_k = 9.999996e-01 (1 - P P^\dagger) v_k = 3.546492e-08, L2 relative deviation = 1.883213e-04
MG level 0 (GPU): Vector 23: norms v_k = 1.000000e+00 P^\dagger v_k = 1.000000e+00 (1 - P P^\dagger) v_k = 3.639394e-08, L2 relative deviation = 1.907720e-04
MG level 0 (GPU): Checking 0 = (1 - P^\dagger P) eta_c
MG level 0 (GPU): L2 norms 1.631923e+04 1.631923e+04 (fine tmp 1.631923e+04) MG level 0 (GPU): relative deviation = 1.442636e-04
MG level 0 (GPU): Checking 0 = (D_c - P^\dagger D P) (native coarse operator to emulated operator)
MG level 0 (GPU): L2 norms: Emulated = 1.948888e+03, Native = 1.963072e+03, relative deviation = 2.257604e-06
MG level 0 (GPU): Checking normality of preconditioned operator
MG level 0 (GPU): Smoother normal operator test (eta^dag M^dag M eta): real=4.473352e+03 imag=-3.656783e-07, relative imaginary deviation=8.174592e-11
MG level 0 (GPU): Checking normality of residual operator
MG level 0 (GPU): Normal operator test (eta^dag M^dag M eta): real=4.722205e+03 imag=-6.390503e-07, relative imaginary deviation=1.353288e-10
MG level 1 (GPU): Creating level 1
MG level 1 (GPU): Creating smoother
MG level 1 (GPU): Smoother done
MG level 1 (GPU): Setup of level 1 done
MG level 0 (GPU): Creating coarse solver wrapper
MG level 0 (GPU): Creating a CA-GCR solver
MG level 0 (GPU): Assigned coarse solver to preconditioned GCR solver
MG level 0 (GPU): Coarse solver wrapper done
MG level 0 (GPU): Setup of level 0 done
# TM_QUDA: MG Preconditioner Setup Refresh took 2.132 seconds
# TM_QUDA: time spent in reorder_spinor_eo_toQuda: 0.003027 secs
ERROR: Unsupported preconditioner 15
 (rank 0, host cassiopeia, /home/bartek/code/quda-1.1.x/lib/inv_gcr_quda.cpp:192 in GCR())
       last kernel called was (name=N4quda4blas5Norm2IddEE,volume=8x16x16x32,aux=vol=65536,stride=65536,precision=8,order=2,Ns=4,Nc=3,TwistFlavour=1,nParity=1)
weinbe2 commented 3 years ago

Oi, well, this is confusing. I see that in the various routines in multigrid.cpp that the solver type and preconditioner type are hot-swapped between QUDA_MG_INVERTER and QUDA_MG_INVERTER as "is needed," and something must not be getting reset when you do a refresh.

What must be happening is this path in Solver::create is getting called erroneously: https://github.com/lattice/quda/blob/release/1.1.x/lib/solver.cpp#L69

That is, param.preconditioner isn't set when it should be, I'm guessing. I could see this happening because of this line: https://github.com/lattice/quda/blob/release/1.1.x/lib/multigrid.cpp#L612

You're doing just a two level solve, so param.level < param.Nlevel - 2 is false, and maybe specifically because you're mid-update presmoother has also been destroyed... though at this point I'm not sure if that's the case. I can look further later.

If you have the time now---if you could try some printf debugging of casting pointers to size_t (or unsigned long, or w/e), and see if things are null when they should/shouldn't be, that could be a good next step---alternatively/additionally, maybe something I've noted here helps you notice a flaw in the logic.
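As a rough illustration of the pointer-printing idea above, here is a minimal C sketch. The helper name report_ptr is hypothetical and not part of QUDA or tmLQCD; it only prints a pointer as an integer so that a null value is easy to spot in a log.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debugging helper (not a QUDA or tmLQCD function).
   Prints a pointer as a hexadecimal integer so that a null value
   stands out in the log. Returns 1 if the pointer is set, 0 if NULL. */
static int report_ptr(const char *label, const void *p)
{
  printf("# DEBUG: %s = 0x%" PRIxPTR " (%s)\n", label, (uintptr_t)p,
         p ? "set" : "NULL");
  return p != NULL;
}
```

Calling something like report_ptr("inv_param.preconditioner", inv_param.preconditioner) right before invertQuda, once after a plain update and once after a refresh, would show whether the handle differs between the two paths.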

cpviolator commented 3 years ago

I have a couple of questions:

  1. Are you calling invertQuda directly, or do you create your own QUDA solver in TM_QUDA?
  2. What does set_quda_mg_setup_state(&quda_mg_setup_state, &quda_gauge_state); actually do?

I ask because running

./multigrid_evolve_test --dim 16 16 16 16 --verbosity verbose --mg-verbosity 0 verbose --mg-verbosity 1 verbose --inv-multigrid true --solve-type direct-pc  --mg-setup-maxiter-refresh 0 20 --mg-setup-maxiter-refresh 1 20 >& log.log &

on the release branch seems to run fine, even when I hack it with

      // Update the multigrid operator for new gauge and clover fields
      if (inv_multigrid) updateMultigridQuda(mg_preconditioner, &mg_param);
      for (int i = 0; i < mg_param.n_level; i++) mg_param.setup_maxiter_refresh[i] = 0; // <--- HACK THIS LINE
      invertQuda(spinorOut, spinorIn, &inv_param);

to try to emulate what you are doing.

kostrzewa commented 3 years ago

Are you calling invertQuda directly, or do you create your own QUDA solver in TM_QUDA?

We call invertQuda directly.

What does set_quda_mg_setup_state(&quda_mg_setup_state, &quda_gauge_state) actually do?

It just operates on state structs that I use to keep track of the state of the QUDA MG setup and the QUDA gauge field. It's the counterpart to check_quda_mg_setup_state in the if condition above, which checks these same state variables (point along the trajectory in molecular dynamics units, angles for our twisted boundary conditions, parameters such as kappa, csw and mu that the MG setup was generated with, etc.) and compares the state of the gauge field with the one that the MG setup was generated / last refreshed with. It doesn't touch any of the QUDA parameter structs that we employ to control the solvers.

It is called when the MG setup is generated or refreshed, basically baking in that "this MG setup has been generated at this point along the trajectory with these parameters". There's another function which only updates the state w.r.t. the mu value (when updateMultigridQuda is called just to update the sign and/or value of mu).
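To make the bookkeeping described above concrete, here is a minimal sketch in C. It is not the tmLQCD implementation: the struct, the enum, the function names and both thresholds are hypothetical stand-ins, and the decision rule (reset on parameter change or large drift, refresh on moderate drift) is an assumption made for illustration.

```c
#include <math.h>

/* Hypothetical stand-ins, NOT the actual tmLQCD types. */
typedef struct {
  int initialised;  /* has a setup been generated at all? */
  double gauge_id;  /* position along the trajectory (MD time units) */
  double kappa;     /* parameters the setup was generated with */
  double c_sw;
} mg_state_sketch_t;

typedef enum { MG_SKETCH_OK, MG_SKETCH_REFRESH, MG_SKETCH_RESET } mg_action_sketch_t;

/* Compare the recorded setup state with the current gauge state and decide
   what to do. The thresholds are assumed to be given in MD time units. */
static mg_action_sketch_t check_state_sketch(const mg_state_sketch_t *setup,
                                             const mg_state_sketch_t *gauge,
                                             double refresh_threshold,
                                             double reset_threshold)
{
  const double eps = 1e-12;
  if (!setup->initialised) return MG_SKETCH_RESET;
  /* a change of the operator parameters invalidates the setup in this sketch */
  if (fabs(setup->kappa - gauge->kappa) > eps || fabs(setup->c_sw - gauge->c_sw) > eps)
    return MG_SKETCH_RESET;
  const double drift = fabs(gauge->gauge_id - setup->gauge_id);
  if (drift > reset_threshold) return MG_SKETCH_RESET;
  if (drift > refresh_threshold) return MG_SKETCH_REFRESH;
  return MG_SKETCH_OK;
}

/* Record "this setup was generated or refreshed at this point with these parameters". */
static void set_state_sketch(mg_state_sketch_t *setup, const mg_state_sketch_t *gauge)
{
  *setup = *gauge;
  setup->initialised = 1;
}
```

Neither function touches any solver parameter struct, which matches the description: they only record and compare where along the trajectory the setup was last generated or refreshed.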

I ask because running

Indeed. There also don't seem to be any Chroma issues pointing to similar problems when the setup is refreshed when iteration thresholds are exceeded. I'm thus pretty sure that I must be doing something wrong, generating a state in the MG preconditioner which leads to the erroneous behaviour through invalid combinations of parameters passed via the QudaMultigridParam and its internal QudaInvertParam.

You're doing just a two level solve, so param.level < param.Nlevel - 2 is false, and maybe specifically because you're mid-update presmoother has also been destroyed... though at this point I'm not sure if that's the case. I can look further later.

It also happens with three-level solves.

I'm also going to continue digging, thanks a lot. I was hoping that perhaps this had been encountered before by someone.

Our complete QUDA interface is here: https://github.com/etmc/tmLQCD/blob/f9e9c778c271320c8edb61005c7a471a321f53a9/quda_interface.c but I really don't want to bother you folks with this stuff.

I can provide printouts of the various parameter structs which go into the various calls of invertQuda and updateMultigridQuda before and after various combinations of calls, but I don't know how helpful this would be (beyond spamming the issue here)...

cpviolator commented 3 years ago

I think you might be passing the wrong invert param to invertQuda because the GCR solver has QUDA_MG_INVERTER as its preconditioner, which we only set for the mg_param.invert_param = &mg_inv_param; object, i.e., the invert param struct embedded in the mg param struct. The actual QudaInvertParam instance one passes to invertQuda should not have QUDA_MG_INVERTER set as its preconditioner.

Looking again, that is a garbage statement...

kostrzewa commented 3 years ago

While I do copy the inner QudaInvertParam from the outer QudaInvertParam as a shortcut, I tried to make sure to unset things that are unsuitable for the smoothers etc.

I still suspect that this whole copying mg_inv_param = inv_param thing that I do is at the heart of the problem that I see and that I should just create a fresh inner parameter struct each time and populate it accordingly...

cpviolator commented 3 years ago

That sounds like a good check. In QUDA we delineate these things for clarity at the cost of extra lines of code. But because so many different operators and tests use these param setting helper functions, it pays off.

On Thu, Aug 12, 2021 at 12:44 PM Bartosz Kostrzewa @.***> wrote:

While I do copy the inner QudaInvertParam from the outer QudaInvertParam as a shortcut, I tried to make sure to unset things that are unsuitable for the smoothers etc.

I still suspect that this whole copying mg_inv_param = inv_param thing that I do is at the heart of the problem that I see and that I should just create a fresh inner parameter struct each time and populate it accordingly...

kostrzewa commented 3 years ago

Looking again, that is a garbage statement...

I'm thankful that someone else is taking a look at the logic that has built up in there over the years. If you see anything raising alarm bells I'm happy to hear it, even if it turns out to be wrong in the end.

I know that what I do right now works for all cases that I've encountered so far (both for measurements and HMC) except for updateMultigridQuda with the refresh iterations set to a non-zero value.

kostrzewa commented 3 years ago

I've implemented a "clean slate" explicitly populated MG-internal QudaInvertParam (which certainly makes things cleaner). Turns out, this was not the problem....

Because we use a single inv_param to drive all of our solves, I have to switch back and forth between different parameter sets. One of the switches is to inv_param.preconditioner = NULL for non-MG solves and inv_param.preconditioner = quda_mg_preconditioner for MG solves.

While I had re-enabled inv_param.preconditioner = quda_mg_preconditioner for all cases in the setup evolution, I had not done so for the case of refreshing....
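One way to avoid missing a code path like this is to route every switch through a single helper. The following is only a sketch under that assumption: inv_param_sketch_t stands in for the relevant part of QudaInvertParam, and select_preconditioner_sketch is a hypothetical name, not an existing tmLQCD or QUDA function.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one field of QudaInvertParam that matters here. */
typedef struct {
  void *preconditioner; /* MG handle for MG solves, NULL otherwise */
} inv_param_sketch_t;

/* Set the preconditioner handle in exactly one place, so that every path
   (fresh setup, parameter update, refresh) ends with the same assignment. */
static void select_preconditioner_sketch(inv_param_sketch_t *p, void *mg_handle, int use_mg)
{
  p->preconditioner = use_mg ? mg_handle : NULL;
}
```

Calling the helper unconditionally just before each MG solve, rather than inside individual branches of the setup logic, means a newly added branch such as the refresh cannot leave the handle at NULL.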

Apologies for wasting everyone's time!

kostrzewa commented 3 years ago

It might be useful, however, to check for inv_param.preconditioner == nullptr in GCR when inv_param.inv_type_precondition != QUDA_INVALID_INVERTER.
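A minimal sketch of such a guard, written in plain C with stand-in types. The enum values, the struct and the function name are hypothetical and do not reproduce QUDA's definitions; in QUDA itself the check would presumably end in an error call rather than a return code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the QUDA inverter enum and parameter struct. */
typedef enum { SKETCH_INVALID_INVERTER = 0, SKETCH_MG_INVERTER = 1 } sketch_inverter_t;

typedef struct {
  sketch_inverter_t inv_type_precondition;
  void *preconditioner;
} sketch_param_t;

/* Returns 0 if the combination is consistent and -1 if a preconditioner
   type is requested but no preconditioner handle has been supplied. */
static int check_preconditioner_sketch(const sketch_param_t *p)
{
  if (p->inv_type_precondition != SKETCH_INVALID_INVERTER && p->preconditioner == NULL)
    return -1;
  return 0;
}
```

With a check like this, the failure would be reported as a missing preconditioner handle instead of the less direct "Unsupported preconditioner" message seen in the log above.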

kostrzewa commented 3 years ago

While we're discussing updateMultigridQuda, let me use the opportunity to pester you with another question about the state of the thin_update_only version. In our case, when the only thing that changes is mu (and/or tm_rho) in the outer solve, for example, is it necessary to destroy and recreate all the coarse operators and smoothers?

There's a FIXME comment in there which I'd be happy to take care of if the answer to the above question is 'no'.

cpviolator commented 3 years ago

@kostrzewa No time wasted here! The problem was narrowed down to a smaller set of possibilities and solved :)

I think that the results of the scaled mu technique, coupled with the fact that MG is so resilient, would point to one being able to include mu and tm_rho in the thin updates, but I'm just speculating.

weinbe2 commented 3 years ago

While we're discussing updateMultigridQuda, let me use the opportunity to pester you with another question about the state of the thin_update_only version. In our case, when the only thing that changes is mu (and/or tm_rho) in the outer solve, for example, is it necessary to destroy and recreate all the coarse operators and smoothers?

There's a FIXME comment in there which I'd be happy to take care of if the answer to the above question is 'no'.

Extending the thin_update_only version should suffice! You'll have to create analogs to setMass (or w/e it's called, typing from my phone) and whatnot if they don't otherwise exist, but I think it'll be straightforward---let me know if you hit any issues.

weinbe2 commented 3 years ago

The one caveat is I think mu gets explicitly baked into the coarse stencil---is it okay that that doesn't get updated?