Better stopping criteria

lahwaacz commented 7 years ago

As far as I understand, the current stopping criterion used in pyamg is norm(r_k) / norm(r_0) < eps, where eps is specified with the tol parameter of the multigrid solver.

I understand that setting a fixed tolerance this way is useful to assess the efficiency of the solver regarding how fast the residue is reduced, but it does not provide any indication of how good the resulting approximation actually is. Obviously, if norm(r_0) is big, then also norm(r_k) will be big for a fixed eps. In practice, using norm(r_k) / (norm(A) norm(x_k) + norm(b)) or at least norm(r_k) / norm(b) is much better. In some cases estimates involving norm(x_k - x_{k-1}) might be useful as well.

For pyamg, it would be nice if the user did not have to write tol=eps * norm(b) / norm(b - A*x0) or something like that in order to enforce custom criterion. This trick is still insufficient for the last case.

lahwaacz commented 7 years ago

Actually, if I set tol=eps * norm(b) / norm(b - A*x0), how does it affect the coarser levels?

bensworth commented 7 years ago

Using a relative residual is quite typical in iterative methods. Here, in computing r_0 for the relative residual, a zero initial guess is assumed, so ||r_0|| = ||b||. Then, the tolerance is simple scaled by tol *= ||b||. If you scale your initial tolerance tol /= ||b||, then you will get the hard residual tolerance you are looking for. We can also easily add a flag for multilevel.solve(), e.g. relative_residual=False, that determines whether the residual tolerance is scaled by ||r_0|| or not. I think I did this in one of my development branches and just haven't merged it into master. -ben

On 6/27/17 2:16 PM, Jakub Klinkovský wrote:

As far as I understand, the current stopping criterion used in pyamg is |norm(r_k) / norm(r_0) < eps|, where |eps| is specified with the |tol| parameter of the multigrid solver.

I understand that setting a fixed tolerance this way is useful to assess the efficiency of the solver regarding how fast the residue is reduced, but it does not provide any indication of how good the resulting approximation actually is. Obviously, if |norm(r_0)| is big, then also |norm(r_k)| will be big for a fixed |eps|. In practice, using |norm(r_k) / (norm(A) norm(x_k) + norm(b))| or at least |norm(r_k) / norm(b)| is much better. In some cases estimates involving |norm(xk - x{k-1})| might be useful as well.

For pyamg, it would be nice if the user did not have to write |tol=eps

norm(b) / norm(b - A*x0)| or something like that in order to enforce custom criterion. This trick is still insufficient for the last case.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pyamg/pyamg/issues/203, or mute the thread https://github.com/notifications/unsubscribe-auth/AIPhhJ0Mj5f_tCSA1j0nZ4d3w3BfB6-5ks5sIXE0gaJpZM4OHM84.

lahwaacz commented 7 years ago

In my test case for #204, the initial residue actually matches norm(b - A*x0) (not norm(b)) if I run the multigrid iteratively, not as a preconditioner. The scaling by norm(b) instead of norm(r_0) is due to the Rigal-Gâches theorem, so I wouldn't call it a "zero initial guess assumption" :wink: In any case, that's not how pyamg currently behaves.

The hypothetical relative_residual=False parameter would be equivalent to switching to absolute tolerance. I've seen somewhere a condition like norm(r_k) < rel_tol * scaling_factor + abs_tol with user input rel_tol and abs_tol being used, which might help with the problem you mentioned in https://github.com/pyamg/pyamg/issues/204#issuecomment-312694444. The big question, however, is whether scaling_factor should be norm(r_0) or norm(b). I'd prefer the latter.

All this assumes that the tolerance factors are used only at the finest level. However, doesn't the current tol parameter affect also the accuracy of the coarser levels, smoothing or whatever? In that case using a different catch-all parameter(s) might be problematic...

bensworth commented 7 years ago

Running multigrid iteratively, the residual should match norm(b - A*x0). I mean that the stopping tolerance variable is initially scaled by ||b|| if ||b|| != 0. Tolerance should not have any effect on coarser levels, because the coarse levels just do one iteration for a V-cycle, or several for a W-, F-, or K-cycle, but do not solve to any given tolerance. The coarsest level is an "exact" solve, so I suppose that would have an inherent tolerance, but I don't think it is related to the actual tolerance variable right now.

Interesting point on not a zero initial guess assumption. I am somewhat familiar with Rigal-Gaches in the perturbation-theory sense, but not as a stopping criterion for linear systems; can you point me to a reference?

On 7/3/17 11:46 AM, Jakub Klinkovský wrote:

In my test case for #204 https://github.com/pyamg/pyamg/issues/204, the initial residue actually matches |norm(b - A*x0)| (not |norm(b)|) if I run the multigrid iteratively, not as a preconditioner. The scaling by |norm(b)| instead of |norm(r_0)| is due to the Rigal-Gâches theorem, so I wouldn't call it a "zero initial guess assumption" 😉 In any case, that's not how pyamg currently behaves.

The hypothetical |relative_residual=False| parameter would be equivalent to switching to absolute tolerance. I've seen somewhere a condition like |norm(r_k) < rel_tol * scaling_factor + abs_tol| with user input |rel_tol| and |abs_tol| being used, which might help with the problem you mentioned in #204 (comment) https://github.com/pyamg/pyamg/issues/204#issuecomment-312694444. The big question, however, is whether |scaling_factor| should be |norm(r_0)| or |norm(b)|. I'd prefer the latter.

All this assumes that the tolerance factors are used only at the finest level. However, doesn't the current |tol| parameter affect also the accuracy of the coarser levels, smoothing or whatever? In that case using a different catch-all parameter(s) might be problematic...

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/pyamg/pyamg/issues/203#issuecomment-312712606, or mute the thread https://github.com/notifications/unsubscribe-auth/AIPhhOuoz5ysh5A_KJSNl4qnF1i1PMTUks5sKTcHgaJpZM4OHM84.

lahwaacz commented 7 years ago

Hmm, that's interesting. Why do you scale the tolerance by norm(b) only for the initial check and not for the following iteration checks? For GMRES, the relevant code is here. Also note that the plain multilevel_solver seems to scale by norm(b) for all iterations (see here), so we should first agree on which part we're talking about :laughing:

I don't have a reference at hand for the stopping criterion, but I'll try to find some.

lahwaacz commented 7 years ago

Oh, and for BiCGStab, the initial residual is not preconditioned (see here). Then it scales the tolerance with norm(r_0) like GMRES, but the r_0 is not preconditioned.

Also note that contrary to what the docstring for bicgstab says, the method actually does right preconditioning...

lahwaacz commented 7 years ago

As for the reference for the stopping criterion, see the work of Zdeněk Strakoš, namely C.C. Paige and Z. Strakos, Residual and Backward Error Bounds in Minimum Residual Krylov Subspace Methods, SIAM J. Sci. Comput. 23, 6, 2002, pp. 1899-1924. The preference of norm(r_k) / (norm(A) norm(x_k) + norm(b)) over norm(r_k) / norm(r_0) is mentioned around eq. (5.6), but I believe the whole paper is very interesting.

lukeolson commented 6 years ago

Adding another note to this thread -- we should consider adding a check if the residual is greater than some max_tol, equal to inf or nan.

lukeolson commented 2 years ago

Returning to this old thread as the stopping criteria has popped up again. I have a draft in https://github.com/pyamg/pyamg/blob/stoppingcriteria/pyamg/krylov/_cg.py#L26:

    tol : float
        stopping criteria (see normA)
        ||r_k|| < tol * ||b||, 2-norms
    normA : float
        if provided, then the stopping criteria becomes
        ||r_k|| < tol * (normA * ||x_k|| + ||b||), 2-norms

And then

        if normA is not None:
            rtol = tol * (normA * np.linalg.norm(x) + normb)
        else:
            rtol = tol * normb

So if the user provides normA, for example the Frobenius norm normA=np.linalg.norm(A.data), then the expanded tolerance is used.

From a quick tests of a 1Mx1M matrix, shows that it still outperforms scipy's cg.

$ python _cg.py

Testing CG with 1000000 x 1000000 2D Laplace Matrix
cg took 1341.5021896362305 ms
norm = 0.034460820390114666
info flag = 100

scipy cg took 2644.751787185669 ms
norm = 0.03446082039011625
info flag = 100

lukeolson commented 2 years ago

So the question is whether it's worth having

||r_k|| < tol * (normA * ||x_k|| + ||b||)

or whether

||r_k|| < tol * ||b||

is sufficient

lukeolson commented 2 years ago

https://github.com/pyamg/pyamg/commits/stoppingcriteria and cdace45fb8974d912f0a3b0cc46baf15a9971637 implements both above. In summary:

left-preconditioning, 2-norm of the real residual, and an option to balance with the A-norm are implemented in bicgstab, cg, cgne, cgnr, cr, minimum_residual, and steepest_descent
right-preconditioning, 2-norm of the real residual, and an option to balance with the A-norm is implemented in fgmres
left-preconditioning, the 2-norm of the preconditioned residual, and stopping criteria only on the preconditioned residual is implemented in gmres and it's variants

The main thing being consistency.

lukeolson commented 2 years ago

PR #279 implements the following strategy:

    criteria : string
        Stopping criteria, let r=r_k, x=x_k
        'rr':        ||r||       < tol ||b||
        'rr+':       ||r||       < tol (||b|| + ||A||_F ||x||)
        'MrMr':      ||M r||     < tol ||M b||
        'rMr':       <r, Mr>^1/2 < tol
        if ||b||=0, then set ||b||=1 for these tests.

For gmres only MrMr is used. For fgmres, only rr is used. for the cg variants, all four are available (with bicgstab only rr and rr+).

pyamg / pyamg

Better stopping criteria #203