odlgroup / odl

Operator Discretization Library https://odlgroup.github.io/odl/

stopping iterative methods early #1252

Open mehrhardt opened 6 years ago

mehrhardt commented 6 years ago

Currently, the iterative methods do not have the option to terminate early, e.g. when the iterates no longer change. Perhaps one should have something like a "callback" that checks a termination criterion. Are there other opinions?

kohr-h commented 6 years ago

By iterative methods, do you mean Landweber and Kaczmarz type methods, or also optimizers? In general I agree, this would be nice to have, but doing it consistently would require introducing some kind of memory in the solvers that is not needed for the iteration itself. It would be quite a large change, related also to the super-old issue #76.

For this particular case, since only the iterates are needed, you can write a (pretty hacky) callback

import numpy as np

from odl.solvers import Callback


class CallbackStopIfNoChange(Callback):
    """Raise StopIteration when consecutive iterates are (almost) equal."""

    def __init__(self, atol=1e-10, dist=2, print_dist=False):
        self.atol = atol
        self.previous = None
        self.print_dist = print_dist

        if np.isscalar(dist):
            # Interpret a scalar as the order of a norm-based distance
            def distp(x1, x2):
                return np.linalg.norm(x1 - x2, ord=dist)

            self.dist_func = distp

        elif callable(dist):
            self.dist_func = dist

        else:
            raise TypeError('`dist` must be a scalar (norm order) or a '
                            'callable, got {!r}'.format(dist))

    def __call__(self, iterate):
        if self.previous is None:
            self.previous = iterate.copy()
            return
        else:
            dist = self.dist_func(iterate, self.previous)
            if self.print_dist:
                print('dist(x_k, x_(k-1)) = {}'.format(dist))
            if np.isclose(dist, 0, atol=self.atol):
                raise StopIteration
            else:
                self.previous = iterate.copy()

It won't terminate in a pretty way since it exits with a StopIteration exception, but you can catch that outside. You could also use a different exception type.
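
For illustration, catching it in user code could look roughly like this (a sketch; op, x and rhs are assumed to be defined, and any solver that accepts a callback would do):

import odl

# Hypothetical usage: treat StopIteration from the callback as
# "converged early"; x then holds the last computed iterate.
callback = CallbackStopIfNoChange(atol=1e-8, print_dist=True)

try:
    odl.solvers.landweber(op, x, rhs, niter=1000, callback=callback)
except StopIteration:
    pass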

mehrhardt commented 6 years ago

By iterative method I actually meant any kind of iteration: Landweber, CG, PDHG, anything.

I do like the idea of having a callback that handles it as you proposed. This would almost solve it, but terminating by raising an exception is rather crude. One could catch that exception outside, but then one needs a hack every time one wants to do early stopping. Actually, stopping the solver "early" should not be any different from running the fixed number of iterations. Any ideas on how to achieve that?

kohr-h commented 6 years ago

It seems to me that this is not achievable without a change to the API of the solvers, in the sense that each solver takes a StoppingCriterion object or similar, akin to the LineSearch in the smooth methods. The general idea of this class would be that it stores as much state as it needs (just like the callback above) and returns True (terminate) or False (don't terminate), and the solver calls it in each iteration.
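
A minimal sketch of what that could look like, assuming the criterion just checks whether the iterates stop changing (StoppingCriterion is hypothetical, not an existing ODL class):

class StoppingCriterion(object):
    """Return True when consecutive iterates are (almost) equal."""

    def __init__(self, atol=1e-10):
        self.atol = atol
        self.previous = None

    def __call__(self, iterate):
        if self.previous is None:
            self.previous = iterate.copy()
            return False
        dist = (iterate - self.previous).norm()
        self.previous = iterate.copy()
        return dist <= self.atol

A solver would then check it once per iteration, e.g. "if stop_crit is not None and stop_crit(x): break".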

mehrhardt commented 6 years ago

Yes, I think this would make a lot of sense. The only problem is that it may "clutter" the algorithms, but I think early termination is too important to be left out.

kohr-h commented 6 years ago

Sure, but clutter-wise we can always stuff things into kwargs to avoid overly long signatures. This would be an advanced feature, so it doesn't need to be obvious. Of course, internally the clutter is still relevant.

Now that I think about it, maybe we can "standardize" this interface in the sense that callbacks are allowed to raise StopIteration, and each solver that calls a callback takes care of catching that and returns gracefully in case it is raised. So we wouldn't need a new class for this, just a small internal change to the callback handling.
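
Concretely, the callback handling inside a solver could become something like this (sketched on a stripped-down Landweber loop; op, x and rhs come from the solver's arguments):

def landweber(op, x, rhs, niter, omega=1, callback=None):
    for i in range(niter):
        x.assign(x - omega * op.adjoint(op(x) - rhs))
        if callback is not None:
            try:
                callback(x)
            except StopIteration:
                # Callback requested early termination -> stop gracefully
                return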

Maybe some more opinions on this, @adler-j, @ozanoktem?

adler-j commented 6 years ago

We had some really exciting discussions with some people in Torsten Hohage's group in Göttingen, and the proposal discussed there was the following.

For reference, the current interface in shortened form:

def landweber(op, x, rhs, niter, omega=1, callback=None):
    for i in range(niter):
        x.assign(x - omega * op.adjoint(op(x) - rhs))
        if callback is not None:
            callback(x)

The new interface would involve a solver class and a "default implementation", e.g.

class Landweber(object):  # possibly create a solver class
    def __init__(self, op, x, rhs, omega=1):
        self.op = op
        self.x = x
        self.rhs = rhs
        self.omega = omega

    def update(self):
        self.x.assign(self.x - self.omega * self.op.adjoint(self.op(self.x) - self.rhs))

    def current(self):
        return self.x

def landweber(op, x, rhs, niter, omega=1, callback=None):
    solver = Landweber(op, x, rhs, omega)

    for i in range(niter):
        solver.update()
        if callback is not None:
            callback(solver.current())

With an interface of this form, we would keep the current version with basically zero overhead for users, while also allowing an "advanced usage" where the user can control each iterate one by one. By doing this, we could simultaneously solve a range of issues, e.g. this one (the user can implement any stopping criterion they want) as well as #571 "Callback for primal-dual gap and others".
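
For example, the early stopping asked for in this issue would then be entirely in user hands (a sketch; op, x, rhs, omega and maxiter are assumed to be defined, and the tolerance is arbitrary):

solver = Landweber(op, x, rhs, omega)

previous = x.copy()
for i in range(maxiter):
    solver.update()
    # Stop as soon as the iterates no longer change
    if (solver.current() - previous).norm() < 1e-10:
        break
    previous = solver.current().copy()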

How would you feel about this? The details are obviously still up for discussion.

kohr-h commented 6 years ago

Looks great in general. But how does this solve the primal-dual gap problem? I mean, obviously, primal-dual methods could expose both the primal and the dual iterate as properties, and a primal-dual gap callback could pick them up. But that results in an ad-hoc API for each new solver.

So here are questions I'd like to discuss:

- Can we think of a way to unify "the things a solver exposes", something like a workspace property?
- How about solvers with different "types" of steps, like primal step, dual step, overrelaxation step etc., do we want to separate those?
- Then there's also the issue with callbacks (or step size methods) that need the current and previous iterates. How are we gonna do that? Can we somehow tell the solver what to store?

aringh commented 6 years ago

> Can we think of a way to unify "the things a solver exposes", something like a workspace property?

Except for update and current (which should be easy to access), I also think that collecting the rest of the things exposed by a solver in some kind of property would be good. However, unfortunately I think that the list of what one would like to expose could become very long and also depends on the solver in question (or on the structure of the solver). It could be the primal-dual gap, infeasibility, objective function value, the gradient, current step length values, etc. From some perspective we are not aiming for a full general-purpose optimization package, so I guess we have to draw boundaries somewhere.

> How about solvers with different "types" of steps [...]

The different solvers currently have different interfaces since they solve problems of different forms, e.g., PDHG solves min f(x) + g(Lx), while in its general form the primal-dual forward-backward solver solves min f(x) + sum_i (g_i @ l_i)(L_i x) + h(x). Is there a way to unify this?

> Then there's also the issue with callbacks [...]

If you make solvers classes, you could have advanced options to indicate that a solver should save X old iterates, Y old gradients, etc. I am not sure how to use that in the callback, but some of the bookkeeping could be done in the solver class instance.

adler-j commented 6 years ago

> Can we think of a way to unify "the things a solver exposes", something like a workspace property?

My suggestion here is that we simply go by standardization (just as we do with e.g. the parameters currently). One could for example standardize that current_dual() returns the current dual iterate.

If we did it like that, we could simply do something like

solver = PDHG(...)

for i in range(niter):
    solver.update()
    primal = solver.current()
    dual = solver.current_dual()
    primal_dual_gap = F(primal) - F.convex_conjugate(dual)
    print(f'gap={primal_dual_gap}')

> How about solvers with different "types" of steps, like primal step, dual step, overrelaxation step etc., do we want to separate those?

For me, that would be a feasible "next step" in an implementation like this. Initially, I'd say go for only the simple update etc., and then this can perhaps be added later if someone feels it is needed.

> Then there's also the issue with callbacks (or step size methods) that need the current and previous iterates. How are we gonna do that? Can we somehow tell the solver what to store?

If the user wants to do something advanced like that, then perhaps just using the advanced interface is fine, e.g.

solver = Landweber(op, x, rhs, omega)

all_iterates = []

for i in range(niter):
    solver.update()

    all_iterates.append(solver.current(copy=True))
    if should_stop(all_iterates):
        break

kohr-h commented 6 years ago

@aringh

> Can we think of a way to unify "the things a solver exposes", something like a workspace property?

> Except for update and current (which should be easy to access), I also think that collecting the rest of the things exposed by a solver in some kind of property would be good. However, unfortunately I think that the list of what one would like to expose could become very long and also depends on the solver in question (or on the structure of the solver). It could be the primal-dual gap, infeasibility, objective function value, the gradient, current step length values, etc. From some perspective we are not aiming for a full general-purpose optimization package, so I guess we have to draw boundaries somewhere.

Oh, I didn't mean to go that far. My question was really restricted to the things the solver computes anyway. Using that stuff would really be up to the calling code.
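
To make that concrete, a hypothetical sketch building on the Landweber class above (the workspace name and its contents are made up):

class LandweberWithWorkspace(Landweber):
    @property
    def workspace(self):
        # Only quantities the solver computes anyway; how to use them
        # is left to the calling code.
        return {'iterate': self.x,
                'step_length': self.omega,
                'residual': self.op(self.x) - self.rhs}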

> How about solvers with different "types" of steps [...]

> The different solvers currently have different interfaces since they solve problems of different forms, e.g., PDHG solves min f(x) + g(Lx), while in its general form the primal-dual forward-backward solver solves min f(x) + sum_i (g_i @ l_i)(L_i x) + h(x). Is there a way to unify this?

Sure, but only on a high level, namely exposing a step method that does one step of the optimizer. Below that, things are so different that it wouldn't make much sense to try to unify them. But for advanced use cases, it may still be desirable to expose those steps, so that the optimizer becomes less of a black box.

@adler-j That really looks like the simplest and most flexible approach. And :+1: for using f-strings :wink:

mehrhardt commented 6 years ago

I am not sure that this change in API makes a big difference. It means that if one wants more specialised callbacks, one can recycle some code by using the update function. However, in some applications one might also want to choose which values to compute and which ones to store, and for those one still needs to change the update function. So I am not convinced one gains that much from the proposed changes.

kohr-h commented 6 years ago

> I am not sure that this change in API makes a big difference. It means that if one wants more specialised callbacks, one can recycle some code by using the update function. However, in some applications one might also want to choose which values to compute and which ones to store, and for those one still needs to change the update function. So I am not convinced one gains that much from the proposed changes.

If you want to write callbacks that do more interesting stuff, then yes, agreed. But the whole point of writing a class that exposes elements of the optimization method is to not have to use callbacks to do interesting stuff. See, the purpose of callbacks is basically to get information out of a black box. You don't know if or when things will happen, but you say to the black box "Whenever you do X, please call this function." But if you have a "grey box" rather than a black box, you have all kinds of possibilities, depending on how much the box exposes.

So in our example, think of something like this:

Basic version

class Landweber(object):
    def __init__(self, op, x, rhs, omega=1):
        self.op = op
        self.x = x
        self.rhs = rhs
        self.omega = omega

    def step(self):
        self.x.assign(self.x - self.omega * self.op.adjoint(self.op(self.x) - self.rhs))

    def current(self):
        return self.x

More advanced version

class Landweber(object):
    def __init__(self, op, x, rhs, omega=1):
        self.op = op
        self.x = x
        self.rhs = rhs
        self.omega = omega

    def step(self):
        self.x.assign(self.current() - self.omega * self.current_l2sq_grad())

    def current_l2sq_grad(self):
        return self.op.adjoint(self.current_residual())

    def current_residual(self):
        return self.op(self.x) - self.rhs

    def current(self):
        return self.x

With intermediate storage

class Landweber(object):
    def __init__(self, op, x, rhs, omega=1, use_tmp_dom=True, use_tmp_ran=True):
        self.op = op
        self.x = x
        self.rhs = rhs
        self.omega = omega
        self.tmp_dom = self.op.domain.element() if use_tmp_dom else None
        self.tmp_ran = self.op.range.element() if use_tmp_ran else None

    def step(self):
        self.x.lincomb(1, self.x, -self.omega, self.current_l2sq_grad())

    def current_l2sq_grad(self):
        return self.op.adjoint(self.current_residual(), out=self.tmp_dom)

    def current_residual(self):
        tmp = self.op(self.x, out=self.tmp_ran)
        tmp -= self.rhs
        return tmp

    def current(self):
        return self.x

This now gives you all kinds of possibilities to alter the iteration, inspect intermediate results etc. without having to resort to callbacks.

Also note that these changes can be added gradually without changing the interface (step).

mehrhardt commented 6 years ago

Fair enough. One can then also inherit the main structure from Landweber and override some of the implementations if needed.
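
For instance, something like this (a sketch building on the version with step above; ProjectedLandweber and projection are made-up names):

class ProjectedLandweber(Landweber):
    """Landweber variant that projects each iterate onto a constraint set."""

    def __init__(self, op, x, rhs, omega=1, projection=None):
        super(ProjectedLandweber, self).__init__(op, x, rhs, omega)
        self.projection = projection

    def step(self):
        # Reuse the parent update, then enforce the constraint in place
        super(ProjectedLandweber, self).step()
        if self.projection is not None:
            self.projection(self.x)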

kohr-h commented 6 years ago

> Fair enough. One can then also inherit the main structure from Landweber and override some of the implementations if needed.

Exactly!