Open · mehrhardt opened this issue 6 years ago
Very reasonable suggestions altogether!
I am not sure which paper by Condat and Vu you are talking about, but in general: yes, the two algorithms are quite similar and we should maybe think about unifying them, or at least have them in the same place.
However, naming seems to be non-obvious to me. Here is a paper (by Condat and Vu among others) that basically argues that you can see things like PDHG as forward-backward algorithms.
I guess @sbanert also has some opinions in this regard.
What exactly is not forward-backward about this algorithm? It does a forward step (gradient) and a backward step (prox of g^*) in each iteration. Also note that, in contrast to "standard" forward-backward splitting, this variant is an actual primal-dual method since it uses the prox of the convex conjugate g^*.
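For concreteness, here is a minimal sketch of one such iteration in the Condat-Vũ form for `min_x f(x) + g(Lx) + h(x)` with `h` smooth. The names `prox_f`, `prox_g_conj`, `grad_h` and the matrix `L` are placeholders (not ODL objects), and this is not necessarily line by line what `forward_backward_pd` implements:

```python
def pdfb_step(x, y, L, grad_h, prox_f, prox_g_conj, tau, sigma):
    """One primal-dual forward-backward iteration for f(x) + g(Lx) + h(x).

    Forward step: gradient of the smooth term h; backward steps: prox of f
    in the primal variable and prox of the convex conjugate g* in the dual.
    """
    # Primal update: forward (gradient) step, then backward (prox) step.
    x_new = prox_f(x - tau * (grad_h(x) + L.T @ y), tau)
    # Dual update with the usual extrapolation; backward step via prox of g*.
    y_new = prox_g_conj(y + sigma * (L @ (2 * x_new - x)), sigma)
    return x_new, y_new
```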
Overall it seems to me that optimization folks really enjoy saying things like "Method X is just a special case of Y with a specific choice of Z", and then later someone comes along and says "No, I just generalized X, and now it contains Y as a special case!", and then a third person comes along and says "I have this totally generic framework that encompasses all popular methods as special cases."
The good thing about people disagreeing on how to call things is that we can call things however we like. On the other hand, we break things by renaming, so IMO we need a better reason.
@aringh, I wasn't talking about a specific paper by both Condat and Vu. They have both done related work in parallel, e.g. https://link.springer.com/article/10.1007/s10957-012-0245-9 and https://link.springer.com/article/10.1007/s10444-011-9254-8
@kohr-h, I wasn't saying that one can't call this "forward-backward", just that most people would expect something different in a Python file called "forward_backward.py".
Grouping it in the same file as PDHG would not disturb any user but would just make it easier for us. In fact, one may even think of merging it with PDHG, as these are (kind of) the same algorithms. While many methods can be interpreted (and analyzed) via a few basic concepts (like the proximal point method), their (efficient) implementations are often very different, so it makes a lot of sense to have them implemented separately. However, I feel that this is not the case here: even their implementations are (almost) the same.
@mehrhardt Okay, maybe I read too much into your comment. I don't mind shoving functions around across files, as long as users don't see a difference. Personally I need neither `forward_backward.py` nor `douglas_rachford.py` nor `pdhg.py`; they could all go into `convex.py` or even stay one level higher in `nonsmooth.py` without further distinction.
Regarding the concrete case here, I'm cautious about renaming if the name isn't plain wrong. What would be necessary to frame the algorithm as PDHG?
Isn't PDHG basically a special case of this if `h = 0`? So we could, especially with #1365 solved, just merge them. E.g. default `h` to zero and you can just add that to `pdhg`, right @mehrhardt?
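For reference, if the solver implements a Condat-Vũ style update (I have not checked this against the code line by line), the iteration for `min_x f(x) + g(Lx) + h(x)` reads

```latex
\begin{aligned}
x_{n+1} &= \operatorname{prox}_{\tau f}\bigl(x_n - \tau\,(\nabla h(x_n) + L^* y_n)\bigr),\\
y_{n+1} &= \operatorname{prox}_{\sigma g^*}\bigl(y_n + \sigma\,L\,(2 x_{n+1} - x_n)\bigr),
\end{aligned}
```

and with h ≡ 0 the gradient term drops out, leaving exactly the PDHG / Chambolle-Pock update.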
The algorithms of Boţ/Hendrich (primal-dual Douglas–Rachford), Condat, Vũ, and Chambolle/Pock are very similar to each other and can all be called primal-dual forward-backward algorithms. They differ by their choice of which parameters they allow, which further operators/functionals they include and if they apply the same kind of algorithm to the primal or the dual problem. I don’t know if there is anything on the market which unifies all these things.
@adler-j, yes, I think we can put this all together and call it `pdhg` or something similar. I personally would avoid using the words "forward" and "backward" here, but if we are very clear about what we mean, then this is still OK.
In this jargon, “forward” is for the gradient step(s) and “backward” is for the proximal step(s). The terms come from forward and backward Euler: forward and backward steps result from discretising the gradient flow x'(t) ∈ -∂f(x(t)) (or x'(t) = -∇f(x(t)) in the smooth case) with forward or backward Euler, and the step size in the resulting methods corresponds to the time step in the discretisation.
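Spelled out in the smooth case (just as a reference point, nothing ODL-specific):

```latex
% Discretising  x'(t) = -\nabla f(x(t))  with time step \tau:
\begin{aligned}
\text{forward Euler:}\quad
  & \frac{x_{k+1} - x_k}{\tau} = -\nabla f(x_k)
  &&\Longrightarrow\; x_{k+1} = x_k - \tau \nabla f(x_k)
  && \text{(gradient step)},\\
\text{backward Euler:}\quad
  & \frac{x_{k+1} - x_k}{\tau} = -\nabla f(x_{k+1})
  &&\Longrightarrow\; x_{k+1} = (\mathrm{id} + \tau \nabla f)^{-1}(x_k)
    = \operatorname{prox}_{\tau f}(x_k)
  && \text{(proximal step)}.
\end{aligned}
```

The nonsmooth case is analogous with ∂f in place of ∇f; the backward step is then the resolvent (id + τ∂f)^{-1}, i.e. again the prox.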
I am well aware of what these words mean in this context. However, as they may cause confusion, I would refrain from using them.
@mehrhardt Thanks for the links! I will add them to my (quite big) pile of "I-should-have-a-look-at-these" papers :wink:
@mehrhardt Why do you think the words "forward" and "backward" may cause confusion?
They may cause confusion because this solver is similar to, but different from, (vanilla) forward-backward splitting as most people understand it.
If we see the need to do so, we can always add "the" vanilla forward-backward splitting method, if only in the interest of disambiguation.
I checked the code, and it yields exactly the vanilla forward-backward algorithm if the lists `g`, `L`, and `sigma` are empty. Wouldn't it be enough to mention this behaviour in the documentation (and perhaps optimise the code in this important special case)?
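By "vanilla" I mean the plain forward-backward (proximal gradient) iteration for `min_x f(x) + h(x)` with `h` smooth, roughly the following sketch (`prox_f` and `grad_h` are placeholder callables, not ODL objects):

```python
def forward_backward_vanilla(x, prox_f, grad_h, tau, niter):
    """Plain forward-backward (proximal gradient) iteration for f(x) + h(x)."""
    for _ in range(niter):
        # Forward (gradient) step on the smooth part h,
        # then backward (prox) step on the nonsmooth part f.
        x = prox_f(x - tau * grad_h(x), tau)
    return x
```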
Sure, if these are optional, this should be fine. We could also just write a small wrapper `forward_backward` that does not ask for them.
@sbanert :+1: for digging into this. I'm all in for making a simple `forward_backward` that calls the `_pd` version with empty lists and optimizing the workhorse for that case.
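Something like the following hypothetical wrapper is what I have in mind; the keyword arguments passed to `forward_backward_pd` here are assumptions and would need to be matched against the actual signature in `forward_backward.py`:

```python
def forward_backward(x, f, h, tau, niter, callback=None):
    """Vanilla forward-backward splitting for f(x) + h(x), h smooth.

    Thin wrapper around the primal-dual solver with empty dual parts.
    NOTE: the keyword arguments passed to ``forward_backward_pd`` below are
    assumed and must be adapted to its actual signature.
    """
    return forward_backward_pd(x, f=f, g=[], L=[], h=h,
                               tau=tau, sigma=[], niter=niter,
                               callback=callback)
```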
Out of curiosity: do we have the same behavior for the Douglas-Rachford solver?
The Douglas-Rachford solver does not take a smooth function as input, so if I have understood it correctly, I guess it should not.
I mean, does our DR solver reduce to vanilla DR for some special inputs?
https://github.com/odlgroup/odl/blob/6a18fb0793c8ebe236b7acea835e62fbce20d08d/odl/solvers/nonsmooth/forward_backward.py#L19
I just noticed that the `forward_backward_pd` is pretty much the same algorithm as the Condat-Vu version of PDHG. I personally don't think it should be in a forward-backward file but rather in the same file as PDHG. Moreover, to call this forward-backward is rather non-standard. There are two papers cited for `forward_backward_pd`: one with 1500 citations, which contains the "real" forward-backward solver and not this algorithm, and one with merely 15 citations, which does contain this version. Thus, we should have a "real" forward-backward algorithm in ODL as defined by almost everyone (including Boyd). The `forward_backward_pd` still might be useful (do we have a use case???) but should be renamed / merged with `pdhg`.