odlgroup / odl

Operator Discretization Library https://odlgroup.github.io/odl/
Mozilla Public License 2.0

Name of "forward_backward_pd" #1399

Open mehrhardt opened 6 years ago

mehrhardt commented 6 years ago

https://github.com/odlgroup/odl/blob/6a18fb0793c8ebe236b7acea835e62fbce20d08d/odl/solvers/nonsmooth/forward_backward.py#L19

I just noticed that forward_backward_pd is pretty much the same algorithm as the Condat-Vu variant of PDHG. I personally don't think it should be in a forward-backward file but rather in the same file as PDHG. Moreover, calling this forward-backward is rather non-standard. Two papers are cited for forward_backward_pd: one which has about 1500 citations and contains the "real" forward-backward solver rather than this algorithm, and one which has merely 15 citations and does contain this version.

Thus, we should have a "real forward-backward" algorithm in ODL as defined by almost everyone (including Boyd). The forward_backward_pd still might be useful (do we have a use case???) but should be renamed / merged with pdhg.

adler-j commented 6 years ago

Very reasonable suggestions altogether!

aringh commented 6 years ago

I am not sure which paper by Condat and Vu you are talking about, but in general: yes, the two algorithms are quite similar and we should maybe think about unifying them, or at least have them in the same place.

However, naming seems to be non-obvious to me. Here is a paper (by Condat and Vu among others), that basically indicates/argues that you can see things like PDHG as forward-backward algorithms.

adler-j commented 6 years ago

I guess @sbanert also has some opinions in this regard.

kohr-h commented 6 years ago

What exactly is not forward-backward about this algorithm? It does a forward step (gradient) and a backward step (prox of g^*) in each iteration. Also note that in contrast to the "standard" forward-backward splitting, this variant is an actual primal-dual method since it uses the convex conjugate prox of g.
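For reference, here is a minimal sketch (plain numpy, deliberately not ODL's API) of the "vanilla" forward-backward iteration most people expect under this name: a forward gradient step on the smooth part followed by a backward proximal step on the nonsmooth part, applied to a lasso-type toy problem whose solution is known in closed form.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (the backward/implicit step)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(a, lam, tau=0.5, niter=200):
    """Vanilla forward-backward for min 0.5*||x - a||^2 + lam*||x||_1."""
    x = np.zeros_like(a)
    for _ in range(niter):
        grad = x - a  # forward step: gradient of 0.5*||x - a||^2
        x = soft_threshold(x - tau * grad, tau * lam)  # backward step: prox
    return x

a = np.array([3.0, 0.2, -1.5])
x = forward_backward(a, lam=1.0)
print(x)  # closed-form solution is soft_threshold(a, 1.0) = [2.0, 0.0, -0.5]
```

Note that this iteration never touches a dual variable or a convex conjugate, which is exactly the difference being discussed here.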

Overall it looks to me that optimization folks really enjoy saying things like "Method X is just a special case of Y with a specific choice of Z.", and then later someone comes along and says "No, I just generalized X, and now it contains Y as special case!", and then a third person comes and says "I have this totally generic framework that encompasses all popular methods as special cases."

The good thing about people disagreeing on how to call things is that we can call things however we like. On the other hand, we break things by renaming, so IMO we need a better reason.

mehrhardt commented 6 years ago

@aringh, I wasn't talking about a specific paper by both Condat and Vu. They both did related work in parallel, e.g. https://link.springer.com/article/10.1007/s10957-012-0245-9, https://link.springer.com/article/10.1007/s10444-011-9254-8

@kohr-h, I wasn't saying that one can't call this "forward-backward", just that most people would expect something different in a python file called "forward_backward.py".

Grouping it in the same file as PDHG would not disturb any user and would just make things easier for us. In fact, one might even think of merging it with PDHG, as these are (kind of) the same algorithm. While many methods can be interpreted (and analyzed) via a few basic concepts (like the proximal point method), their (efficient) implementations are often very different, so it usually makes sense to implement them separately. However, I feel that is not the case here: even the implementations are (almost) the same.

kohr-h commented 6 years ago

@mehrhardt Okay maybe I read too much into your comment. I don't mind shoving functions around across files, as long as users don't see a difference. Personally I need neither forward_backward.py nor douglas_rachford.py nor pdhg.py, they could all go into convex.py or stay even one level higher in nonsmooth.py without further distinction.

Regarding the concrete case here, I'm cautious about renaming if the name isn't plain wrong. What would be necessary to frame the algorithm as PDHG?

adler-j commented 6 years ago

Isn't PDHG basically a special case of this if h=0? So we could, especially with #1365 solved, just merge them. E.g. default h to zero and you can just add that to pdhg, right @mehrhardt?
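To make the reduction concrete, here is a hedged sketch (hypothetical names, not ODL's actual signatures) of a Condat-Vu-style primal-dual iteration where the smooth term defaults to zero; with `grad_h=None` the update is exactly the PDHG iteration, which is the proposed merge.

```python
import numpy as np

def condat_vu(x, y, prox_tau_f, prox_sigma_gconj, L, Lt, tau, sigma,
              niter, grad_h=None):
    """Hypothetical merged solver: Condat-Vu primal-dual iteration.

    With grad_h=None (i.e. h = 0) this is exactly the PDHG iteration.
    """
    if grad_h is None:
        grad_h = lambda z: 0.0  # h = 0: plain PDHG
    for _ in range(niter):
        x_new = prox_tau_f(x - tau * (grad_h(x) + Lt(y)))  # primal step
        y = prox_sigma_gconj(y + sigma * L(2 * x_new - x))  # dual step
        x = x_new
    return x, y

# Toy check: f(x) = 0.5*||x - a||^2, g = 0, h = 0 -> PDHG should return x = a.
a = np.array([1.0, -2.0, 3.0])
tau = 1.0
x, y = condat_vu(
    x=np.zeros(3), y=np.zeros(3),
    prox_tau_f=lambda v: (v + tau * a) / (1 + tau),  # prox of tau*f
    prox_sigma_gconj=lambda v: np.zeros_like(v),     # g = 0 => g* = ind. of {0}
    L=lambda v: v, Lt=lambda v: v, tau=tau, sigma=0.5, niter=60)
print(x)  # approximately a
```

This is only meant to show that a single code path with a zero default for the smooth term covers both algorithms.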

sbanert commented 6 years ago

The algorithms of Boţ/Hendrich (primal-dual Douglas–Rachford), Condat, Vũ, and Chambolle/Pock are very similar to each other and can all be called primal-dual forward-backward algorithms. They differ by their choice of which parameters they allow, which further operators/functionals they include and if they apply the same kind of algorithm to the primal or the dual problem. I don’t know if there is anything on the market which unifies all these things.

mehrhardt commented 6 years ago

@adler-j, yes, I think we can put this all together and call this pdhg or something similar. I personally would avoid using the words "forward" and "backward" here but if we are very clear what we mean, then this is still OK.

sbanert commented 6 years ago

In this jargon, “forward” is for the gradient step(s), and “backward” is for the proximal step(s). This is derived from forward and backward Euler: the two kinds of steps result from discretising the differential inclusion/equation 0 ∈ ẋ(t) + ∂f(x(t)) or 0 = ẋ(t) + ∇f(x(t)) with forward/backward Euler, and the stepsize in the resulting methods corresponds to the time step in the discretisation.
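Spelled out (standard material, stated here only for completeness): applying the two Euler schemes with time step τ to the gradient flow gives

```latex
x_{k+1} = x_k - \tau \nabla f(x_k)
\qquad \text{(forward/explicit Euler: the gradient step)}
```

```latex
0 \in \partial f(x_{k+1}) + \frac{x_{k+1} - x_k}{\tau}
\;\iff\;
x_{k+1} = (I + \tau\,\partial f)^{-1}(x_k) = \operatorname{prox}_{\tau f}(x_k)
\qquad \text{(backward/implicit Euler: the proximal step)}
```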

mehrhardt commented 6 years ago

I am well aware of what these words mean in this context. However, as they may cause confusion, I would refrain from using them here.

aringh commented 6 years ago

@mehrhardt Thanks for the links! I will add them to my (quite big) pile of "I-should-have-a-look-at-these" papers :wink:

kohr-h commented 6 years ago

@mehrhardt Why do you think the words "forward" and "backward" may cause confusion?

mehrhardt commented 6 years ago

It may cause confusion as this solver is similar to but different from (vanilla) forward-backward splitting as most people understand it.

kohr-h commented 6 years ago

If we see the need to do so, we can always add "the" vanilla forward-backward splitting method, if only in the interest of disambiguation.

sbanert commented 6 years ago

I checked the code, and it yields exactly the vanilla forward-backward algorithm if the lists g, L, and sigma are empty. Wouldn’t it be enough to mention this behaviour in the documentation (and perhaps optimise the code in this important special case)?

mehrhardt commented 6 years ago

Sure, if these are optional this should be fine. We could also just write a small wrapper forward_backward that does not ask for them.
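A sketch of what such a wrapper could look like, with a deliberately simplified stand-in for the primal-dual solver (hypothetical names and signatures, not ODL's real API): with empty `g`, `L`, and `sigma` lists the primal-dual loop degenerates to exactly the vanilla forward-backward iteration.

```python
import numpy as np

def forward_backward_pd(x, f_prox, h_grad, g_proxes, Ls, Lts, tau, sigmas, niter):
    """Simplified stand-in for the primal-dual solver (not ODL's real API)."""
    ys = [np.zeros_like(L(x)) for L in Ls]
    for _ in range(niter):
        grad = h_grad(x) + sum(Lt(y) for Lt, y in zip(Lts, ys))
        x = f_prox(x - tau * grad, tau)   # forward step on h, backward on f
        ys = [gp(y + s * L(x), s)         # dual proximal steps (empty if no g)
              for gp, y, s, L in zip(g_proxes, ys, sigmas, Ls)]
    return x

def forward_backward(x, f_prox, h_grad, tau, niter):
    """Proposed wrapper: the _pd solver with empty g, L, and sigma lists."""
    return forward_backward_pd(x, f_prox, h_grad, [], [], [], tau, [], niter)

# Toy check: min 0.5*||x - a||^2 + indicator(x >= 0)  ->  solution max(a, 0)
a = np.array([2.0, -1.0, 0.5])
x = forward_backward(
    x=np.zeros(3),
    f_prox=lambda v, t: np.maximum(v, 0.0),  # prox of the nonneg. indicator
    h_grad=lambda v: v - a,                  # gradient of 0.5*||v - a||^2
    tau=0.5, niter=100)
print(x)  # approximately [2.0, 0.0, 0.5]
```

Optimising the dual-free special case would then just mean skipping the (empty) dual bookkeeping entirely.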

kohr-h commented 6 years ago

@sbanert :+1: for digging into this. I'm all in for making a simple forward_backward that calls the _pd version with empty lists and optimizing the workhorse for that case.

Out of curiosity: do we have the same behavior for the Douglas-Rachford solver?

aringh commented 6 years ago

The Douglas-Rachford solver does not take a smooth function as input, so if I have understood it correctly I guess it should not.

kohr-h commented 6 years ago

> The Douglas-Rachford solver does not take a smooth function as input, so if I have understood it correctly I guess it should not.

I mean, does our DR solver reduce to vanilla DR for some special inputs?