odlgroup / odl

Operator Discretization Library https://odlgroup.github.io/odl/
Mozilla Public License 2.0

quadratic perturbation #1064

Closed mehrhardt closed 7 years ago

mehrhardt commented 7 years ago

In my research I frequently make a functional f strongly convex by adding mu/2 |.|^2 to it. I have therefore written a few functions that perform this quadratic perturbation for some special cases. However, since the prox of such a perturbation is fairly easy, one could implement this as a general rule in ODL.

I have seen that the prox of this is already implemented:

def proximal_quadratic_perturbation(prox_factory, a, u=None):

in proximal_operators.py. However, as far as I can see, it is not being used for functionals. They neither have a quadratic_perturbation attribute (the way they have translation), nor would ODL recognize that I have such a construction and use it:

import odl

X = odl.uniform_discr([0,0], [1,1], [10,10])
f = odl.solvers.L1Norm(X)
g = odl.solvers.L2NormSquared(X)

(f + g).proximal(1)

NotImplementedError: no proximal operator implemented for functional FunctionalSum(L1Norm(uniform_discr([0.0, 0.0], [1.0, 1.0], (10, 10))), L2NormSquared(uniform_discr([0.0, 0.0], [1.0, 1.0], (10, 10))))
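For reference, the identity behind proximal_quadratic_perturbation can be illustrated outside ODL in plain NumPy. The helper names below (soft_threshold, prox_quadratic_perturbation) are illustrative, not ODL API; the sketch assumes the perturbation g(x) = f(x) + a*||x||^2 + <u, x> with a >= 0, for which prox_{sigma*g}(x) = prox_{sigma'*f}((x - sigma*u)/c) with c = 2*sigma*a + 1 and sigma' = sigma/c:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal of t * ||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

def prox_quadratic_perturbation(prox_f, sigma, a, u):
    """Proximal of sigma * (f(x) + a*||x||^2 + <u, x>), given the prox of f.

    Illustrative sketch, not ODL API. Uses the identity
    prox_{sigma*g}(x) = prox_{sigma'*f}((x - sigma*u) / c)
    with c = 2*sigma*a + 1 and sigma' = sigma / c.
    """
    def prox(x):
        c = 2.0 * sigma * a + 1.0
        return prox_f((x - sigma * u) / c, sigma / c)
    return prox

# Example: f = ||.||_1 perturbed by (mu/2) * ||x||^2, i.e. a = mu/2, u = 0
mu = 2.0
x = np.array([3.0, -0.5, 1.0])
prox = prox_quadratic_perturbation(soft_threshold, sigma=1.0, a=mu / 2,
                                   u=np.zeros(3))
print(prox(x))  # componentwise minimizer of |y| + y^2 + 0.5*(y - x)^2
```

For the first component this agrees with minimizing |y| + y^2 + (y - 3)^2 / 2 by hand, which gives y = 2/3.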

Moreover, I think it would be useful to have an attribute "strong_convexity" which can be computed based on simple rules for summation, scalar multiplication and convex conjugation. As ODL already has "grad_lipschitz", it seems inconsistent to have that one but not "strong_convexity".

Related to this, it would also make sense that (linear) operators have the attribute "norm". What is your take on this?

adler-j commented 7 years ago

... quadratic perturbation ...

This should definitely be added, and as you note it should be very simple to implement. I'd recommend adding it as a free function, e.g.

f = odl.solvers.L1Norm(X)
fpg = odl.solvers.quadratic_perturbation(f, ...)

since this would be the most general and least invasive approach.

I could do this myself, or you could do it if you'd like.

Moreover, I think it would be useful to have an attribute "strong_convexity" which can be computed based on simple rules for summation, scalar multiplication and convex conjugation. As ODL already has "grad_lipschitz", it seems inconsistent to have that one but not "strong_convexity".

Overall I'm actually not that happy about the existence of grad_lipschitz, mostly because we barely ever use it, and we explicitly said no to an is_convex property since this would be very hard to maintain long term without irritating surprises. Think of stuff like -1 * (-1 * convex_functional), which would basically require a computer algebra system, not to mention constructions with implicit assumptions, e.g. an optimization problem like

min_x F(x) + G(x)

where

F(x) = x^2
G(x) = - 0.5 x^2

Now clearly, this optimization problem is convex, but the components are not, so if we only check for convexity component-wise we're gonna get in people's faces all the time. And to figure this out we'd need quite some intelligence.

Related to this, it would also make sense that (linear) operators have the attribute "norm". What is your take on this?

Strongly agree, see #1065.

kohr-h commented 7 years ago

Moreover, I think it would be useful to have an attribute "strong_convexity"

In my opinion, this wouldn't add anything useful apart from having a system that raises a red flag whenever you try to do something "bad", i.e., to use a convex method on a (seemingly) non-convex problem.

However, overall we have been moving away from this thinking more and more while settling on an API for operators and spaces in ODL. We started out with a quite rigid hierarchy of spaces, e.g. a HilbertSpace that allowed you to take inner products, which were not defined otherwise. It turned out that it was just in the way all the time and didn't add anything meaningful, so we flattened that hierarchy completely into LinearSpace. Now the existence of something at runtime determines what the space is (this is the (in)famous duck typing).

The ultimate question when considering a feature should be "What will it be useful for?". For a flag like strong_convexity I can only think of a checker that ensures that only convex problems are fed to the solvers. In many situations, however, a system that tries to be too smart can be very annoying when it tries to keep users from doing "forbidden" things. In particular in situations when the user knows better. What I'm thinking of here is when a user "misuses" an algorithm beyond its intended purpose, but with a good idea of why it would work anyway.

As ODL already has "grad_lipschitz", it seems inconsistent to have that one but not "strong_convexity".

This is correct, and to solve that I suggest that we remove grad_lipschitz as @adler-j seems to imply. (At the time of the original PR I pushed to keep the is_convex flag and similar ones out, but didn't want to push too hard also against grad_lipschitz. After all it was used at a couple of places, although not essential.)

mehrhardt commented 7 years ago

I completely see your point. ODL too often says no, although it is safe to do certain operations. The most annoying one for me is multiplying an ODL object in a function space with a numpy array when all dimensions match ... but this is a different topic.

However, to me strong_convexity is different (and very different from is_convex). I want to sum functionals, convex conjugate them and so on, but I don't like keeping track of strong convexity constants myself. This can be done very easily, similar to taking derivatives (which is why we have automatic differentiation nowadays, also implemented in ODL); just a framework is needed. That is why I want to have the strong_convexity property. I would then also define strong convexity for negative constants (semi-convexity), so that the strong convexity of -1 * f, and thus -1 * (-1 * f), is not an issue.
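A minimal sketch of the bookkeeping I have in mind (purely hypothetical, not ODL API): each functional carries a constant, summation adds constants, and scalar multiplication scales them, with negative values read as semi-convexity, so -1 * (-1 * f) is unproblematic:

```python
class ToyFunctional:
    """Toy functional that only tracks a strong convexity constant.

    Hypothetical sketch, not ODL API. A negative constant is read as
    semi-convexity, as proposed above.
    """

    def __init__(self, strong_convexity=0.0):
        self.strong_convexity = strong_convexity

    def __add__(self, other):
        # Constants add under summation.
        return ToyFunctional(self.strong_convexity + other.strong_convexity)

    def __rmul__(self, c):
        # Scalar multiplication scales the constant, sign included.
        return ToyFunctional(c * self.strong_convexity)

f = ToyFunctional(strong_convexity=2.0)  # e.g. x^2
g = -1.0 * f                             # -x^2: constant -2 (semi-convex)
h = -1.0 * g                             # back to the original constant 2
s = f + g                                # x^2 - x^2: constant 0
print(h.strong_convexity, s.strong_convexity)
```

With this convention the problematic negation example from above simply round-trips, and x^2 - x^2 comes out as 0-strongly convex, i.e. convex but not strongly so.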

adler-j commented 7 years ago

I completely see your point. ODL too often says no, although it is safe to do certain operations.

And this is a general issue that we're trying to settle. Overall we went for "better to say no at first and then relax the requirements" rather than the other way around. This way we won't go around breaking everyone's code.

Most annoyingly for me is multiplying an ODL object in a function space with a numpy array when all dimensions match ... but this is a different topic.

That should work:

>>> import numpy as np
>>> import odl
>>> spc = odl.uniform_discr(0, 1, 3)
>>> el = spc.one()
>>> vec = np.random.rand(spc.size)
>>> el * vec
uniform_discr(0.0, 1.0, 3).element([0.27118571993150342, 0.41963078958913824, 0.99180649454248193])
>>> vec * el
uniform_discr(0.0, 1.0, 3).element([0.27118571993150342, 0.41963078958913824, 0.99180649454248193])

However, to me strong_convexity is different (and very different from is_convex). I want to sum functionals, convex conjugate them and so on, but I don't like keeping track of strong convexity constants myself. This can be done very easily, similar to taking derivatives (which is why we have automatic differentiation nowadays, also implemented in ODL); just a framework is needed. That is why I want to have the strong_convexity property. I would then also define strong convexity for negative constants (semi-convexity), so that the strong convexity of -1 * f, and thus -1 * (-1 * f), is not an issue.

Well there are still a large number of corner cases, even if you go for strong semi-convexity. E.g. x^2 - x^2 is not strongly convex.

Is there any particular place you would want to use the inferred strong convexity?

On the issue

I'll get this done.

mehrhardt commented 7 years ago

Well there are still a large number of corner cases, even if you go for strong semi-convexity. E.g. x^2 - x^2 is not strongly convex.

This is actually an easy one: f(x) := x^2 is 2-strongly convex, g(x) := -x^2 is -2-strongly convex (or 2-semi-convex), and f + g is 2 + (-2) = 0-strongly convex, i.e. merely convex.

Is there any particular place you would want to use the inferred strong convexity?

I am using this to determine parameters in algorithms, e.g. PDHG. If one wants O(1/k^2) or linear convergence rates, one needs to know the strong convexity of the functionals involved. While it is possible to keep track of these things yourself, it is much easier if the software (e.g. ODL) does it for you.
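For concreteness, this is the kind of place the constant would feed in: the accelerated PDHG variant of Chambolle and Pock (2011, Algorithm 2) updates its step sizes from the strong convexity constant gamma of one of the functionals. A standalone sketch of that schedule (function name is illustrative, not ODL API):

```python
import math

def accelerated_pdhg_steps(gamma, tau0, sigma0, niter):
    """Step-size schedule of accelerated PDHG (Chambolle & Pock 2011,
    Algorithm 2), driven by the strong convexity constant gamma of one
    of the functionals. Returns the (tau, sigma, theta) used per step.
    """
    tau, sigma = tau0, sigma0
    schedule = []
    for _ in range(niter):
        theta = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)
        schedule.append((tau, sigma, theta))
        # tau shrinks and sigma grows as the iteration proceeds.
        tau, sigma = theta * tau, sigma / theta
    return schedule

# If ODL inferred gamma automatically, the user would not pass it by hand.
for tau, sigma, theta in accelerated_pdhg_steps(gamma=2.0, tau0=1.0,
                                                sigma0=1.0, niter=3):
    print(tau, sigma, theta)
```

The point is that gamma is exactly the quantity a strong_convexity attribute would provide, instead of the user deriving it per problem.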

kohr-h commented 7 years ago

I see your point @MatthiasJE. If it's as straightforward to keep track of strong convexity constants through standard operations with functionals, AND if there's a clear use case, I wouldn't be against having this in Functional. I'll make a new issue to discuss this in detail.