To follow up on this, there is also the question of how to classify the methods that find a critical point of a function by simply finding a root of its gradient; an example is Newton's method. Other schemes explicitly step in the direction of smaller function values, so for those it makes sense to speak of a minimization method in the first place. So, summarizing:
`findroot`
@ozanoktem @aringh what's your take on this?
I will have a look at it... Part of this separation feels like it is highly 'personal', in the sense that different people will find it natural to look for different solvers under different categories.
We shouldn't have too many categories; the current structure is too fine imo. Categories should be mutually exclusive, for example a scalar optimizer vs. an equation system solver.
I agree that the number of categories should be kept small, in order to make it easier to find things. However, the current categories are not optimal in my opinion; what is the difference between `vector` and `scalar`? Aren't most schemes `iterative`?
Also: the difference between root finders and optimization schemes is not always so sharp. Newton's method can be seen as solving g(x) = 0, but if f(x) is a (convex) function and one lets g(x) = grad_f(x), then we are in fact solving an optimization problem with a second-order method.
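For reference, the standard identity behind this remark (textbook material, nothing ODL-specific): the Newton step for a root of g,

x_{k+1} = x_k - [g'(x_k)]^{-1} g(x_k),

turns, with g = \nabla f, into

x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),

which is exactly the second-order (Newton) step for minimizing f.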
There seems to be no really clear-cut distinction mathematically, so I guess we should choose something which makes sense from an implementation point of view. Not too many categories, I agree, rather a few "axes" along which we distinguish. Let me throw in another suggestion:
Solvers for an equation A(x) = y or A(x) = 0. This typically includes methods which compute an approximate solution to the normal equations in the linear case, or do the same for the linearized problem in each step of a nonlinear problem. This would include ...
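For concreteness (standard formulas, not something stated in the thread): in the linear case the normal equations read

A^* A x = A^* y,

and a Gauss-Newton-type method solves the same equations for the linearization in each step of a nonlinear problem,

A'(x_k)^* A'(x_k)\, \Delta x = A'(x_k)^* (y - A(x_k)), \qquad x_{k+1} = x_k + \Delta x.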
There is one fairly easy way to distinguish between root finders and optimization schemes. If the solver in question takes an optimization problem as input argument, then it is an optimization scheme. If it takes an equation as input argument, then it is a root finder. It might very well be that an optimization scheme makes use of a root finder, and vice versa, a root finder might use an optimization scheme. Now, this requires that ODL has an object structure for representing equations and/or optimization problems. If that is not feasible, then I would go for a categorization similar to what @kohr-h wrote. Within optimization methods one can also distinguish between the following: line-search methods, trust-region/surrogate methods, and hybrid methods.
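A minimal sketch of what this input-based distinction could look like; none of these classes exist in ODL, and all names (`Equation`, `MinimizationProblem`, the solver signatures) are hypothetical:

```python
class Equation(object):
    """The problem A(x) = b for an operator ``A`` and data ``b``."""
    def __init__(self, operator, rhs):
        self.operator = operator
        self.rhs = rhs


class MinimizationProblem(object):
    """The problem min_x f(x) for a functional ``f``."""
    def __init__(self, functional):
        self.functional = functional


def newton(problem, x0, maxiter=10):
    """A root finder: accepts only an ``Equation``."""
    if not isinstance(problem, Equation):
        raise TypeError('`newton` solves equations, got {!r}'.format(problem))
    # ... iterate on problem.operator and problem.rhs ...


def steepest_descent(problem, x0, maxiter=10):
    """An optimization scheme: accepts only a ``MinimizationProblem``."""
    if not isinstance(problem, MinimizationProblem):
        raise TypeError('`steepest_descent` minimizes functionals, '
                        'got {!r}'.format(problem))
    # ... iterate on problem.functional ...
```

An optimizer could still internally build an `Equation` for its optimality condition and hand it to a root finder, which is the interplay described above.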
Thanks for the input, @ozanoktem. Clearly the situation is not as clear-cut as my separation suggests.
Still, I think we have to make the decision based on practicality, meaning that the separation should make sense even for somebody who is not an expert on optimization. It's just a matter of being able to locate a method in reasonable time plus some mathematical backing so it's not completely arbitrary.
Regarding representation of problems, I think that would be a major undertaking. You would have to build some sort of domain specific language to handle independent and dependent variables, constraints, optimality conditions etc. so in a way, we would have to write a clone of PyOpt. Maybe at some point we need a more structured way than we have in mind now (just directly defining a functional and giving it some useful properties), but that would be over the top IMO.
So in that way, I think your suggestions can help to differentiate on a file level and to find good names.
I agree with @kohr-h. In some sense it would be nice to be able to just send a problem instance to a solver and then get an answer back, however in my opinion this project is not about building an entire optimization toolbox for all-purpose use. Then we could just pipe all the computations to, for example, IPOPT, SNOPT, NLopt, or any other library.
Regarding the split into line-search, trust-region/surrogate and hybrid: I'm not convinced that this is the best split. In some sense, this is just algorithmic differences in how the problem is solved. I liked the idea of `Ax = b`.
As @kohr-h wrote, in order to use the last class of solvers, one really needs to do some homework with paper and pen first.
Are we happy with the current structure of the `solvers` package? If so, we can close this. Else we should fix it rather asap since it could be breaking.
> Are we happy with the current structure of the `solvers` package?

No.

> Else we should fix it rather asap since it could be breaking.

Yes, not so much for the "top" methods, but definitely for the basic ones which are used by other code parts.
So we're back to the old question above of how to classify the methods we have (and what is likely to be implemented soon). Here's a new suggestion: classify according to the problem being solved. What do we have so far?
Mainly Newton's method, but in fact also all quasi-Newton methods. I think we make an artificial distinction here between plain Newton and quasi-Newton methods like BFGS. The former takes an operator (= function) f and iteratively determines x such that f(x) = 0. The latter take an operator (= gradient of a function) grad_f and iteratively determine x such that grad_f(x) = 0. But as far as I can see, we never use the fact that the provided operator is a function gradient.
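To make that point concrete, here is a toy sketch in plain NumPy (not ODL code, all names made up): the very same iteration is a root finder when given an arbitrary f, and a second-order minimizer when f happens to be a gradient and `jac` its Hessian.

```python
import numpy as np


def newton_root(f, jac, x, niter=10):
    """Plain Newton iteration for f(x) = 0 (toy sketch, not ODL code).

    ``f`` maps R^n -> R^n and ``jac(x)`` is its Jacobian.  If ``f`` is the
    gradient of a functional and ``jac`` its Hessian, the same loop is a
    second-order minimization method -- the code never notices.
    """
    for _ in range(niter):
        x = x - np.linalg.solve(jac(x), f(x))
    return x


# Toy usage: componentwise root of f(x) = x**2 - 2, i.e. x -> sqrt(2).
f = lambda x: x ** 2 - 2.0
jac = lambda x: np.diag(2.0 * x)
print(newton_root(f, jac, np.array([1.0, 3.0])))
```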
So, in summary, here I would put the following methods:
Name suggestions: `zero`, `root`, `find...`
Many methods do this upon convergence, even if one does not want to go all the way to convergence. Note that the least-squares interpretation usually only holds for a linear operator; in the nonlinear case I don't know exactly what happens, but I would consider that a detail. So this category would encompass ...
Name suggestions: `least_squares`, `lsq`
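As an illustration of the kind of method meant here, a toy Landweber iteration in plain NumPy (not ODL code; ODL's solvers work on operators rather than matrices): for a linear A and a small enough step omega it converges to a least-squares solution of Ax = y.

```python
import numpy as np


def landweber(A, y, x, omega, niter=500):
    """Toy Landweber iteration x <- x + omega * A^T (y - A x)."""
    for _ in range(niter):
        x = x + omega * A.T.dot(y - A.dot(x))
    return x


# Overdetermined toy problem: the iterate approaches the least-squares solution.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 0.0])
x = landweber(A, y, np.zeros(2), omega=0.2)
print(x, np.linalg.lstsq(A, y, rcond=None)[0])
```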
Having a category of its own for this is probably a bit over the top, but it doesn't fit into the others. I guess there are more methods (variants) that could be added here, too.
Name suggestions: `gradient`, `descent`, `grad_descent`
This is more or less the current `advanced` stuff, including sub-packages. I think they are grouped well as they are, only the name is not great.
Name suggestion: `convex`
I think the current structure of the `solvers` package is far from satisfactory. Before deciding upon this, one must first decide why we need a structure at all. For whom is it? In what sense is a structured solver package going to make life easier? Now, one of the proposed papers that I mentioned in my e-mail to the group on 2016-08-01 was a description of the ProxImaL-ODL integration. Natural authors associated with that paper would be @adler-j, @aringh and relevant people from the Boyd group in Stanford, like Steve Diamond. The issue of imposing a structure on the `solvers` package, and if so, how it is to be designed, is a very natural part of that paper. I therefore suggest we make a quick and dirty structure along the lines of @kohr-h's latest suggestion and postpone any more ambitious work to the aforementioned paper.
@kohr-h, very nice writeup and I agree with your approach. My biggest worry with the naming is `least_squares` or similar, since some of the solvers that we would want to place there would perhaps only work for fully determined systems, and hence not be proper least-squares solvers. Basically what we want is solvers for A(x) = b, where A could possibly be non-linear. I'm not sure what such solvers should be called, perhaps `equation_solvers`, `equation` or something?
@ozanoktem, I think you are overstating the significance of this. In daily life, most users would simply access all the solvers by `odl.solvers.some_solver` and not `odl.solvers.some_package.some_solver`, hence restructuring the package internally wouldn't make a difference. This is mostly a programming issue in that we want to have a nice structure of where code should be put.
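A minimal sketch of how that flat access could be preserved regardless of the internal layout; the subpackage names `smooth`, `nonsmooth` and `equation` are placeholders, not actual ODL packages:

```python
# Hypothetical odl/solvers/__init__.py: re-export everything from the
# subpackages so that `odl.solvers.some_solver` keeps working no matter
# how the package is organized internally.
from .smooth import *
from .nonsmooth import *
from .equation import *

from . import smooth, nonsmooth, equation

__all__ = smooth.__all__ + nonsmooth.__all__ + equation.__all__
```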
> @kohr-h, very nice writeup and I agree with your approach. My biggest worry with the naming is `least_squares` or similar, since some of the solvers that we would want to place there would perhaps only work for fully determined systems, and hence not be proper least-squares solvers. Basically what we want is solvers for A(x) = b, where A could possibly be non-linear. I'm not sure what such solvers should be called, perhaps `equation_solvers`, `equation` or something?
Hm, I see your point. Do you have an example of a method that would be affected? As far as I understand, the current methods all produce a least squares solution upon convergence. The situation you describe would mean that a method would "misbehave" within the least squares category in the sense that it crashes or produces wrong answers for non-invertible cases. One name that came to my mind was "approximate inverse", which is somewhat a relaxed pseudo-inverse and would be slightly broader, perhaps encompassing the case you mention. Unfortunately, the name is taken by a concrete family of methods, which we also may want to implement at some point.
Would we be too strong if we simply named it `inverse`?
What about `ml_solvers`? Least-squares solutions are in fact a special case of maximum likelihood solutions when the data likelihood is given by a Gaussian distribution. This would then cover all the ART, MART, SART, SIRT, ... solvers, as well as the ML-EM solvers.
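For reference, the standard argument behind this (textbook material, not specific to ODL): for a Gaussian data likelihood with fixed variance \sigma^2,

\max_x \prod_i \exp\Bigl(-\frac{(y_i - A(x)_i)^2}{2\sigma^2}\Bigr) \;\Longleftrightarrow\; \min_x \sum_i \bigl(y_i - A(x)_i\bigr)^2,

so maximizing the likelihood is exactly the least-squares problem.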
> What about `ml_solvers`? Least-squares solutions are in fact a special case of maximum likelihood solutions when the data likelihood is given by a Gaussian distribution. This would then cover all the ART, MART, SART, SIRT, ... solvers, as well as the ML-EM solvers.
While this is true, I'm not sure that it properly captures the application and the similarities in implementation/interface between the solvers. To me, the two (ML vs. least squares) imply different ways of thinking.
> What about `ml_solvers`? Least-squares solutions are in fact a special case of maximum likelihood solutions when the data likelihood is given by a Gaussian distribution. This would then cover all the ART, MART, SART, SIRT, ... solvers, as well as the ML-EM solvers.
That would be a generalization, and the idea is not bad in principle. But the problem with the solvers with stricter assumptions would not be resolved by this.
> Would we be too strong if we simply named it `inverse`?
That's a bit too strong in my view. On the other hand, we're solving an operator equation, so something using that information would also be possible, e.g. `op_eq` to be very short, or `operator_eq`.
> While this is true, I'm not sure that it properly captures the application and the similarities in implementation/interface between the solvers. To me, the two (ML vs. least squares) imply different ways of thinking.
I'm not sure what @adler-j means by "different ways of thinking". On the other hand, I agree that the naming `ml_solvers` does not capture any algorithmic similarities. If that is the goal, then one might perhaps use `fixed_point_iterators`, which is an algorithmic stratification.
> On the other hand, I agree that the naming `ml_solvers` does not capture any algorithmic similarities. If that is the goal,
I don't think that's necessarily the goal. I deliberately categorized the methods according to what they solve, not how.
It is true that nothing has been done here for a while. My idea was to try to take care of this when the pull request #498 (functionals) is merged, in order to not create too many conflicts with that branch (this might have been a good or a bad move; in any case it was my way of going about it).
During the "workshop" earlier this spring, we came up with a suggestion that was
Reviving this discussion, which has been delayed until after the merge of `Functional`.
According to my proposal we could distinguish between (names are suggestions):

- Zero/root finders (`zero(s)`, `root(s)`, `findzero`, `findroot`, `root_finders`, ...)
- Equation solvers (`eq_solvers`, `equation_solvers`)
- Smooth optimization methods (`smooth`, `gradient`)
- Nonsmooth optimization methods (`nonsmooth`, `convex`)

plus categories we don't have currently, like heuristic gradient-free methods à la Nelder-Mead or simulated annealing, or ordered subset methods.
Regarding `convex` vs. `nonsmooth`: Depends on where we want to put nonsmooth non-convex methods. If we want to put them together with the ones we have now, then `nonsmooth` is the better name, otherwise `convex` will be a better fit.
What would separate a zero finder from a gradient method?
> What would separate a zero finder from a gradient method?
How would you classify Newton-like methods as gradient methods? That doesn't make sense.
Well then which ones are the gradient methods? Steepest descent, any others?
Conjugate gradient, for example? Although it would also classify as an equation solver... Sigh... Perhaps it really only makes sense to distinguish between smooth and nonsmooth on the top level and do the fine-grained distinction on the module level.
We really don't need 18 folders, what does it add? I'd be fine with smooth and nonsmooth for now.
Correction, we have three types: min_x f(x), ...
Okay, I'll restructure this part so it's done once and for all. First though, some of the "deeper" PRs like #587 need to go in.
I'll get this done as part of #587; I've already run into issues with this as part of that.
Alright then, but don't forget to put the moves on separate commits! :-)
Moved from https://github.com/odlgroup/odl-solvers/issues/3
I initialized a structure, but it is obviously far from optimal. We need to find sensible categories for solvers, e.g.
We need to reflect these categories in classes in a sensible manner.