google / jaxopt

Hardware accelerated, batchable and differentiable optimizers in JAX.
https://jaxopt.github.io
Apache License 2.0

trust-region optimization #161

Open quattro opened 2 years ago

quattro commented 2 years ago

Hi all,

Thanks for the awesome library. It would be fantastic to eventually see some options for trust-region optimization. For the lower-dimensional problems in my field (e.g., variance components, or their generalizations), it provides a much more robust approach to inference compared with 2nd order counterparts.

mblondel commented 2 years ago

Indeed, trust-region methods would be nice to have, but I didn't get the 'compared with 2nd order counterparts' comment: in my view, trust region is mainly an alternative to line search rather than to second-order methods. For instance, it would be great to have trustncg (Newton-CG + trust region) as an alternative to ncg (Newton-CG + line search).
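For context, the subproblem such a trustncg solver would solve at each iterate $x_k$ (standard notation, not anything jaxopt-specific) is

$$\min_{p}\; m_k(p) = f(x_k) + \nabla f(x_k)^\top p + \tfrac{1}{2}\, p^\top B_k p \quad \text{subject to} \quad \lVert p \rVert \le \Delta_k,$$

where $B_k$ is the (approximate) Hessian and $\Delta_k$ is the current trust radius; a line-search method instead fixes a direction $p_k$ first and then chooses a step size $\alpha_k$ along it.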

quattro commented 2 years ago

That's great to hear!

Re my comparison: I'm far from an expert, and my limited understanding is that Newton-CG + line search picks a direction first (a Hessian- or approximate-Hessian-modified gradient) and then determines the step magnitude by line search (or some fixed rate), whereas trust region first bounds the magnitude by the currently accepted trust radius and then finds an appropriate direction, which may or may not coincide with the ncg direction (depending on whether the optimal step lies inside the ball). If the problem is convex, the two should ultimately find the same optimum, but for cases like variance components, which (I believe) are generally non-convex, trust-region approaches seem to be a bit better behaved.
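For what it's worth, here is a minimal Steihaug truncated-CG sketch in JAX illustrating the "magnitude first" behavior described above: the CG iterates for the quadratic model are simply cut off at the trust-region boundary, so no separate line search is needed. Everything here (the `steihaug_cg` name, the `hvp` argument, the tolerances) is illustrative, not jaxopt API.

```python
import jax
import jax.numpy as jnp

def steihaug_cg(hvp, grad, delta, tol=1e-6, maxiter=100):
    """Truncated CG for min g.p + 0.5 p.B.p subject to ||p|| <= delta."""
    def to_boundary(p, d):
        # Positive root tau of ||p + tau * d||^2 = delta^2.
        a = jnp.vdot(d, d)
        b = 2.0 * jnp.vdot(p, d)
        c = jnp.vdot(p, p) - delta ** 2
        tau = (-b + jnp.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
        return p + tau * d

    p = jnp.zeros_like(grad)
    r = grad   # residual of B p + g at p = 0
    d = -grad  # initial search direction
    for _ in range(maxiter):
        Bd = hvp(d)
        dBd = jnp.vdot(d, Bd)
        if dBd <= 0.0:                        # negative curvature: exit at boundary
            return to_boundary(p, d)
        alpha = jnp.vdot(r, r) / dBd
        p_next = p + alpha * d
        if jnp.linalg.norm(p_next) >= delta:  # CG step leaves the region
            return to_boundary(p, d)
        r_next = r + alpha * Bd
        if jnp.linalg.norm(r_next) < tol:     # model solved inside the region
            return p_next
        beta = jnp.vdot(r_next, r_next) / jnp.vdot(r, r)
        p, r, d = p_next, r_next, -r_next + beta * d
    return p

# Hypothetical usage on a small quadratic:
f = lambda x: 0.5 * jnp.vdot(x, jnp.arange(1.0, 6.0) * x)
x0 = jnp.ones(5)
g = jax.grad(f)(x0)
hvp = lambda v: jax.jvp(jax.grad(f), (x0,), (v,))[1]
step = steihaug_cg(hvp, g, delta=0.5)  # step is capped at norm 0.5
```

When the unconstrained Newton-CG solution lies inside the ball, the loop terminates on the residual test and the step coincides with the ncg direction; otherwise the radius, not a line search, determines the magnitude.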

deasmhumhna commented 1 year ago

This may be dead, but with a minor object-oriented modification of OSQP you could support arbitrary convex sets for which the support function, as well as projections onto the recession and normal cones, are easy to compute; that covers most simple convex sets of interest (see Banjac et al. (2019), "Infeasibility Detection in the Alternating Direction Method of Multipliers for Convex Optimization", and this useful blog post). The Newton-CG algorithm could then call this generalized OSQP to solve the trust-region subproblem to the desired accuracy.

The same approach can solve quadratically constrained QPs quite easily as well, with $A_i^T A_i = P_i$, provided $P_i$ is readily factorable, or by using a Chebyshev polynomial approximation of $f(P_i) = P_i^{1/2}$ for general implicit operators.

I wrote an ADMM-based QP solver in Golang that I was using as part of several projects last year, though I lost the files around a month ago when my laptop was stolen. But a ConvexSet class shouldn't be too difficult to recreate.
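If it helps anyone picking this up, here is a rough sketch (in JAX; names hypothetical, not jaxopt API) of the kind of ConvexSet interface described above, with a Euclidean ball as the example set, since a ball constraint turns the QP subproblem into exactly the trust-region subproblem:

```python
from typing import Callable, NamedTuple
import jax.numpy as jnp

class ConvexSet(NamedTuple):
    # Minimal interface an ADMM/OSQP-style solver would need,
    # following Banjac et al. (2019).
    project: Callable            # Euclidean projection onto the set
    support: Callable            # support function sigma_C(y) = sup_{x in C} <y, x>
    project_recession: Callable  # projection onto the recession cone

def ball(center, radius):
    """Euclidean ball {x : ||x - center|| <= radius}."""
    def project(x):
        d = x - center
        n = jnp.linalg.norm(d)
        # Inside the ball: keep x; outside: pull back to the boundary.
        return jnp.where(n <= radius, x, center + radius * d / jnp.maximum(n, 1e-12))
    def support(y):
        return jnp.vdot(center, y) + radius * jnp.linalg.norm(y)
    def project_recession(y):
        # A bounded set's recession cone is {0}.
        return jnp.zeros_like(y)
    return ConvexSet(project, support, project_recession)
```

The ADMM iterations would use `project` in the z-update, while `support` and `project_recession` feed the infeasibility certificates from Banjac et al. (2019).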