msmdev / pyGPCCA

pyGPCCA - python GPCCA: Generalized Perron Cluster Cluster Analysis package to coarse-grain reversible and non-reversible Markov state models.
https://pygpcca.readthedocs.io/en/latest/index.html
GNU Lesser General Public License v3.0

Update optimisation #23

Open Marius1311 opened 3 years ago

Marius1311 commented 3 years ago

@msmdev, how difficult would it be to speed up our optimisation by using derivative information? Is there a prototype implementation somewhere?

msmdev commented 3 years ago

@Marius1311, there is: I used a special Gauss-Newton solver, NLSCON, in my MATLAB implementation of GPCCA. But it is tricky and would need a lot of testing to be sure that it works in all relevant cases. Currently, I don't have the spare time to invest in this, but you can take a look here yourself:

Marius1311 commented 3 years ago

Nice! What sort of speedup did you experience with this in MATLAB compared to the derivative-free approach? I would like to know this to figure out whether it's worth the effort.

msmdev commented 3 years ago

I only used this to enable clustering into hundreds of macrostates, since Nelder-Mead scales very badly with increasing n: you simply can't get Nelder-Mead to converge in any reasonable time for, say, n=100. I didn't test for speed differences at small cluster numbers n.
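To make the scaling concrete: the optimisation runs over the free block of the transformation matrix A, i.e. (n-1)^2 variables, so n=100 already means 9801 free parameters. A toy sketch (an illustration on a trivial quadratic, not pyGPCCA code) of how Nelder-Mead's effort grows with that dimension:

```python
import time

import numpy as np
from scipy.optimize import minimize

# Nelder-Mead on a trivial convex quadratic of dimension d = (n - 1)**2,
# mimicking the number of free parameters in the GPCCA matrix A.
# Even here the iteration count climbs steeply with n; the real,
# constrained GPCCA objective is much harder.
for n in (3, 5, 10):
    d = (n - 1) ** 2
    x0 = np.full(d, 0.5)
    tic = time.perf_counter()
    res = minimize(
        lambda x: np.sum((x - 1.0) ** 2),
        x0,
        method="Nelder-Mead",
        options={"maxiter": 100_000, "xatol": 1e-6, "fatol": 1e-6},
    )
    print(f"n={n:3d} (d={d:3d}): {res.nit:6d} iterations in "
          f"{time.perf_counter() - tic:.2f}s, converged={res.success}")
```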

msmdev commented 3 years ago

But beware: Gauss-Newton actually never converged in my experiments, since the routine we are using is non-differentiable. It nevertheless ended up somewhere near a local minimum - this at least sufficed to get a nearly optimal solution and to identify the optimal number of clusters out of a large interval of potential n. If you really want the (locally) optimal solution, you would still need to run Nelder-Mead for the found "optimal" n. In my case the benefit from a final Nelder-Mead optimization was small, so I stayed with the Gauss-Newton solution, but I would assume that this is very dependent on the actual system you are clustering.
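The two-stage strategy described above, a cheap non-converging Gauss-Newton scan over a range of n followed by a single Nelder-Mead polish at the selected n, could be sketched like this (all three helper callables are hypothetical placeholders, not pyGPCCA API):

```python
def scan_then_polish(P, n_range, run_gauss_newton, run_nelder_mead, crispness):
    """Hypothetical sketch of the two-stage strategy described above.

    run_gauss_newton / run_nelder_mead / crispness stand in for
    "run the (possibly non-converged) GN optimiser", "run the expensive
    derivative-free optimiser" and "score a clustering", respectively.
    """
    # Stage 1: cheap, non-converging GN scan over all candidate n.
    scores = {n: crispness(run_gauss_newton(P, n), P) for n in n_range}

    # Stage 2: pick the best-scoring n ...
    n_opt = max(scores, key=scores.get)

    # ... and polish only that single n with Nelder-Mead.
    return n_opt, run_nelder_mead(P, n_opt)
```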

Marius1311 commented 3 years ago

Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163?

[screenshot of the gradient formula from the paper]

Also, how come there are local optima here? Section 4.3.2, "Minimizing the objective function", in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_{A'} (Weber 2006, Lemma 3.7)".

The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

msmdev commented 3 years ago

Hi @Marius1311, I see your confusion, but I can guarantee that you will understand it as soon as you know all the details. :)

> Okay - why wouldn't the Gauss-Newton solver converge? If the routine you're using is non-differentiable, then how did you get derivatives for Gauss-Newton? Isn't the gradient given in (Röblitz and Weber, 2013) on page 163? [screenshot of the gradient formula from the paper]

Wow, what a HUGE formula. XD It is a little more complex here: you can differentiate the objective function itself, but since constraints are imposed after every step, the routine as a whole isn't differentiable. So Gauss-Newton works "stepwise": it doesn't need to converge (it is luck if it does), but you still get stepwise closer to a local minimum.
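A minimal numerical sketch of that stepwise scheme (an illustration, not the NLSCON code): a damped Gauss-Newton step on a generic least-squares residual, with the constraints re-imposed by a projection after every step. The residual, Jacobian and projection are placeholders for the GPCCA-specific ones.

```python
import numpy as np

def projected_gauss_newton(residual, jacobian, project, x0,
                           damping=1e-2, max_iter=50, tol=1e-8):
    """Damped Gauss-Newton with constraints re-imposed after every step.

    `project` is what breaks differentiability: the composite iteration
    need not converge, it just creeps towards a local minimum.
    """
    x = project(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        # Damped (Levenberg-style) Gauss-Newton step on the smooth part.
        dx = np.linalg.solve(J.T @ J + damping * np.eye(x.size), -J.T @ r)
        x_new = project(x + dx)  # constraints re-imposed -> non-smooth
        if np.linalg.norm(x_new - x) < tol:
            break  # stalled; with projections this is luck, not guaranteed
        x = x_new
    return x
```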

> Also, how come there are local optima here? Section 4.3.2, "Minimizing the objective function", in (Röblitz and Weber, 2013) starts with "The objective function (16) is convex. Therefore, the optimum is attained in a vertex of the feasible set F_{A'} (Weber 2006, Lemma 3.7)".

The important thing to understand here is that the objective is convex on the feasible set and NOT on the real numbers as a whole. So, if you perform unconstrained optimization over the real numbers, you can (and most likely will) run into local minima.
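A one-dimensional analogue (an illustration, not from the paper): g(x) = cos(x) is convex on [π/2, 3π/2], since g''(x) = -cos(x) > 0 there, so over that "feasible set" its maximum is attained at an endpoint, i.e. a vertex. On all of R, however, cos(x) has infinitely many local minima. Objective (16) behaves analogously: convexity, and with it the vertex property, holds only on the feasible set.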

> The arguments you outlined above are also given in point 3 of Section 4.3.2, but I don't really understand them, I must admit.

Does the above help to get it?

msmdev commented 3 years ago

@Marius1311, still: damped GN optimization of the objective using NLSCON is reasonable and very useful. Although it won't "really converge" and might get trapped in local minima, Nelder-Mead can also get trapped in those, and it takes ages to converge for large n. Using GN is just tricky: you need to adapt the hyperparameters of NLSCON to the problem at hand. Thus, one would need to do some research on different systems to get an idea of how to set them. I only did this for one system, so I wouldn't trust those settings in general.
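For reference, if one prototyped this with SciPy instead of NLSCON (an assumption; NLSCON has its own option set), the analogous knobs to tune per system would be the tolerances and scaling of scipy.optimize.least_squares; `residual` and `x0` below are trivial placeholders:

```python
import numpy as np
from scipy.optimize import least_squares

def residual(x):        # placeholder for the GPCCA least-squares residual
    return x - 0.5

x0 = np.zeros(4)        # placeholder start value

result = least_squares(
    residual, x0,
    method="trf",       # trust-region reflective; supports bound constraints
    x_scale="jac",      # variable scaling derived from the Jacobian
    ftol=1e-10,         # convergence tolerances: these are the knobs
    xtol=1e-10,         # that would need per-system tuning
    gtol=1e-10,
    max_nfev=10_000,    # cap on function evaluations
)
```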

Marius1311 commented 3 years ago

ok, thanks @msmdev, that helps!

Marius1311 commented 3 years ago

Just found this toolbox for Python optimisation: https://pypesto.readthedocs.io/en/latest/api_optimize.html

@msmdev, could this be useful for us?
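A minimal pyPESTO wiring, sketched from its documentation (the objective and dimensions here are hypothetical stand-ins for the GPCCA ones), might look like:

```python
import numpy as np
import pypesto
import pypesto.optimize as optimize

n = 5                       # hypothetical number of macrostates
dim = (n - 1) ** 2          # free parameters of the transformation matrix A

def gpcca_objective(x):     # hypothetical stand-in for the real objective
    return float(np.sum((x - 0.5) ** 2))

# `grad=` could supply derivative information if available.
objective = pypesto.Objective(fun=gpcca_objective)
problem = pypesto.Problem(objective=objective,
                          lb=np.zeros(dim), ub=np.ones(dim))

# Multi-start local optimisation; pyPESTO collects the starts, sorted by
# final objective value, in result.optimize_result.
optimizer = optimize.ScipyOptimizer(method="Nelder-Mead")
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=10)
```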