
Recent technical improvements to scipy.optimize #13

Closed: mdhaber closed this issue 5 years ago

mdhaber commented 6 years ago

Since there have been many recent contributors to scipy.optimize, I think a distributed approach to drafting this section would work best. Once we collect components, those who are interested can work to edit this section to make it fit with the rest of the paper. Of course, contributors would understand that their components may need to be edited and might not even make the final paper due to space constraints, but that we'd really appreciate their input in any case.

I've quickly reviewed closed PRs labeled scipy.optimize from the past few years, looking mainly at those prefixed BUG, ENH, or WIP, and picked out a few that seem appropriate for the paper. Please forgive (and point out!) any omissions or incorrect attributions.

- @nmayorov: #5556, #5147
- @felixlen: #7292
- @larsmans: #5158
- @surhudm (or @nmayorov): #6493
- @andyfaff: #4191, #3446, #5605
- @pv: #4392 (and potentially #6769 / #5536 if you think they're significant)
- @person142: #5557
- and of course @antonior92: #6919, #7165
- myself: #7123

Here is a draft about the new linprog interior-point method. We might decide we want more or less about different topics - or even that we want something pretty different from what I have below - but I thought it would be helpful to give something as an example.

_A new interior-point optimizer for continuous linear programming problems, linprog with method='interior-point', was released with SciPy 1.0. Implementing the core algorithm of the commercial solver MOSEK [2], it solves all of the 90+ Netlib LP benchmark problems tested [1]. (Add something about speed benchmarks?) Unlike some interior-point methods, this homogeneous self-dual formulation provides certificates of infeasibility or unboundedness as appropriate. (Add something about infeasible/unboundedness benchmarks?)

A presolve routine based on [3] solves trivial problems and otherwise simplifies the problem, for example by tightening bounds and removing fixed variables, and one of several routines for eliminating redundant equality constraints is chosen automatically to reduce the chance of numerical difficulties caused by singular matrices. Although the main solver implementation is pure Python, end-to-end sparse matrix support and heavy use of SciPy's compiled linear system solvers (often for the same system with multiple right-hand sides, due to the predictor-corrector approach) provide speed sufficient for problems with tens of thousands of variables and constraints.

Compared to the previously implemented simplex method, the new interior-point method is faster for all but the smallest problems, and is suitable for solving medium- and large-sized problems on which the existing simplex implementation fails. However, the interior-point method typically returns a solution near the center of an optimal face, whereas basic solutions are often preferred for sensitivity analysis and for use in mixed integer programming algorithms. This motivates the need for a crossover routine or a new implementation of the simplex method for sparse problems in a future release, either of which would require an improved sparse linear system solver with efficient support for rank-one updates._

[1] J. Dongarra and E. Grosse, "Netlib," [Online]. Available: http://www.netlib.org/. [Accessed 20 February 2018].
[2] E. D. Andersen and K. D. Andersen, "The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm," High Performance Optimization, Springer US, 2000, pp. 197-232.
[3] E. D. Andersen and K. D. Andersen, "Presolving in linear programming," Mathematical Programming 71.2 (1995): 221-245.
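For readers who haven't used linprog, here is a minimal usage sketch of the new method; the toy problem below is made up purely for illustration and is not one of the benchmarks above:

```python
# Toy problem: minimize c @ x subject to A_ub @ x <= b_ub and bounds.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, 4.0])
A_ub = np.array([[-3.0, 1.0],
                 [1.0, 2.0]])
b_ub = np.array([6.0, 4.0])
bounds = [(None, None), (-3.0, None)]  # x0 free, x1 >= -3

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
              method='interior-point')
# res.status distinguishes optimality (0) from certified
# infeasibility (2) or unboundedness (3).
print(res.status, res.fun, res.x)
```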

Comments about this approach, the PR list, and the draft above are welcome.

felixlen commented 6 years ago

I think this is a great idea and I like your text proposal. Regarding #6919 and #7292, I think it is best to have a unified description, as these provide two methods for the same problem, unless @antonior92 has objections. I can create a draft if this is of interest.

mdhaber commented 6 years ago

@felixlen Thanks for the feedback! (I'll let @antonior92 reply about the rest.)

surhudm commented 6 years ago

@mdhaber. Here is some text related to #6493 (@nmayorov please take a look to see if I missed something or if this is too technical)

Covariant errors are commonplace in scientific data used to fit parametric models. With SciPy 1.0, the curve_fit routine in scipy.optimize can now accept data with covariant errors and optimize model parameters accordingly. The optimization that curve_fit performs depends upon the optional input parameter sigma, which has now been upgraded to be polymorphic. If we define the residuals as r = ydata - f(xdata, *popt), then a 1-d sigma minimizes chisq = (r / sigma).T @ (r / sigma), a 2-d sigma minimizes chisq = r.T @ inv(sigma) @ r, and the default None is equivalent to a 1-d sigma filled with ones. The problem with covariant errors is mapped back to the 1-d problem by performing a lower triangular (Cholesky) decomposition sigma = L @ L.T, such that the minimized function is chisq = r'.T @ r' with L @ r' = r. The optional argument jac has also been updated and is scaled internally when a 2-d sigma corresponding to covariant errors is passed.
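A minimal usage sketch (the model, data, and covariance values below are invented purely for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * x + b

xdata = np.array([0.0, 1.0, 2.0, 3.0])
ydata = np.array([0.1, 1.1, 1.9, 3.2])

# 2-d sigma: the full covariance matrix of the errors in ydata;
# off-diagonal terms encode correlations between data points.
sigma = np.array([[0.10, 0.02, 0.00, 0.00],
                  [0.02, 0.10, 0.02, 0.00],
                  [0.00, 0.02, 0.10, 0.02],
                  [0.00, 0.00, 0.02, 0.10]])

# Internally the residuals are whitened via sigma = L @ L.T
# before the usual least-squares machinery runs.
popt, pcov = curve_fit(f, xdata, ydata, sigma=sigma, absolute_sigma=True)
print(popt)
```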

antonior92 commented 6 years ago

Hi all, I agree with @felixlen that we should try to write a unified description covering #6919 and #7292, both because I think it is important for us to highlight the similarities and differences between the methods, and because we should try to keep everything as brief as possible.

To be completely sincere, I think explaining any of the methods in minimize in detail is beyond the scope of the paper. Nevertheless, I think we should try to draw a clear picture of the current state of minimize, and I think we should include a clear comparison between all the solvers. Some questions that I think should be answered for all solvers:

  1. Does the solver use first derivatives?
  2. Does the solver use second derivatives?
  3. What are the local convergence properties (e.g. linear, superlinear, quadratic convergence)?
  4. What are the global convergence properties (i.e. is the solver guaranteed to converge to a stationary point)?
  5. Does the solver use direct matrix factorizations?
  6. Does the solver use Krylov methods instead of factoring the matrix?
  7. Does the solver use a trust-region or a line-search strategy?
  8. If it uses a trust-region strategy (as many in SciPy do), how accurate is the solution of the trust-region problem? Maybe we could rank dogleg, trust-ncg, trust-exact and trust-krylov.
  9. If it uses a line-search strategy, what guarantees does it give (Wolfe conditions)?
  10. How does it perform on large-scale benchmark problems (probably relate this to 5 and 6)?
  11. How does it perform on classical benchmarks (e.g. the Rosenbrock function)?
  12. What are the references for this solver?
  13. When was it added to SciPy?

While I believe we should focus on the new solvers, my general opinion is that we should try to answer the basic questions above for all solvers and give some notion of the current state of the minimize environment. One or two tables with this comparison and some plots containing benchmark results may serve well for this purpose.
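As a sketch of the kind of comparison I have in mind, here is how results on the classical Rosenbrock benchmark (item 11) could be collected; the method list is just an illustrative subset:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.full(10, 1.3)  # a starting point away from the minimum at (1, ..., 1)
for method in ['Nelder-Mead', 'BFGS', 'Newton-CG',
               'trust-ncg', 'trust-exact', 'trust-krylov']:
    kwargs = {}
    if method != 'Nelder-Mead':
        kwargs['jac'] = rosen_der   # uses first derivatives (question 1)
    if method in ('Newton-CG', 'trust-ncg', 'trust-exact', 'trust-krylov'):
        kwargs['hess'] = rosen_hess  # uses second derivatives (question 2)
    res = minimize(rosen, x0, method=method, **kwargs)
    print(method, res.nit, res.nfev, res.fun)
```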

mdhaber commented 6 years ago

@antonior92 It would be great to collect all that information together. I hope there is space for it in the paper, but I'm not sure there will be, as it sounded like the paper is supposed to focus on the more recent improvements. But a table like that would at least find a place in the documentation!

mdhaber commented 6 years ago

@surhudm Thanks! That's very helpful. I think that the middle sentence "If we define residuals as..." is a little more technical than what I had in mind, but we'll see what other editors think. I think that the rest might be OK but could you replace the code with LaTeX?

mdhaber commented 6 years ago

@antonior92 How do you want to go about collecting this information? Perhaps take a look at who submitted the PRs and try to contact them? Do you have some information to start the table?

rgommers commented 6 years ago

Agreed that such a table would be extremely useful, even if it only ends up in the docs and doesn't make the final cut of the paper (but it could - comparisons in table form don't take that much space).

antonior92 commented 6 years ago

I think the best way to do it is by taking a look at the original papers. I can do that.

I will write a first draft of the table, and if needed I can contact the authors to fill in some gaps. I will try to do it this week.

I think the table is a nice way of presenting a general overview of the current status of the nonlinear solvers in scipy.

Maybe we can include just a reduced version of the table in the paper if it becomes too large.

mdhaber commented 6 years ago

Hey @antonior92 have you started on this? If not, maybe I can help somehow, like by collecting some of the papers for you and filling in what I can?

antonior92 commented 6 years ago

Hi @mdhaber, I just wrote a first draft of the table+text for minimize function:

minimize-table.pdf

I wrote a full paragraph about trust-exact and another about trust-krylov (which are the methods added more recently). But I tried to give a general overview of all methods in comparative terms.

@felixlen can you help me find out the local convergence rate of trust-krylov and whether there are guarantees of global convergence for it? I was not able to find this information myself for this method. For the other unconstrained minimisation methods, Nocedal's book or the original papers usually contained this kind of information.

@rgommers I added a table entry for the version in which each optimization method was added to SciPy. I was not able to find out when some of the early methods were added, so I just wrote <0.7. Where can I look for this information?

Finally, I would like to get some feedback on this material. I wrote a .tex containing the table and the text: should I create a separate git repo, or maybe share it on Overleaf? Or maybe it is better to just share the pdf and get suggestions, with only me editing the .tex for now.

mdhaber commented 6 years ago

This is terrific, @antonior92. I have several comments (for instance typos in "Nelder-Mead", "Powell", "iterative") that I could easily fix on Overleaf or otherwise. Do we need a separate repo? I think this could just be added to the main document in this repo - maybe linked from a separate .tex file using \input, but still included in the main document "paper.tex". I'd also like to add an intro to scipy.optimize and the bits I've collected from everyone so that this starts to look like a real section of the paper. Then again, I'm not really sure how this is all intended to work; I've not used git for collaborative LaTeX document editing before.

mdhaber commented 6 years ago

@antonior92 I just opened #16 to illustrate what I meant by \input-ing a file into the main document. I was suggesting that you add your LaTeX into this new file, scipy-optimize.tex, which is \input in the main document paper.tex. When one of us edits scipy-optimize.tex, he submits a PR and the other can review the changes and merge (when I have commit rights, anyway). Is that how it's supposed to go, or is that going to be too many PRs? Should we just push directly to scipy-articles and others can take a look at the commits afterwards, with no chance for review?

antonior92 commented 6 years ago

Thank you for the example Matt!

I will do what you are suggesting. I will probably have to add additional references to the .bib file and some LaTeX \usepackage lines...

mdhaber commented 6 years ago

@rgommers and others: a few questions.

1) I wrote my blurb about linprog interior-point almost like an advertisement of the new capabilities. @antonior92 drafted an excellent summary of minimize. So the styles are quite different. Should I write a summary of linprog more like what @antonior92 has about minimize? Should @antonior92 expand on trust-krylov in the style that I used for interior-point?

2) Also, are we strictly talking about 1.0, or is code up through 1.1 fair game? For instance, can I report interior-point's performance after a bug fix, or do I have to benchmark interior-point as released in 1.0? Likewise, let's say @antonior92's constrained optimization method(s) will be added in 1.1; can it (or they) be featured? (I don't know the status, so if there is no chance of this happening then never mind.) Or, if we are not supposed to write about 1.1 (or hopefully-soon-to-be-released functionality) here, where does it belong? I would like to include a bit about where I think linprog is headed, if there is space, and whether @antonior92's new methods are ready by the release of the paper or not, I certainly think they deserve to be mentioned. Personally, I think they belong in the table he drafted, with "Version added" as ">1.0".

rgommers commented 6 years ago

> Should I write a summary of linprog more like what @antonior92 has about minimize? Should @antonior92 expand on trust-krylov in the style that I used for interior-point?

I'd say that the text you have now in the description reads well; it's not too much of an "advertisement" for a scientific paper.

> Also, are we strictly talking about 1.0, or is code up through 1.1 fair game?

I'd say 1.0 only; we have to draw a line somewhere.

> Where does it belong?

A new paper I'd say, e.g. for 1.2 (maybe a short JOSS paper) or a paper specifically on these methods.

mdhaber commented 6 years ago

@rgommers I agree that the text I had for linprog (now added to #16) reads fine; but it is still quite different from #17. I guess we'll Frankenstein them together by merging #16 and #17, and then we can address the differences.

surhudm commented 6 years ago

@mdhaber I can write the things in LaTeX once you decide whether or not to include the text. I just held off on converting it until you figure out the section organization.

tylerjereddy commented 5 years ago

Can we close this? The main content has been merged, no?

It is now a matter of refactoring the optimize stuff a little, which I think is continued in #113?