pkofod / NLSolvers.jl

MIT License

Use rank-1 update blas kernel for quasi-Newton updates #6

Open mohamed82008 opened 4 years ago

mohamed82008 commented 4 years ago

We can get a nice speedup by using LinearAlgebra.BLAS.ger! for the rank-1 updates of the approximate Hessian (or inverse Hessian) in quasi-Newton methods, instead of H = H + (w * w') / θ, which is much slower for large matrices because it allocates intermediate matrices.
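A minimal sketch of what this could look like. The helper name rank1update! is hypothetical (not part of NLSolvers.jl); BLAS.ger! computes A ← A + α·x·y' in place, so the update allocates nothing:

```julia
using LinearAlgebra

# Hypothetical helper: in-place rank-1 update H ← H + (w * w') / θ.
# BLAS.ger!(α, x, y, A) performs A ← A + α * x * y' without allocations.
function rank1update!(H::Matrix{Float64}, w::Vector{Float64}, θ::Float64)
    LinearAlgebra.BLAS.ger!(inv(θ), w, w, H)
    return H
end
```

The naive H + (w * w') / θ allocates the outer product, the scaled copy, and the sum; the ger! call touches H once.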

mohamed82008 commented 4 years ago

We can use a fallback implementation for non-BLAS types, though.
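One way to sketch the fallback, assuming a hypothetical rank1update! helper: dispatch sends dense arrays of BLAS-supported element types to ger!, while everything else (BigFloat, Dual numbers, generic AbstractMatrix wrappers) takes a generic broadcast path:

```julia
using LinearAlgebra

# BLAS path: dense Float32/Float64 arrays (LinearAlgebra.BlasReal covers both).
rank1update!(H::Matrix{T}, w::Vector{T}, θ) where {T<:LinearAlgebra.BlasReal} =
    LinearAlgebra.BLAS.ger!(inv(T(θ)), w, w, H)

# Generic fallback for non-BLAS element types and array wrappers.
function rank1update!(H::AbstractMatrix, w::AbstractVector, θ)
    H .+= (w .* w') ./ θ
    return H
end
```

Since Matrix{Float64} is more specific than AbstractMatrix, Julia's dispatch picks the BLAS method automatically when it applies.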

mohamed82008 commented 4 years ago

A bit more advanced would be performing rank-1 updates over the decomposition of H directly, for Direct approximations, to avoid re-factorizing it every time we call \. We have lowrankupdate for Cholesky.
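A sketch of that idea with the stdlib's lowrankupdate, which returns the Cholesky factorization of A + v*v' given a factorization of A, so subsequent solves skip the O(n³) refactorization:

```julia
using LinearAlgebra

A = [4.0 1.0; 1.0 3.0]
F = cholesky(A)                # factorize once
v = [1.0, 2.0]
F2 = lowrankupdate(F, v)       # O(n²) update: factorization of A + v*v'
b = [1.0, 1.0]
x = F2 \ b                     # solves (A + v*v') x = b without refactorizing
```

One caveat: the direct BFGS update also has a subtractive rank-1 term, which would need lowrankdowndate and some care, since a downdate can fail if the result is not positive definite.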

pkofod commented 4 years ago

Yes, thanks for opening this issue. You're touching on things I had planned but postponed in favor of other functionality. Those things would be cool:

1) A standard implementation that's closer to the linear algebra/math, to work as a fallback.
2) Fast implementations, via dispatch, for Arrays of supported types (say Float32 and Float64, whatever is available).
3) A factorized version of H that is updated in place. I originally thought I was going to prioritize this, but then I read some not-so-great benchmarks and pushed it down the list. But if we already have lowrankupdate for Cholesky (I didn't know that), that seems like a no-brainer.

We also need L-BFGS and maybe even variants of it. We also need L-SR1, which seems nice in TR (trust-region) contexts; there are even closed forms for the spectral decomposition in the memoryless versions.

mohamed82008 commented 4 years ago

Yes, I can perhaps tackle those issues at some point once the package is more stable. One thing to keep in mind: for small matrices, the BLAS kernel is actually slower than a naive implementation, because the call overhead dominates.
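A sketch of how a size-based branch could look, again using a hypothetical rank1update! helper. The cutoff value here is illustrative, not measured; the right crossover would need benchmarking:

```julia
using LinearAlgebra

# Illustrative crossover dimension, not a measured value.
const GER_CUTOFF = 32

function rank1update!(H::Matrix{Float64}, w::Vector{Float64}, θ::Float64)
    if length(w) < GER_CUTOFF
        # Small case: a simple loop avoids BLAS call overhead.
        @inbounds for j in eachindex(w), i in eachindex(w)
            H[i, j] += w[i] * w[j] / θ
        end
    else
        # Large case: hand off to the BLAS rank-1 kernel.
        LinearAlgebra.BLAS.ger!(inv(θ), w, w, H)
    end
    return H
end
```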

pkofod commented 4 years ago

I think such a check for size should be negligible?

mohamed82008 commented 4 years ago

Yes, for all reasonably sized matrices it should be.

pkofod commented 4 years ago

Your objective and gradient calculation would have to be overwhelmingly simple for such a branch to have any measurable effect, I'd say. Either way, we can just expose it as an option: an automatic mode will check the size, but specific choices could in principle be inferred if necessary.