pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Adding Levenberg-Marquardt optimizer in PyTorch #83529

Open kyrollosyanny opened 2 years ago

kyrollosyanny commented 2 years ago

šŸš€ The feature, motivation and pitch

Feature

Adding the Levenberg-Marquardt (LM) algorithm to torch.optim.

Motivation

Levenberg-Marquardt (LM), also known as damped least squares, is used to solve non-linear least-squares problems. It converges orders of magnitude faster than gradient-descent-based methods. The algorithm needs access to the Jacobian matrix, which makes it unsuitable for problems with huge amounts of data. However, for many applications, such as optical design, non-convex optimization, and imaging-system design, the model and data are small enough that LM provides much better solutions in a fraction of the time (seconds instead of hours). It would be extremely helpful to implement this in torch.optim.
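For context, the core LM update solves a damped normal-equations system. A minimal single-step sketch in PyTorch (assuming a residual function `residuals_fn` that maps a 1-D parameter tensor to a residual vector; the names here are illustrative, not an existing API):

```python
import torch

def lm_step(residuals_fn, params, damping=1e-3):
    # Solve (J^T J + damping * I) delta = -J^T r for the update delta.
    r = residuals_fn(params)                                      # residual vector, shape (m,)
    J = torch.autograd.functional.jacobian(residuals_fn, params)  # Jacobian, shape (m, n)
    A = J.T @ J + damping * torch.eye(params.numel())
    delta = torch.linalg.solve(A, -J.T @ r)
    return params + delta
```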

Alternatives

Two alternatives exist:

  1. I can try implementing it myself by calculating the Jacobian. However, the functions that PyTorch provides to do that are extremely slow.
  2. Another method that I routinely use comes from a recently published paper, "DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition". The paper and its GitHub repository use PyTorch and provide an approximate way to implement the LM algorithm. However, the structure of their optimizer is very different from the familiar torch.optim, and code needs to be changed significantly to use it.

cc @vincentqb @jbschlosser @albanD

Chillee commented 2 years ago

> I can try implementing it myself by calculating the Jacobian. However, the functions that PyTorch provides to do that are extremely slow.

You should try using functorch's APIs or pass vectorize=True to the jacobian computation.
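For example (a rough sketch; the toy residual function and parameter values below are only placeholders, not from the issue):

```python
import torch
from functorch import jacrev  # functorch's reverse-mode Jacobian API

def residuals_fn(params):
    # Toy nonlinear residual: fit y = exp(a * x) + b to synthetic data.
    a, b = params[0], params[1]
    x = torch.linspace(0, 1, 20)
    y = torch.exp(0.5 * x) + 1.0
    return torch.exp(a * x) + b - y

params = torch.tensor([0.1, 0.0])

# Option 1: vectorized Jacobian through torch.autograd (rows are batched via vmap).
J1 = torch.autograd.functional.jacobian(residuals_fn, params, vectorize=True)

# Option 2: functorch's jacrev, which composes with vmap for batched problems.
J2 = jacrev(residuals_fn)(params)
```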

kyrollosyanny commented 2 years ago

> > I can try implementing it myself by calculating the Jacobian. However, the functions that PyTorch provides to do that are extremely slow.
>
> You should try using functorch's APIs or pass vectorize=True to the jacobian computation.

Thanks for your suggestion. Yes, I tried using vectorize=True and it is still very slow (minutes per iteration), while the DeepLM paper runs orders of magnitude faster (about a second per iteration). If there were a way in PyTorch to save the Jacobian before computing the Jacobian-vector product, that would be extremely helpful. Also, LM is the most popular method for nonlinear least squares, so adding it as a standard torch.optim module would be extremely helpful for the community.
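For reference, here is a rough sketch of the kind of loop meant above, where the full Jacobian is formed explicitly once per iteration and reused for both the solve and step acceptance (this is not an existing torch.optim API; the function and names are illustrative):

```python
import torch

def levenberg_marquardt(residuals_fn, params, iters=50, damping=1e-3):
    # Classic LM loop with a simple damping schedule. The Jacobian is formed
    # explicitly once per iteration and reused, instead of matrix-free JVPs.
    for _ in range(iters):
        r = residuals_fn(params)
        J = torch.autograd.functional.jacobian(residuals_fn, params, vectorize=True)
        A = J.T @ J + damping * torch.eye(params.numel())
        delta = torch.linalg.solve(A, -J.T @ r)
        new_params = params + delta
        # Accept the step and relax damping if the loss improved; otherwise tighten it.
        if residuals_fn(new_params).square().sum() < r.square().sum():
            params, damping = new_params, damping * 0.5
        else:
            damping = damping * 10.0
    return params
```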