LSMR fails with NaN output for trivial problems

rfeinman / pytorch-minimize

Newton and Quasi-Newton optimization with PyTorch

https://pytorch-minimize.readthedocs.io

MIT License


LSMR fails with NaN output for trivial problems #20

Open tvercaut opened 1 year ago

tvercaut commented 1 year ago

Thanks for the nice library. I wanted to try LSMR but my first attempt to use it with a trivial problem failed with NaN output.

Steps to reproduce:

import torch
import torchmin

A = torch.eye(10)
xtrue = torch.zeros((10, 1))
b = A @ xtrue

x = torchmin.lstsq.lsmr.lsmr(A,b)[0]
print(x)

which resulted in

tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])

instead of 0s
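One plausible mechanism for the all-zero case, sketched below, is an unguarded normalisation of b by its norm. This is an assumption, not something checked against the torchmin source here, and the names `u_unguarded` and `u_guarded` are made up for illustration. With b = 0 the norm is 0, so the division is 0 / 0:

```python
import torch

b = torch.zeros(10)
beta = b.norm()            # norm of an all-zero right-hand side is 0

# Unguarded normalisation: 0 / 0 yields NaN in every entry.
u_unguarded = b / beta

# Guarded variant: only normalise when the norm is strictly positive.
u_guarded = b / beta if beta > 0 else b.clone()

print(u_unguarded)
print(u_guarded)
```

Once a single NaN enters the recurrence it propagates to every later quantity, which would be consistent with the output above.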

tvercaut commented 1 year ago

Note that the same happens with a slightly less trivial case:

import torch
import torchmin

A = torch.eye(10)
xtrue = torch.ones((10, 1))
b = A @ xtrue

x = torchmin.lstsq.lsmr.lsmr(A,b)[0]
print(x)
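For reference, the same two problems can be run through SciPy's `scipy.sparse.linalg.lsmr`. This is a minimal sketch that assumes NumPy and SciPy are installed; it uses 1-D right-hand sides for simplicity, and the variable names are illustrative only:

```python
import numpy as np
from scipy.sparse.linalg import lsmr

A = np.eye(10)

# Case 1: all-zero right-hand side.
b_zero = A @ np.zeros(10)
x_zero, istop_zero, itn_zero = lsmr(A, b_zero)[:3]

# Case 2: all-ones right-hand side.
b_ones = A @ np.ones(10)
x_ones, istop_ones, itn_ones = lsmr(A, b_ones)[:3]

print(x_zero, istop_zero, itn_zero)
print(x_ones, istop_ones, itn_ones)
```

Both calls should return finite solutions, which is what makes SciPy a useful baseline when comparing against the torchmin output.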

Comparing the source code with that of scipy points to the following issues.

normr and normar are initialised differently, and the "convergence" tests that SciPy runs at the very beginning are missing: https://github.com/rfeinman/pytorch-minimize/blob/1017e9732db83fc9ffa2cb6e8316ed2bb0682d6a/torchmin/lstsq/lsmr.py#L148-L149 instead of https://github.com/scipy/scipy/blob/2347d9309fbbabb1d3f89d35b7a42d0d53f002b2/scipy/sparse/linalg/_isolve/lsmr.py#L290-L307

Within the main iteration loop, the convergence test is only run every 10 iterations https://github.com/rfeinman/pytorch-minimize/blob/1017e9732db83fc9ffa2cb6e8316ed2bb0682d6a/torchmin/lstsq/lsmr.py#L257-L259 instead of at every iteration https://github.com/scipy/scipy/blob/2347d9309fbbabb1d3f89d35b7a42d0d53f002b2/scipy/sparse/linalg/_isolve/lsmr.py#L409
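The first point could be addressed along the following lines. This is only a sketch in the spirit of SciPy's initialisation, not the torchmin or SciPy code itself: `lsmr_init` is a hypothetical helper, it assumes a dense 2-D `A` and a 1-D `b`, and it covers only the first bidiagonalisation step and the early exit.

```python
import torch

def lsmr_init(A, b):
    """Hypothetical helper: first Golub-Kahan step with zero-norm guards."""
    u = b.clone()
    beta = u.norm()
    if beta > 0:
        # Safe to normalise u, then form v = A^T u.
        u = u / beta
        v = A.T @ u
        alpha = v.norm()
    else:
        # b is zero: skip the division that would produce NaN.
        v = torch.zeros(A.shape[1], dtype=b.dtype)
        alpha = torch.zeros((), dtype=b.dtype)
    if alpha > 0:
        v = v / alpha
    # Initial estimate of ||A^T r||; zero means x = 0 already solves the problem.
    normar = alpha * beta
    done = bool(normar == 0)
    return u, v, alpha, beta, done

u0, v0, alpha0, beta0, done0 = lsmr_init(torch.eye(10), torch.zeros(10))
u1, v1, alpha1, beta1, done1 = lsmr_init(torch.eye(10), torch.ones(10))
print(done0, done1)
```

When `done` is true the caller can return the zero solution immediately instead of entering the main loop. The same kind of `> 0` guard would be needed wherever the loop divides by a freshly computed norm.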

tvercaut commented 9 months ago

For the record, I made a few changes here: https://github.com/cai4cai/torchsparsegradutils/blob/main/torchsparsegradutils/utils/lsmr.py