If you are working on GPU, the behaviour of torch.inverse for singular matrices is a known issue (https://github.com/pytorch/pytorch/issues/46557) and was fixed in https://github.com/pytorch/pytorch/pull/46625. You may use a recent nightly build to get the correct torch.inverse behavior.
Thank you for reporting this issue, @NJdevPro, we'll take a look soon!
There are common issues with most PyTorch routines that are inherited from LAPACK/MAGMA: not all of them handle rank-deficient cases. I think that if the user intentionally uses a LAPACK routine and is aware of its limitations, it is fine to just return a LAPACK error. That is sufficient for torch.inverse, but not necessarily for torch.lu_solve. If we are to introduce a flag to handle rank-deficient cases, then we need to support solve for general matrices.
The algorithm could be something like this (a rough code sketch follows the list). We want to solve Ax = b.
1. Find r = rank(A).
2. Find r linearly independent rows at indices r_rows and r linearly independent columns at indices r_cols. This could be done via QR factorization with pivoting (currently not in PyTorch) run on A and A^T.
3. Solve A[r_rows, r_cols] y = b[r_rows, :] (this can still fail if certain rows/columns are almost linearly dependent).
4. Set x[r_cols] = y and x[[n] \ r_cols] = 0.
5. Test whether A x == b; otherwise report that the system has no solution (b is not in range(A)).
6. The user should be aware that the solution is not always unique, since any vector z = x + d with Ad = 0 is also a solution.
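A rough NumPy/SciPy sketch of these steps (hypothetical code, not an existing PyTorch API; it uses scipy.linalg.qr with column pivoting since PyTorch currently lacks pivoted QR, and the rank tolerance is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import qr

def rank_deficient_solve(A, b, tol=1e-10):
    """Solve Ax = b for a possibly rank-deficient square A (illustrative only)."""
    n = A.shape[0]
    # Steps 1-2: pivoted QR of A and A^T reveals r independent columns/rows.
    _, R, col_piv = qr(A, pivoting=True)
    r = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))  # crude rank estimate
    _, _, row_piv = qr(A.T, pivoting=True)
    r_rows, r_cols = np.sort(row_piv[:r]), np.sort(col_piv[:r])
    # Step 3: solve the r-by-r subsystem (may still be ill-conditioned).
    y = np.linalg.solve(A[np.ix_(r_rows, r_cols)], b[r_rows])
    # Step 4: embed y and zero the remaining coordinates.
    x = np.zeros(n)
    x[r_cols] = y
    # Step 5: b must lie in range(A), otherwise there is no solution.
    if not np.allclose(A @ x, b):
        raise np.linalg.LinAlgError("no solution: b is not in range(A)")
    # Step 6: x is one solution; x + d is also a solution for any d with A d = 0.
    return x
```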
The algorithm does require synchronization with the CPU to read the rank if the input tensors are CUDA tensors. In fact, any rank-deficient algorithm that handles degeneracies will suffer from this issue.
@mruberry , what do you think? Shall we extend the functionality to rank-deficient cases with an additional flag? We will only need the tensor of ranks to always live on CPU. I think this price is worth paying given that ML is going in the direction of larger but sparser/lower dimensionality models...
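To make that synchronization point concrete, here is a small illustration (it assumes a CUDA device is available and is not part of any proposed API):

```python
import torch

A = torch.randn(512, 512, device="cuda")
r = torch.linalg.matrix_rank(A)  # result stays on the GPU, the call is asynchronous
r_cpu = int(r)                   # converting to a Python int forces a device-to-host sync
```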
Thanks for the explanation @nikitaved. So the options seem to be:
(1) isn't very user-friendly, (2) is going to make some people unhappy because it costs performance, and the problem with (3) is that it's hard to predict when rank deficiency will occur (it may depend on values in the input to a model).
Thanks for the write-up, @nikitaved. I think we should start with (1) since that can happen now, and we should try to accurately document the state of PyTorch.
One thing I don't understand yet is how our planned future is different from NumPy and SciPy. In the future we'll have torch.linalg.solve, torch.linalg.inv, and torch.linalg.lu_solve.
None of these three functions exist in PyTorch yet. @nikitaved, should we solve this by implementing those functions to be consistent with the mathematics of their NumPy and SciPy counterparts instead of trying to update the behavior of torch.inverse and torch.lu_solve?
None of these three functions exist in PyTorch yet
The docs for torch.linalg.solve in PR gh-48456 do contain a note about the behaviour for singular input: it will throw a RuntimeError. The docs for torch.linalg.inv in PR gh-48261 mention a similar thing (RuntimeError) for non-invertible matrices. Cc @IvanYashchuk.
I'll note that both NumPy and SciPy raise a custom error (LinAlgError), which may be more useful than RuntimeError because it's easier to catch and more descriptive.
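For illustration, this is how NumPy's dedicated exception can be caught today (the RuntimeError behaviour described above is what the linked PRs document for torch.linalg):

```python
import numpy as np

A = np.zeros((3, 3))  # singular matrix
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    print("caught singular-matrix error:", e)
```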
should we solve this by implementing those functions to be consistent with the mathematics of their NumPy and SciPy counterparts instead of trying to update the behavior of torch.inverse and torch.lu_solve?
It seems to me that torch.inverse and torch.lu_solve should still either be fixed or deprecated?
Thanks @rgommers, that's helpful. I agree we should deprecate torch.inverse when we have torch.linalg.inv, and deprecate torch.lu_solve when we implement torch.linalg.lu_solve.
cc @heitorschueroff who's going to review linalg deprecations for 1.8 and 1.9.
EDIT: see edit below
Has this been resolved? In the docs for torch.inverse, it says "Alias for torch.linalg.inv()" https://pytorch.org/docs/stable/generated/torch.inverse.html
I just ran into this problem when trying to fit linear regression directly by computing the inverse. It was amazing, because for one dataset it gave results consistent with using numpy's linalg library, but for a subset of the same dataset (90+% of it) the results were wildly different. I had to go on a wild goose chase to track down where the differences were occurring and figured out that they were in the inversion step.
Numpy
Torch 1.10
This is pretty dangerous if you don't know about it.
EDIT: The difference between the numpy and torch implementations actually has to do with float32 vs float64 precision. When the tensors/arrays are cast to float64 prior to the computation, the numpy and torch linear algebra operations agree.
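A small illustration of that precision point (the Hilbert matrix below is just a stand-in for an ill-conditioned dataset; exact error magnitudes will vary):

```python
import numpy as np
import torch

n = 8
# Hilbert matrix: notoriously ill-conditioned (cond ~ 1e10 for n = 8).
A = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n)[None, :])

inv_np = np.linalg.inv(A)                          # NumPy works in float64 by default
inv64 = torch.linalg.inv(torch.tensor(A)).numpy()  # float64 torch, agrees with NumPy
inv32 = torch.linalg.inv(torch.tensor(A, dtype=torch.float32)).double().numpy()

print(np.max(np.abs(inv64 - inv_np)) / np.max(np.abs(inv_np)))  # tiny relative error
print(np.max(np.abs(inv32 - inv_np)) / np.max(np.abs(inv_np)))  # orders of magnitude larger
```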
A few points are in order:
1. inverse and lu_solve just work with invertible matrices. If the matrix is close to singular, they may return wrong results. You may want to look at linalg.lstsq to handle the singular case, although it only works on CPU for singular inputs, so you would need to move your inputs to CPU and back if you want to use it on GPU (see the sketch after this list).
2. linalg.inv raises an error for non-invertible inputs, and linalg.lu_factor will result in a similar error to that of linalg.inv. Now, as always, if you use close-to-singular matrices in float32 precision you are in for a rough time.
3. I will add linalg.lu_solve soon™, where I'll clean up the API and so on, but it will have the same caveats as torch.lu_solve when it comes to close-to-singular inputs. Even more, it expects as input the output of linalg.lu_factor, so the check for singular inputs will be made in linalg.lu_factor, and linalg.lu_solve will assume that the given factored matrix is sufficiently well-conditioned.
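A sketch of the linalg.lstsq workaround mentioned in the first point (it assumes a CUDA device is available; the "gelsd" driver handles rank-deficient input but is CPU-only, hence the round trip):

```python
import torch

A = torch.tensor([[1., 2.], [2., 4.]], device="cuda")  # singular (rank 1)
b = torch.tensor([[1.], [2.]], device="cuda")

# Move to CPU to use a rank-revealing least-squares driver, then move back.
x = torch.linalg.lstsq(A.cpu(), b.cpu(), driver="gelsd").solution.to(A.device)
print(torch.allclose(A @ x, b))  # True here, since b lies in range(A)
```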
Alas, I don't think that there's much we can do to address this issue as it boils down to the usual caveats of using floating point numbers. I'd say we close it.
Closing for now.
🐛 Bug
torch.lu_solve and torch.inverse give (completely wrong) results for singular matrices instead of returning an error.
To Reproduce
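The original reproduction snippet is not preserved in this excerpt; the following is a hypothetical minimal example of the reported behaviour on affected versions/backends:

```python
import torch

A = torch.tensor([[1., 2.], [2., 4.]])  # rank 1, hence singular
b = torch.tensor([[1.], [2.]])

A_inv = torch.inverse(A)         # on affected versions/backends: no error, garbage values
print(A_inv @ b)                 # "solution" that does not satisfy A x = b

LU, pivots = torch.lu(A)         # LU factorization with a zero pivot
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b))  # False: silently wrong result
```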
Expected behavior
For singular/non-invertible matrices, lu_solve and inverse should return an error/exception instead of silently giving completely incorrect results. This can lead to dangerous calculations if the user doesn't have the mathematical background to realize that the result is simply false. Raising an error is the behavior of numpy.linalg.solve, btw, so for consistency, Torch should do the same.
edit: see also https://github.com/pytorch/pytorch/issues/31546 https://github.com/pytorch/pytorch/issues/16076
Environment
How you installed PyTorch (conda, pip, source): pip
Additional context
cc @ezyang @gchanan @zou3519 @bdhirsh @vishwakftw @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr