mikera / core.matrix

core.matrix : Multi-dimensional array programming API for Clojure

Feature request: Moore-Penrose pseudoinverse #320

Open mars0i opened 7 years ago

mars0i commented 7 years ago

At present my only need for a Moore-Penrose pseudoinverse is to experiment with it while studying math that uses it. In the long run I expect there will be people (maybe me) who need it for a serious use case.

I have seen that there are various efficient methods to compute this pseudoinverse, but don't think I'm likely to be up to the challenge of figuring out the best way to do it. I thought it would be worth documenting the need for an M-P pseudoinverse for future work, even though my own need at this point is very limited.

mikera commented 7 years ago

I think this is a good idea. Probably best if we start by defining the right API, since different implementations may have different ways to calculate the pseudoinverse.
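For discussion, a rough sketch of what the protocol might look like, following the pattern of other core.matrix protocols (names here are placeholders, not a settled design):

;; Hypothetical protocol sketch for discussion; names are placeholders.
(defprotocol PPseudoInverse
  (pseudo-inverse [m] [m tolerance]))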

I believe that it can be done using SVD, so we could make a default implementation that depends on SVD, for implementations that support it. See: https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse
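For reference, the construction from that page: if A = U Σ V* is a singular value decomposition, then

A⁺ = V Σ⁺ U*

where Σ⁺ is formed by taking the reciprocal of each nonzero (here: above-tolerance) singular value on the diagonal and transposing the result.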

mars0i commented 7 years ago

The API would just be m x n matrix in, n x m matrix out, I think.

And maybe also support vectors. Since core.matrix vectors function as either row or column mathematical vectors, and (I believe) transposing a math vector and its pseudoinverse preserves the pseudoinverse relationship, the M-P operation can just treat a vector as one or the other and return a vector. A simple default implementation might convert vectors into matrices and back.
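For what it's worth, a minimal sketch of that round trip, assuming a matrix-level pinv like the draft further down and clojure.core.matrix aliased as m:

;; Sketch: treat a 1-D vector as an n x 1 column matrix, take the matrix
;; pseudoinverse, and flatten the resulting 1 x n row back to a vector.
(defn pinv-via-matrix [v]
  (m/to-vector (pinv (m/column-matrix v))))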

I don't really have a proper understanding of SVD, but the SVD-based method in the Wikipedia page is pretty simple. I'm experimenting with it and may be able to contribute a default implementation.

mars0i commented 7 years ago

Here's a draft implementation of a Moore-Penrose pseudoinverse. Again, I don't have a deep understanding of MP or of SVD, but I believe this is correct. In my experiments, the results satisfy the four Moore-Penrose properties up to a small epsilon, as long as a reasonable value is passed for the tolerance argument. I know there are probably things to do to fit it into core.matrix's protocol system. I have some questions below. I wanted to get feedback before I start working on a proper PR.

;; Assumes the usual aliases, e.g.
;; (require '[clojure.core.matrix :as m]
;;          '[clojure.core.matrix.linear :as lin]
;;          '[clojure.core.matrix.utils :as u]) ; for doseq-indexed
(defn pinv
  "Moore-Penrose pseudoinverse of matrix m, calculated using SVD.
  tolerance defaults to 0.0 during calculation.  Singular values whose
  absolute values are less than or equal to tolerance are treated as
  zero.  Won't work unless svd is available for m's implementation or
  for the current-implementation."
  ([m] (pinv m 0.0)) ;; default tolerance should be larger--maybe calculated
  ([m tolerance]
   (let [treat-as-nonzero? (fn [x] (> (Math/abs (double x)) tolerance))
         diag-pinv (fn [singular-vals rows cols] ; pinv of a rectangular diagonal matrix
                     (let [sigma+ (m/ensure-mutable (m/zero-matrix cols rows))] ; dims transposed
                       (u/doseq-indexed [x singular-vals i]
                         (when (treat-as-nonzero? x)
                           (m/mset! sigma+ i i (/ x)))) ; reciprocal of each "nonzero" singular value
                       sigma+))
         [rows cols] (m/shape m)
         {:keys [U S V*]} (lin/svd m)
         S+ (diag-pinv S rows cols)]
     (m/mmul (m/transpose V*) S+ (m/transpose U))))) ; i.e. V Sigma+ U^T
  1. My main question is how to calculate the default tolerance--how far a singular value can be from zero and still be mapped into zero. (e.g. with input [[1 2 3][4 5 6][7 8 9][10 11 12]] the results are radically wrong with the current default tolerance of 0.0, but it works correctly with various small positive tolerance values.)

  2. Should I stop the loop through singular values as soon as it finds the first value that fails treat-as-nonzero?? At that point the rest should be "zero" as well, right? Or keep checking, and throw an exception if a treat-as-nonzero? value turns up after that? (Maybe this issue should be left to optimized implementation-specific versions rather than the default implementation.)

  3. Should pinv be able to accept 1D vector arguments? I'm thinking that the answer should be no. How would it know whether to treat a 1D vector as a row vector or a column vector?

  4. Is there any implementation that supports complex numbers at present? I didn't see a Hermitian transpose function, so I assumed that all numbers are reals and that the vanilla transpose is enough.

  5. About tests: I think that in addition to showing that the right result is returned for specified inputs, the tests could verify that the four definitional Moore-Penrose properties are satisfied up to a small epsilon for those same inputs (see the sketch after this list). Does this make sense? How many input matrices should be tested?
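A sketch of what such a property check might look like, assuming clojure.core.matrix as m (equals takes an optional epsilon for approximate comparison):

;; Check of the four defining Moore-Penrose conditions, up to eps.
(defn mp-properties-hold? [A A+ eps]
  (and (m/equals (m/mmul A A+ A) A eps)                           ; A A+ A  = A
       (m/equals (m/mmul A+ A A+) A+ eps)                         ; A+ A A+ = A+
       (m/equals (m/transpose (m/mmul A A+)) (m/mmul A A+) eps)   ; A A+ symmetric
       (m/equals (m/transpose (m/mmul A+ A)) (m/mmul A+ A) eps))) ; A+ A symmetric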

(btw it looks like there's a bug in clatrix. If I set the current implementation to vectorz, I can get identical results with inputs that are persistent-vector, ndarray, or aljabar, but with a clatrix input I get an incompatible shape exception. I'll report on this separately after I investigate further.)

Somelauw commented 7 years ago
  1. According to Wikipdia, t = epsilon max(rows, cols) max(S) For Java double type, the machine epsilon is 2^-52.

  2. Theoretically, you could stop at the first zero value as an optimization. You would need to check lin/svd to be sure.

  3. For vectors you can divide each element by the squared magnitude of the vector as a special case (see the sketch after this list). It shouldn't matter whether it's a row or column vector. For scalars, you can return the ordinary inverse 1/x, because pinv is a generalization of a standard inverse.

  4. (re point 5) That makes sense.
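A sketch of the vector special case from point 3, again assuming clojure.core.matrix as m:

;; Pseudoinverse of a 1-D vector: v divided by its squared magnitude.
;; The zero vector is its own pseudoinverse.
(defn pinv-vector [v]
  (let [mag2 (m/dot v v)]
    (if (zero? mag2)
      v
      (m/div v mag2))))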

mars0i commented 7 years ago

Thanks @Somelauw. Re 1: OK, good--that clarifies for me what was described in Wikipedia. But that's just a rule of thumb that several packages adopt. Is it what core.matrix should use as the default? I have no opinion about this. In any event users can override it if we include an optional tolerance argument in the API.
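A concrete sketch of that rule of thumb (Math/ulp 1.0 is the double machine epsilon, 2^-52):

;; Rule-of-thumb default tolerance: machine epsilon * max dimension *
;; largest singular value, with singular-vals as returned by lin/svd.
(defn default-tolerance [singular-vals rows cols]
  (* (Math/ulp 1.0)
     (max rows cols)
     (apply max (m/eseq singular-vals))))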

Re 2: The code I wrote already stops before the first true zero, I think. At present svd returns a vector of singular values. I'm assuming that these never include 0.0, so the only question is whether to stop at the first within-tolerance-of-zero value. Not sure if you were addressing that or not.
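If that assumption holds and the values arrive sorted in descending order (the usual SVD convention, though worth verifying for lin/svd), diag-pinv could exit early, roughly like this:

;; Variant of the draft's diag-pinv that stops at the first
;; within-tolerance singular value, assuming descending order.
(defn diag-pinv-early-stop [singular-vals rows cols tolerance]
  (let [sigma+ (m/ensure-mutable (m/zero-matrix cols rows))]
    (doseq [[i x] (map-indexed vector
                               (take-while #(> (Math/abs (double %)) tolerance)
                                           (m/eseq singular-vals)))]
      (m/mset! sigma+ i i (/ x)))
    sigma+))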

Re 3: For vectors, I think I understand: since the result of mmul on two vectors is their dot product, the inverse should be relative to that kind of multiplication. That makes sense about scalars, too.

atisharma commented 4 years ago

In MATLAB, the `pinv` command uses a default tolerance of max(size(A)) * eps(norm(A)), where eps(x) is defined:

eps(x): when x has data type single or double, returns the positive distance from abs(x) to the next larger floating-point number of the same precision as x.
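Java's Math/ulp appears to be the direct analogue of MATLAB's eps(x), so that default could be sketched like this (taking norm(A) to be the largest singular value, i.e. the matrix 2-norm):

;; Sketch of MATLAB's pinv default: max(size(A)) * eps(norm(A)),
;; with eps(x) as Math/ulp and norm(A) as the largest singular value.
(defn matlab-style-tolerance [rows cols singular-vals]
  (* (max rows cols)
     (Math/ulp (double (apply max (m/eseq singular-vals))))))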