stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

SVD functions #2192

Closed bbbales2 closed 3 years ago

bbbales2 commented 3 years ago

Description

SVD(X) = U Sigma V^T

We currently have a function for the singular values (the diagonal of Sigma): https://mc-stan.org/docs/2_25/functions-reference/linear-algebra-functions-and-solvers.html#singular-value-decomposition

We need functions for the U and the V.

matrix svd_U(matrix);
matrix svd_V(matrix);
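As an illustration of what the proposed signatures would return, here is a numpy sketch (not Stan code; the function names mirror the proposed Stan signatures) using the thin SVD:

```python
import numpy as np

# Hypothetical numpy equivalents of the proposed svd_U / svd_V signatures.
def svd_U(A):
    # Thin SVD: U is m x k, where k = min(m, n)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U

def svd_V(A):
    # numpy returns V^T, so transpose to get V (n x k)
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T

A = np.random.rand(3, 7)
U = svd_U(A)
V = svd_V(A)
S = np.linalg.svd(A, compute_uv=False)

# Sanity check: A = U diag(S) V^T
recon_ok = np.allclose(A, U @ np.diag(S) @ V.T)
```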

The autodiff derivation is here, though it looks like there's a little work left to finish up the equations for the SVD U and V: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf

This one has the equations: https://arxiv.org/pdf/1903.09650.pdf

Current Version:

v3.3.0

bbbales2 commented 3 years ago

@bgoodri so you get e-mails in this thread

bbbales2 commented 3 years ago

This paper has some nice closed form equations for what we need to do https://arxiv.org/pdf/1903.09650.pdf

Here's some R code testing it against finite differences:

singular_values = function(A) {
  svd(A)$d
}

svd_U = function(A) {
  svd(A)$u
}

svd_V = function(A) {
  svd(A)$v
}

M = 3
N = 7
A = matrix(runif(M * N), nrow = M, ncol = N)

U = svd_U(A)
S = singular_values(A)
V = svd_V(A)

# Set up some random test matrices
adjU = matrix(runif(nrow(U) * ncol(U)), nrow = nrow(U), ncol = ncol(U))
adjS = runif(length(S))
adjV = matrix(runif(nrow(V) * ncol(V)), nrow = nrow(V), ncol = ncol(V))

# Build the Fp/Fm coefficient matrices from the paper (zero on the diagonal)
Fp = matrix(0, nrow = M, ncol = M)
Fm = matrix(0, nrow = M, ncol = M)
for(i in 1:length(S)) {
  for(j in 1:length(S)) {
    if(j == i) {
      Fp[i, j] = Fm[i, j] = 0.0
    } else {
      Fp[i, j] = 1.0 / (S[j] - S[i]) +
        1.0 / (S[i] + S[j])
      Fm[i, j] = 1.0 / (S[j] - S[i]) -
        1.0 / (S[i] + S[j])
    }
  }
}

adjA = matrix(0, nrow = nrow(A), ncol = ncol(A))

# Contributions from adjU
adjA = adjA + 0.5 * U %*% (Fp * (t(U) %*% adjU - t(adjU) %*% U)) %*% t(V)
adjA = adjA + (diag(M) - U %*% t(U)) %*% adjU %*% diag(1.0 / S) %*% t(V)

# Contributions from adjS
adjA = adjA + U %*% diag(adjS) %*% t(V)

# Contributions from adjV
adjA = adjA + 0.5 * U %*% (Fm * (t(V) %*% adjV - t(adjV) %*% V)) %*% t(V)
adjA = adjA + U %*% diag(1.0 / S) %*% t(adjV) %*% (diag(nrow(V)) - V %*% t(V))

#### Compute a reference using finite differences (not needed for the C++ version)
fd_adjA = matrix(0, nrow = M, ncol = N)
for(i in 1:M) {
  for(j in 1:N) {
    dx = 1e-3
    Ap = A
    Am = A
    Ap[i, j] = Ap[i, j] + dx
    Am[i, j] = Am[i, j] - dx

    adj = (sum(adjU * (svd_U(Ap))) - sum(adjU * svd_U(Am))) / (2 * dx)
    adj = adj + (sum(adjS * (singular_values(Ap))) - sum(adjS * singular_values(Am))) / (2 * dx)
    adj = adj + (sum(adjV * (svd_V(Ap))) - sum(adjV * svd_V(Am))) / (2 * dx)

    fd_adjA[i, j] = adj
  }
}

# adjA and fd_adjA should match
adjA
fd_adjA
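For reference, here is a Python/numpy port of the same check (a verification sketch only, not the Stan Math implementation; the seed is fixed so the comparison is repeatable):

```python
import numpy as np

rng = np.random.default_rng(1234)
M, N = 3, 7
A = rng.uniform(size=(M, N))

U, S, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Random adjoints (cotangents) for U, S, V
adjU = rng.uniform(size=U.shape)
adjS = rng.uniform(size=S.shape)
adjV = rng.uniform(size=V.shape)

# Fp_ij = 1/(s_j - s_i) + 1/(s_i + s_j); Fm uses a minus; zero on the diagonal
K = len(S)
Fp = np.zeros((K, K))
Fm = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        if i != j:
            Fp[i, j] = 1.0 / (S[j] - S[i]) + 1.0 / (S[i] + S[j])
            Fm[i, j] = 1.0 / (S[j] - S[i]) - 1.0 / (S[i] + S[j])

Sinv = np.diag(1.0 / S)
adjA = 0.5 * U @ (Fp * (U.T @ adjU - adjU.T @ U)) @ V.T
adjA += (np.eye(M) - U @ U.T) @ adjU @ Sinv @ V.T
adjA += U @ np.diag(adjS) @ V.T
adjA += 0.5 * U @ (Fm * (V.T @ adjV - adjV.T @ V)) @ V.T
adjA += U @ Sinv @ adjV.T @ (np.eye(N) - V @ V.T)

# Finite-difference reference
def svd_parts(X):
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return u, s, vt.T

fd_adjA = np.zeros_like(A)
dx = 1e-3
for i in range(M):
    for j in range(N):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += dx
        Am[i, j] -= dx
        Up, Sp, Vp = svd_parts(Ap)
        Um, Sm, Vm = svd_parts(Am)
        fd_adjA[i, j] = (np.sum(adjU * (Up - Um))
                         + np.sum(adjS * (Sp - Sm))
                         + np.sum(adjV * (Vp - Vm))) / (2 * dx)
```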
bbbales2 commented 3 years ago

@hyunjimoon first step will be to add reverse mode autodiff for the singular_values function: https://github.com/stan-dev/math/blob/develop/stan/math/prim/fun/singular_values.hpp

(it's the same as the singular_values function above in R)

Use as a starting point the singular_values prim implementation above and the version of eigenvalues_sym in these two files:

prim: https://github.com/stan-dev/math/blob/654aa5119a3d7870c527cb9f3cedb487b72474e3/stan/math/prim/fun/eigenvalues_sym.hpp
rev: https://github.com/stan-dev/math/blob/654aa5119a3d7870c527cb9f3cedb487b72474e3/stan/math/rev/fun/eigenvalues_sym.hpp

The old singular_values implementation shows you how to use the Eigen singular value decomposition. The eigenvalues_sym function shows you how to use automatic differentiation for a function with the same signature (and similar calculations).

The math for the eigenvalues_sym is also in the same paper (Eq. 3).
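As a sketch of what that reverse-mode pass computes: for symmetric A = Q diag(lambda) Q^T, the eigenvalue adjoints map back as adjA = Q diag(adj_lambda) Q^T. A numpy illustration (not Stan code), checked against finite differences over symmetric perturbations:

```python
import numpy as np

rng = np.random.default_rng(7)
M = 4
B = rng.uniform(size=(M, M))
A = B + B.T  # symmetric test matrix

lam, Q = np.linalg.eigh(A)
adj_lam = rng.uniform(size=M)

# Reverse-mode adjoint for eigenvalues of a symmetric matrix:
# adjA = Q diag(adj_lam) Q^T
adjA = Q @ np.diag(adj_lam) @ Q.T

# Finite-difference check, keeping each perturbation symmetric
dx = 1e-6
max_err = 0.0
for i in range(M):
    for j in range(M):
        E = np.zeros((M, M))
        E[i, j] += 1.0
        E[j, i] += 1.0  # symmetric perturbation direction
        lp = np.linalg.eigvalsh(A + dx * E)
        lm = np.linalg.eigvalsh(A - dx * E)
        fd = np.dot(adj_lam, lp - lm) / (2 * dx)
        max_err = max(max_err, abs(fd - np.sum(adjA * E)))
```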

spinkney commented 3 years ago

This would be amazing to have! We can do Bayesian recommendation systems (probably using VI, depending on the size of the data).

bbbales2 commented 3 years ago

@bgoodri we have the option of doing a thin SVD vs. a full SVD. Is there a reason to favor the full over the thin? Or do you think we can get away with thin? We can also do both (svd_U and svd_thin_U).

bgoodri commented 3 years ago

I would say thin only.
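For context on the trade-off, a numpy illustration (not Stan code): for a wide 3 x 7 matrix the thin SVD already reconstructs A exactly, while the full SVD pads V with extra null-space columns:

```python
import numpy as np

A = np.random.rand(3, 7)

# Full SVD: U is 3x3, V is 7x7; V carries 7 - 3 = 4 extra columns
# that only span the null space of A
Uf, Sf, Vtf = np.linalg.svd(A, full_matrices=True)

# Thin SVD: both factors have min(3, 7) = 3 columns, which is
# enough to reconstruct A exactly
Ut, St, Vtt = np.linalg.svd(A, full_matrices=False)
thin_recon_ok = np.allclose(A, Ut @ np.diag(St) @ Vtt)
```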


bbbales2 commented 3 years ago

This is in now; off to stanc3.