mjhajharia / transforms


symmetric positive definite matrix transforms (aka covariance matrices) #15

Open mjhajharia opened 2 years ago

mjhajharia commented 2 years ago

The set of SPD matrices is not compact, so we need to add proper target distributions. The Wisharts are conjugate for the covariance in a multivariate normal model, so using them as targets is essentially like testing multivariate normal posteriors.
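
For concreteness, one proper target on SPD matrices would be an (inverse-)Wishart density evaluated on the constrained matrix. A minimal sketch using Distributions.jl (the package choice, degrees of freedom, and scale matrix here are purely illustrative):

```julia
# Illustrative proper target on n×n SPD matrices (parameters chosen arbitrarily).
using Distributions, LinearAlgebra

n = 3
target = InverseWishart(n + 2, Matrix{Float64}(I, n, n))

Σ = rand(target)      # an SPD draw from the target
logpdf(target, Σ)     # log density; a transform's log-Jacobian term would be added to this
```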

mjhajharia commented 2 years ago

@sethaxen maybe I can start with correlation matrices and you can see covariance matrices

bob-carpenter commented 2 years ago

Is this for Cholesky factors? Or for the whole thing? We usually go with Cholesky-factor parameterizations, so to me, those are more important. But for symmetric positive definite, that's just an unconstraining transform from (0, inf) to (-inf, inf) for the diagonals.

The alternative is to take a symmetric matrix and push it through matrix exp. I have no clue as to what the Jacobian determinant looks like there.

bob-carpenter commented 2 years ago

I think correlation matrices should be broken out into a separate issue. Adam Haber has looked at those.

sethaxen commented 2 years ago

> @sethaxen maybe I can start with correlation matrices and you can see covariance matrices

Works for me!

> But for symmetric positive definite, that's just an unconstraining transform from (0, inf) to (-inf, inf) for the diagonals.

Agreed, we could just combine the "best" Cholesky transform with the transform for the positive diagonals.
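
A minimal sketch of that combination, assuming the usual construction (fill a lower-triangular factor from the unconstrained vector, exponentiate the diagonal, form $LL^\top$) and the standard Jacobian of $L \mapsto LL^\top$; the function name and the row-major fill order are just illustrative choices:

```julia
# Sketch: unconstrained vector -> SPD matrix via a Cholesky factor with
# log-transformed diagonal, plus the log-abs-det-Jacobian of the full map.
using LinearAlgebra

function chol_spd_constrain(y::AbstractVector, n::Integer)
    L = zeros(eltype(y), n, n)
    k = 1
    for i in 1:n, j in 1:i              # fill the lower triangle row by row
        L[i, j] = i == j ? exp(y[k]) : y[k]
        k += 1
    end
    Σ = L * L'                          # SPD by construction
    # exp on the diagonal contributes Σᵢ log L[i,i];
    # L ↦ LL' contributes n log 2 + Σᵢ (n - i + 1) log L[i,i]
    logdetjac = n * log(2) + sum((n - i + 2) * log(L[i, i]) for i in 1:n)
    return Σ, logdetjac
end

Σ, ldj = chol_spd_constrain(randn(5 * 6 ÷ 2), 5)
isposdef(Σ)   # true
```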

Another alternative is to use the "best" transform for orthonormal matrices for the eigenvectors and then a transform for the positive eigenvalues. I think I've seen other approaches in the literature before and will look into it.

> The alternative is to take a symmetric matrix and push it through matrix exp. I have no clue as to what the Jacobian determinant looks like there.

From http://eprints.maths.manchester.ac.uk/2754/1/catalog.pdf, the eigendecomposition is still the preferred way to compute exp(A) for symmetric A, i.e. if A = U Λ U', then exp(A) = U exp(Λ) U'. This is the composition of 3 bijective maps: eigendecomposition, diagonal exponential, then the inverse of the eigendecomposition, so we only need to work out the Jacobian correction for the symmetric eigendecomposition. From Alan Edelman's lecture notes (Section 2.5 of http://web.mit.edu/18.325/www/handouts/handout3.pdf), the Jacobian correction for the symmetric eigendecomposition is $\prod_{i < j} |\lambda_j - \lambda_i|$. So the correction for the symmetric matrix exponential would be

$$\prod_{i=1}^n e^{\lambda_i} \prod_{j=i+1}^n \left| \frac{e^{\lambda_j} - e^{\lambda_i}}{\lambda_j - \lambda_i} \right|$$

I verified this using finite differences in Julia:

```julia
julia> using FiniteDifferences, LinearAlgebra, LogExpFunctions

julia> function sparse_to_vec(U)
           inds = findall(!iszero, U)
           v = U[inds]
           function vec_to_sparse(v)
               Unew = similar(U)
               fill!(Unew, 0)
               Unew[inds] .= v
               return Unew
           end
           return v, vec_to_sparse
       end
sparse_to_vec (generic function with 1 method)

julia> function symexp_logdetjac(A)
           λ = eigvals(A)
           T = float(eltype(λ))
           n = size(A, 1)
           s = zero(T)
           for i in 1:n
               λᵢ = λ[i]
               s += λᵢ
               for j in (i + 1):n
                   λₗ, λᵤ = minmax(λᵢ, λ[j])
                   s += logsubexp(λᵤ, λₗ) - log(λᵤ - λₗ)
               end
           end
           return s
       end
symexp_logdetjac (generic function with 1 method)

julia> n = 5;

julia> U = triu(randn(n, n))
5×5 Matrix{Float64}:
 -0.500465  2.08902    0.782754   0.844179  -0.766196
  0.0       0.688097   0.929935  -1.54167    0.114989
  0.0       0.0       -0.24768    1.41416    0.830064
  0.0       0.0        0.0       -1.26386    0.915003
  0.0       0.0        0.0        0.0       -0.220671

julia> A = Symmetric(U)
5×5 Symmetric{Float64, Matrix{Float64}}:
 -0.500465   2.08902    0.782754   0.844179  -0.766196
  2.08902    0.688097   0.929935  -1.54167    0.114989
  0.782754   0.929935  -0.24768    1.41416    0.830064
  0.844179  -1.54167    1.41416   -1.26386    0.915003
 -0.766196   0.114989   0.830064   0.915003  -0.220671

julia> λ = eigvals(A);

julia> v, vec_to_sparse = sparse_to_vec(U);

julia> J = jacobian(central_fdm(5, 1), first ∘ sparse_to_vec ∘ triu ∘ exp ∘ Symmetric ∘ vec_to_sparse, v)[1]
15×15 Matrix{Float64}:
  3.20214      5.12366     1.56993    …  -0.531845   -0.253399   -0.0277042   0.0378789
  2.56183      6.45721     3.58887       -0.658365   -0.197366    0.0777662   0.0283516
  1.56993      7.17774     7.2442        -0.748569   -0.136485    0.130263    0.0115425
  1.21801      2.00235     0.751506       0.378459   -0.0642349  -0.0627738  -0.0795216
  0.740135     2.4536      1.75339        0.987444   -0.0265266  -0.262457   -0.050647
  0.348811     0.707586    0.359101   …   0.489996    1.33695     0.403651    0.159073
  0.100272    -0.734982   -0.636234       0.601058    0.14057    -0.191995   -0.0795725
  0.139963    -0.442792   -1.29323        1.12791     0.176198   -0.420207   -0.0500219
  0.0677437   -0.232091   -0.308252       0.0485569   0.880119    0.719764    0.158394
 -0.00103785  -0.0805224   0.237911      -0.389536    0.402673    1.05731     0.157784
 -0.338023    -0.545782   -0.145159   …   2.16254     0.939959   -0.0392903  -0.249754
 -0.265922    -0.658365   -0.374284       3.7016      1.0117     -0.725274   -0.248751
 -0.1267      -0.197366   -0.0682424      1.0117      2.09928     0.841366    0.569505
 -0.0138521    0.0777662   0.0651313     -0.725274    0.841366    1.50725     0.580339
  0.0378789    0.0567031   0.0115425     -0.497502    1.13901     1.16068     1.44321

julia> logabsdet(J)[1]
0.3457424809406372

julia> symexp_logdetjac(A)
0.3457424809559124
```

I'd expect AD-ing through this transform to be quite a bit more expensive than through the Cholesky transform, whose Jacobian correction only requires summing the unconstrained parameters behind the positive diagonals.

bob-carpenter commented 2 years ago

I think @adamhaber already has the correlation matrix and Cholesky for correlation matrix transforms.

See: https://github.com/tensorflow/probability/issues/694

sethaxen commented 2 years ago

@bob-carpenter I'm not certain if your comment is in reply to me, but I was only referring to symmetric positive definite matrices.

bob-carpenter commented 2 years ago

> I verified this using finite differences in Julia:

Super cool! That correction is $\mathcal{O}(n^2)$. We have `matrix_exp` implemented in Stan using the Al-Mohy and Higham algorithm.
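
And since that correction only needs the eigenvalues, for symmetric input a single eigendecomposition could supply both exp(A) and the correction in one pass. A rough sketch along the lines of the snippet above (the function name is illustrative):

```julia
# Sketch: for symmetric A, one eigendecomposition yields both exp(A) and the
# O(n^2) log-det-Jacobian correction from the eigenvalues alone.
using LinearAlgebra, LogExpFunctions

function symexp_with_logdetjac(A::Symmetric)
    λ, U = eigen(A)
    expA = U * Diagonal(exp.(λ)) * U'
    n = length(λ)
    s = sum(λ)
    for i in 1:n, j in (i + 1):n
        λₗ, λᵤ = minmax(λ[i], λ[j])
        s += logsubexp(λᵤ, λₗ) - log(λᵤ - λₗ)
    end
    return Symmetric(expA), s
end
```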

sethaxen commented 2 years ago

f780af0 adds derivations for this section