stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
733 stars 184 forks source link

matrix normal distribution #86

Open syclik opened 9 years ago

syclik commented 9 years ago

Implement one or more forms of the matrix normal distribution. The general distribution is of the form

matrix_normal(matrix[N, P] y | matrix[N, P] mu, cov_matrix[N, N] SigmaRow, cov_matrix[P, P] SigmaCol),

where SigmaRow and SigmaCol are positive definite covariance matrices. The covariance matrix SigmaRow controls the covariance of the rows of y whereas SigmaRow controls the covariances of the columns.

There are three design decisions about how SigmaRow and SigmaCol are parameterized, which are whether to use

  1. diagonal or dense matrices,
  2. covariance or precision (inverse covariance), and
  3. Cholesky factors or full matrices.

There is also the issue of how to resolve the non-identifiability of scale between the two matrices, because

matrix_normal(y | mu, SigmaRow, SigmaCol) == matrix_normal(y | mu, c * SigmaRow, (1 / c) * SigmaCol).

The log density can be expressed as follows, but this is not how we'd compute it.

log matrix_normal(y | mu, SigmaRow, SigmaCol)
= - 1 / 2 * tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
   - N / 2 * log det(SigmaRow) 
   - P / 2 * log det(SigmaCol)
   - N * P / 2 * log(2 * pi())

If LRow and LCol are Cholesky factors of SigmaRow and SigmaCol so that, e.g., SigmaRow = LRow * LRow', and z ~ normal(0, I), then

y = mu + LRow * z * LCol ~ matrix_normal(mu, SigmaRow, SigmaCol)

As shown on the Wikipedia page for matrix normal, we can evaluate that trace term as

tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
= (vec(y) - vec(mu))' * inv(SigmaCol kronecker SigmaRow) * (vec(y) - vec(mu))

where vec(y) is the flattening of y. We then want to use the property of Kronecker products that

inv(SigmaCol kronecker SigmaRow) = inv(SigmaCol) kronecker inv(SigmaRow).

@bgoodri suggested going further with Cholesky factorizations LRow and LCol using the rules for inverse of the product of square matrixes inv(A * B) = inv(B) * inv(A) and the trace of a product rule tr(A' * B) = vec(A)' * vec(B), to derive

tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
= tr(inv(LCol * LCol') * (y - mu)' * inv(LRow * LRow') * (y - mu))
= tr(inv(LCol') * inv(LCol) * (y - mu)' * inv(LRow') * inv(LRow) * (y - mu))
= tr((LCol' \ LCol \ (y - mu)') * (LRow' \ LRow \ (y - mu)))
= dot_product(vec(LCol' \ LCol \ (y - mu')), vec(LRow' \ LRow \ (y - mu)))

Identifiability will be left up to the user. One way to achieve it is to define one of the matrices to be a correlation matrix with a simplex of scales. That's most efficient when done with a simple product of a Cholesky factor rather than a quadratic form for the full correlation matrix. A second, more symmetric way to achieve a proper posterior is to decompose both covariance matrices into scales time Cholesky factors and put priors on both of the scales.

There is also a matrix Student-t distribution, which should just follow whatever matrix normal does.

syclik commented 9 years ago

From @rtrangucci on March 24, 2015 14:42

We still need first-order reverse-mode tests for matrix_normal_prec_log, unless I haven't found all the tests.

Right now there are three .cpp files that implement matrix_normal_prec_log tests:

As far as I can tell, none of these files implement reverse-mode first order tests. The fwd and mix files run first-order fvar<var> and fvar<fvar<var>> tests. We need a test/unit/math/rev/mat/prob/matrix_normal_test.cpp that does the reverse-mode tests.