stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.59k stars 369 forks source link

asymmetry in inv_wishart_rng & mismatch vs. MCMCpack #644

Closed bob-carpenter closed 10 years ago

bob-carpenter commented 10 years ago

Matthew Spencer reports on stan-users:

I tried sampling covariance matrices using inv_wishart_rng in Stan with RStan interface. With a 2 x 2 identity matrix as the scale matrix, and degrees of freedom 4, I was expecting that the mean of the samples would also be an identity matrix (e.g. p 574 in Bayesian Data Analysis, Gelman et al, 2nd edition). Instead, I get something in which the element [2,2] appears systematically larger than the element [1,1]. Also, from looking at the density function, it doesn't seem as though the support is only diagonal matrices if the scale matrix is diagonal, yet inv_wishart_rng only returns diagonal matrices.

I compared this to riwish() in MCMCpack, which seems to behave as I'd expect it to.

To see the difference, I drew ellipses of unit Mahalanobis distance around the origin for each sampled covariance matrix and their mean, for both inv_wishart_rng output and riwish output (invwishart.pdf, attached).

I'm using Stan 2.2.0, R 3.1.0, Rstan 2.0.1, MCMCpack 1.3-3, 64-bit Ubuntu 12.04, 16 G RAM. Session info and code attached. Not sure what C compiler R is using (but .R/Makevars also attached with session info, and I know I have g++ on my system).

randommm commented 10 years ago

Using the same algorithm of MCMCpack::riwish in Stan gives non-diagonal matrices with S diagonal and still passes the unit tests. I'll do more test later, but it seems to work. It also has the advantage of not computing inverses, only a Cholesky decomposition:

template <class RNG>
inline Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>
inv_wishart_rng(double nu,
                const Eigen::Matrix<double,Eigen::Dynamic,Eigen::Dynamic>& S,
                RNG& rng) {

  static const char* function = "stan::prob::inv_wishart_rng(%1%)";

  using stan::math::check_greater;
  using stan::math::check_size_match;
  using Eigen::MatrixXd;

  typename Eigen::Matrix<double,Eigen::Dynamic,Eigen::Dynamic>::size_type k = S.rows();
  check_greater(function, nu, k-1, "Degrees of freedom parameter");
  check_size_match(function, 
                   k, "Rows of scale parameter",
                   S.cols(), "columns of scale parameter");

  MatrixXd Z = MatrixXd::Zero(k, k);
  for (int j = 0; j < k; j++) {
    for (int i = 0; i < k; i++) {
      if (j > i)
        Z(i, j) = normal_rng(0, 1, rng);
      else if (i == j)
        Z(i, j) = std::sqrt(chi_square_rng(nu--, rng));
    }
  }

  Z = Z * S.llt().matrixU();

  return Z.transpose() * Z;
}
aadler commented 10 years ago

Its 2am and I'm stuck in an airport terminal due to a canceled flight, so please forgive me if this isn't completely lucid, but in that last for loop, wouldn't it be faster to declare the i=j after the for j and then have i iterate from j+1? On May 11, 2014 4:36 PM, "Bob Carpenter" notifications@github.com wrote:

Matthew Spencer reports on stan-users:

I tried sampling covariance matrices using inv_wishart_rng in Stan with RStan interface. With a 2 x 2 identity matrix as the scale matrix, and degrees of freedom 4, I was expecting that the mean of the samples would also be an identity matrix (e.g. p 574 in Bayesian Data Analysis, Gelman et al, 2nd edition). Instead, I get something in which the element [2,2] appears systematically larger than the element [1,1]. Also, from looking at the density function, it doesn't seem as though the support is only diagonal matrices if the scale matrix is diagonal, yet inv_wishart_rng only returns diagonal matrices.

I compared this to riwish() in MCMCpack, which seems to behave as I'd expect it to.

To see the difference, I drew ellipses of unit Mahalanobis distance around the origin for each sampled covariance matrix and their mean, for both inv_wishart_rng output and riwish output (invwishart.pdf, attached).

I'm using Stan 2.2.0, R 3.1.0, Rstan 2.0.1, MCMCpack 1.3-3, 64-bit Ubuntu 12.04, 16 G RAM. Session info and code attached. Not sure what C compiler R is using (but .R/Makevars also attached with session info, and I know I have g++ on my system).

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644 .

bob-carpenter commented 10 years ago

On May 13, 2014, at 8:16 AM, Avraham Adler notifications@github.com wrote:

Its 2am and I'm stuck in an airport terminal due to a canceled flight,

Sorry to hear it.

so please forgive me if this isn't completely lucid, but in that last for loop, wouldn't it be faster to declare the i=j after the for j and then have i iterate from j+1?

Absolutely. Even further, the test should be pulled out. Branch points are SLOW! I think this:

for (int j = 0; j < k; j++) { for (int i = 0; i < k; i++) { if (j > i) Z(i, j) = normal_rng(0, 1, rng); else if (i == j) Z(i, j) = std::sqrt(chi_square_rng(nu--, rng)); } }

should be:

for (int j = 0; j < k; ++j) { for (int i = 0; i < j; ++i) Z(i, j) = normal_rng(0, 1, rng); Z(j,j) = std::sqrt(chi_square_rng(nu--, rng)); }

Nice that the loops are in the right order for Eigen's column major representation. I kept the memory layout order by putting the Z(j,j) assignment after the Z(i,j) assignment in the loop.

Would somebody else mind testing/changing this? I only saw the issue, not a branch on which to fix it.

On May 11, 2014 4:36 PM, "Bob Carpenter" notifications@github.com wrote:

Matthew Spencer reports on stan-users:

I tried sampling covariance matrices using inv_wishart_rng in Stan with RStan interface. With a 2 x 2 identity matrix as the scale matrix, and degrees of freedom 4, I was expecting that the mean of the samples would also be an identity matrix (e.g. p 574 in Bayesian Data Analysis, Gelman et al, 2nd edition). Instead, I get something in which the element [2,2] appears systematically larger than the element [1,1]. Also, from looking at the density function, it doesn't seem as though the support is only diagonal matrices if the scale matrix is diagonal, yet inv_wishart_rng only returns diagonal matrices.

I compared this to riwish() in MCMCpack, which seems to behave as I'd expect it to.

To see the difference, I drew ellipses of unit Mahalanobis distance around the origin for each sampled covariance matrix and their mean, for both inv_wishart_rng output and riwish output (invwishart.pdf, attached).

I'm using Stan 2.2.0, R 3.1.0, Rstan 2.0.1, MCMCpack 1.3-3, 64-bit Ubuntu 12.04, 16 G RAM. Session info and code attached. Not sure what C compiler R is using (but .R/Makevars also attached with session info, and I know I have g++ on my system).

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644 .

— Reply to this email directly or view it on GitHub.

randommm commented 10 years ago

That's a much better way of writing it indeed =)

I haven't created a branch because I wasn't sure the code was correct as it was just a quick test. In fact I just rechecked it, and it does need an inverse at the end =[.

return Z.transpose() * Z;

should be:

return (Z.transpose() * Z).inverse();

But, at least it seems to be consistent with what Matthew Spencer said "With a 2 x 2 identity matrix as the scale matrix, and degrees of freedom 4, I was expecting that the mean of the samples would also be an identity matrix".

A quick test in a 3 x 3 matrix:

apply( replicate(10000, MCMCpack::riwish(5, diag(3))), 1:2, mean) [,1] [,2] [,3] [1,] 0.97678059 -0.04625054 0.06010676 [2,] -0.04625054 0.95652007 -0.03652983 [3,] 0.06010676 -0.03652983 0.99755639

Now running a small test in Stan: Matrix<double,Dynamic,Dynamic> sigma(3,3); sigma << 1.0, 0, 0.0, 0, 1.0, 0.0, 0, 0.0, 1.0; Eigen::MatrixXd Z = Eigen::MatrixXd::Zero(3, 3); for (int i = 0; i < 10000; i++) Z += stan::prob::inv_wishart_rng(5.0, sigma,rng); Z /= 10000; std::cout << Z << std::endl;

With the current code, you get: 0.335442 0 0 0 0.496115 0 0 0 1.08578

With the new code, you get: 1.00591 0.00425586 0.0209529 0.00425586 0.966176 0.0163307 0.0209529 0.0163307 0.953695

I'll check it some and try to create an unit test that reflect this property also. But what i find odd is that unit tests pass even if when forgot to invert the matrix at the end.

randommm commented 10 years ago

Here's the conclusion I came up with.

The problem goes beyond, it starts at wishart_rng, which is used by inv_wishart_rng (inv_wishart_rng actually seems to be correct).

Both Stan current code and MCMCPack seems to use the algorithm of http://www.gwu.edu/~forcpgm/YuChengKu-030510final-WishartYu-ChengKu.pdf for wishart_rng. This algorithm seems to work well when the scale matrix is symmetric, but it doesn't when it's not. The reason is simple: in this algorithm, the scale matrix only affect the rng output via its Cholesky decomposition, but there's no Cholesky decomposition for non-symmetric matrices.

So what Eigen llt and R does is to take the lower part of matrix (the upper in case of R) to calculate Cholesky decomposition (see https://mathematica.stackexchange.com/questions/13808/can-the-choleskydecomposition-function-in-mathematica-be-made-to-work-on-non-sym) and ignore the other part (i.e. assume it is symmetric), but this will obviously produce wrong output, since it is discarding information from the upper part of the scale matrix (and I tested that it is indeed wrong).

An idea I had and it is something the seems to work is to transform the scale matrix into a symmetric one via (S.transpose + S)/2. It seems to work with the test proposed in the article (page 5, equation 1), but it's just an idea I had, so it's not proved theoretically and is probably wrong.

Note: as I said, besides that, Stan current code also seems to wrongly implement the above algorithm, and it thus doesn't produce correct output even for symmetric scale matrices.

bob-carpenter commented 10 years ago

Shouldn't we throw an error if wishart_rng or inv_wishart_rng are called with asymmetric inputs (up to some tolerance)? And then if they're within tolerance, can't we just symmetrize before we start? If we're within tolerance, then symmetrizing can be just taking lower triangular (as is, I think), or by the more computationally expensive 0.5 * (s + s').

On May 14, 2014, at 6:15 AM, Marco Inacio notifications@github.com wrote:

Here's the conclusion I came up with.

The problem goes beyond, it starts at wishart_rng, which is used by inv_wishart_rng (inv_wishart_rng actually seems to be correct).

Both Stan current code and MCMCPack seems to use the algorithm of http://www.gwu.edu/~forcpgm/YuChengKu-030510final-WishartYu-ChengKu.pdf for wishart_rng. This algorithm seems to work well when the scale matrix is symmetric, but it doesn't when it's not. The reason is simple: in this algorithm, the scale matrix only affect the rng output via its Cholesky decomposition, but there's no Cholesky decomposition for non-symmetric matrices.

So what Eigen llt and R does is to take the lower part of matrix (the upper in case of R) to calculate Cholesky decomposition (see https://mathematica.stackexchange.com/questions/13808/can-the-choleskydecomposition-function-in-mathematica-be-made-to-work-on-non-sym) and ignore the other part (i.e. assume it is symmetric), but this will obviously produce wrong output, since it is discarding information from the upper part of the scale matrix (and I tested that it is indeed wrong).

An idea I had and it is something the seems to work is to transform the scale matrix into a symmetric one via (S.transpose + S)/2. It seems to work with the test proposed in the article (page 5, equation 1), but it's just an idea I had, so it's not proved theoretically and is probably wrong.

Note: as I said, besides that, Stan current code also seems to wrongly implement the above algorithm, and it thus doesn't produce correct output even for symmetric scale matrices.

— Reply to this email directly or view it on GitHub.

randommm commented 10 years ago

I thought that the scale matrix needed not to be symmetric, I thought this because the article of algorithm and Wikipedia says nothing about it being symmetric, and also the current Stan unit tests use asymmetric scale matrices. But both BDA and this article http://www.math.wustl.edu/~sawyer/hmhandouts/Wishart.pdf (at very beginning) say it must be.

Still (and probably this is a point for another topic), I'm really concerned that everything that I tried as the RNG function (even clearly wrong things) passed the unit tests, so there's probably some serious problems with them.

randommm commented 10 years ago

About the checks of symmetry and tolerance, I just checked that both wishart_log and inv_wishart_log doesn't do any checks of symmetry on both the outcome matrix Y and scale matrix S (and both should be symmetric), also:

wishart_log: the upper part of S matrix is ignored and asymmetric Y produces inconsistent log_prob. inv_wishart_log: the upper part of Y matrix is ignored and asymmetric S produces inconsistent log_prob.

I think that whatever is done on wishart_rng and inv_wishart_rng, it should be consistent with log probability ones.

bob-carpenter commented 10 years ago

I'd like to hear Ben's input on this --- he's the multivariate distro expert.

I think we want to add a symmetry requirement on the input. Basically test that it's a valid "cov_matrix" (symmetric positive definite --- we're going to rename to "spd_matrix" in the near future).

So I'm not sure what an asymmetric input is supposed to produce in the way of output.

On May 15, 2014, at 4:54 AM, Marco Inacio notifications@github.com wrote:

About the checks of symmetry and tolerance, I just checked that both wishart_log and inv_wishart_log doesn't do any checks of symmetry on both the outcome matrix Y and scale matrix S (and both should be symmetric), also:

wishart_log: the upper part of S matrix is ignored and asymmetric Y produces inconsistent log_prob. inv_wishart_log: the upper part of Y matrix is ignored and asymmetric S produces inconsistent log_prob.

I think that whatever is done on wishart_rng and inv_wishart_rng, it should be consistent with log probability ones.

— Reply to this email directly or view it on GitHub.

PeterLi2016 commented 10 years ago

The reason the test keeps passing is because I used Chi-Square with df=1. Ben directed me to http://stats.stackexchange.com/questions/34165/expected-value-of-the-log-determinant-of-a-wishart-matrix a while ago. But I'm not sure what to do about this test because what we're comparing is the expected value of the log of the determinant of the samples and the expected value from the link above.

aadler commented 10 years ago

This library is LGPL, which would allow its use in Stan as long as that portion is marked as LGPL and kept the same. Perhaps that would help, if nothing else for sanity testing.

randommm commented 10 years ago

So do you think that changing df of chi square to boost::math::round(2 * std::pow(N, 0.4)) - 1 like is done with the normal multivariate makes it more strict? I've done this know and it passes (with the new code).

PeterLi2016 commented 10 years ago

Wouldn't that make it worse since we'd have more df so the threshold would be higher?

On Mon, May 19, 2014 at 1:29 PM, Marco Inacio notifications@github.comwrote:

So do you think that changing df of chi square to boost::math::round(2 * std::pow(N, 0.4)) - 1 like is done with the normal multivariate makes it more strict? I've done this know and it passes (with the new code).

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644#issuecomment-43532705 .

PeterLi2016 commented 10 years ago

The reason we use what you suggested as df for other tests is because we bin the results and compare the expected value and actual number of trials that fall within each bin. See http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm

On Mon, May 19, 2014 at 1:40 PM, Peter Li li.peter94@gmail.com wrote:

Wouldn't that make it worse since we'd have more df so the threshold would be higher?

On Mon, May 19, 2014 at 1:29 PM, Marco Inacio notifications@github.comwrote:

So do you think that changing df of chi square to boost::math::round(2 * std::pow(N, 0.4)) - 1 like is done with the normal multivariate makes it more strict? I've done this know and it passes (with the new code).

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644#issuecomment-43532705 .

randommm commented 10 years ago

I see, so maybe there's not much do about the chi square tests. Still, the problems the user reported about the RNG do make sense, so I tried to add tests for them (branch feature/issue-644-inv_wishart_rng). I'll wait for @bgoodri 's input on this as @bob-carpenter suggested.

bgoodri commented 10 years ago

I can't imagine why we should accept an asymmetric scale matrix. As for the code, this

  Z = Z * S.llt().matrixU();
  return Z.transpose() * Z;

should be

Z = Z.triangularView<Eigen::Lower>() * S.llt().matrixU();
return crossprod(Z);

which will insure symmetry and then you don't need to preallocate Z with zeros or otherwise worry about allocating the upper triangle.

For inv_wishart, at the end, you don't want to do the inverse of a crossproduct

inverse(crossprod(Z))

but rather a crossproduct of the inverse of the triangular matrix Z

return crossprod(Z.template triangularView<Eigen::Lower>.solveInPlace(I));
randommm commented 10 years ago

So do you think that it's better for inv_wishart_rng, wishart_rng, inv_wishart_log and wishart_log to consider only the lower triangular portion of the scale and outcome matrices? To me this seems good.

bgoodri commented 10 years ago

Stan traditionally would check for symmetry, whereas Eigen traditionally would only consider the triangle. Since prng is rarely the bottleneck, I would check. Hopefully someday we will have a way via NDEBUG or similar to turn all the checks off at once.

On Tue, May 20, 2014 at 3:10 PM, Marco Inacio notifications@github.comwrote:

So do you think that it's better for inv_wishart_rng, wishart_rng, inv_wishart_log and wishart_log to consider only the lower triangular portion of the scale and outcome matrices? To me this seems good.

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644#issuecomment-43669853 .

bob-carpenter commented 10 years ago

We have started to introduce operations on lower-triangular matrices, so that'd be another option. But then you'd want to check it's lower triangular (very cheap).

If we do check for symmetry, it should be with a forgiving tolerance (1e-6 relative [not absolute] diffs?). And then we should go ahead and only use the lower triangular portion.

Averaging upper and lower seems too expensive.

On May 20, 2014, at 9:15 PM, bgoodri notifications@github.com wrote:

Stan traditionally would check for symmetry, whereas Eigen traditionally would only consider the triangle. Since prng is rarely the bottleneck, I would check. Hopefully someday we will have a way via NDEBUG or similar to turn all the checks off at once.

On Tue, May 20, 2014 at 3:10 PM, Marco Inacio notifications@github.comwrote:

So do you think that it's better for inv_wishart_rng, wishart_rng, inv_wishart_log and wishart_log to consider only the lower triangular portion of the scale and outcome matrices? To me this seems good.

— Reply to this email directly or view it on GitHubhttps://github.com/stan-dev/stan/issues/644#issuecomment-43669853 .

— Reply to this email directly or view it on GitHub.

randommm commented 10 years ago

On 14-05-20 04:28 PM, Bob Carpenter wrote:

We have started to introduce operations on lower-triangular matrices, so that'd be another option. But then you'd want to check it's lower triangular (very cheap). Parametrizations using lower-triangular matrices (that's what you meant right?) could be an option for those who doesn't want to check for symmetry on every iteration. I think this could be worked on another issue which could include normal multivariate and friends, the only problem would be the existence of a lot function signatures for all this which would eventually get confusing.

If we do check for symmetry, it should be with a forgiving tolerance (1e-6 relative [not absolute] diffs?). And then we should go ahead and only use the lower triangular portion. Current check_symmetric function doesn't include this, so would this also be a matter for another issue?

bob-carpenter commented 10 years ago

On May 20, 2014, at 9:43 PM, Marco Inacio notifications@github.com wrote:

On 14-05-20 04:28 PM, Bob Carpenter wrote:

We have started to introduce operations on lower-triangular matrices, so that'd be another option. But then you'd want to check it's lower triangular (very cheap).

Parametrizations using lower-triangular matrices (that's what you meant right?) could be an option for those who doesn't want to check for symmetry on every iteration.

I really meant functions, like say:

multiply_lower_tri_self_transpose(matrix);

which only looks at the lower triangular part of the input.

I think this could be worked on another issue which could include normal multivariate and friends, the only problem would be the existence of a lot function signatures for all this which would eventually get confusing.

I don't think the multiple signatures will be too bad if we can come up with a coherent naming convention (and numbering like for negative_binomial isn't what I have in mind).