stan-dev / rstan

RStan, the R interface to Stan
https://mc-stan.org
1.04k stars 265 forks source link

Output of log_prob and optimizing not consistent for inverse Wishart model. #383

Open simpsonm opened 7 years ago

simpsonm commented 7 years ago

Summary:

When obtaining a provisional value for the posterior mode via postmode = optimizing(...), I believe postmode$value should be the same as log_prob(fit, unconstrain_pars(fit, postmode$par), adjust_transform=FALSE). With an inverse Wishart in the model, this is not true. In fact, the difference isn't even constant in the parameter.

Description:

The help file for log_prob states

log_prob 'signature(object = "stanfit")' Compute the log posterior ('lp__') for the model represented by a 'stanfit' object. Note that, by default, 'log_prob' returns the log posterior in the unconstrained space; set 'adjust_transform = FALSE' to make the values match Stan's output.

and the help file for optimizing states

Value:

If the optimization is done successfully, a list with named
components:

par: The point estimate found. Its form (vector or list) is
     determined by the 'as_vector' argument.

value: The value of the log-posterior (up to an additive constant,
     the '"lp__"' in Stan) corresponding to 'par'.

So optimizing(...)$value and log_prob(...) should match if log_prob() is called on optimizing(...)$par and care is taken to transform the parameter to the unconstrained space and not include the Jacobian from the transformation. This does not happen.

In a much more complicated example than below, I created the logposterior function myself, and it was the same as optimizing()$value except for an additive constant, so I think the problem lies in log_prob() or unconstrain_pars(). log_prob seems to work fine on models that do not include an inverse Wishart component, so I suspect that the constraints for a covariance matrix aren't being handled by unconstrain_pars() correctly.

Reproducible Steps:

Stan model code:

data {
  int<lower=0> nobs;
  int<lower=0> ndim;
  matrix[nobs, ndim] y;
  real mu_mn;
  real<lower=0> mu_sd;
  real<lower=ndim> sig_df;
  cov_matrix[ndim] sig_scale;
}
parameters {
  vector[ndim] mu;
  cov_matrix[ndim] sigma;
}
model {
  for(i in 1:nobs){
    y[i] ~ multi_normal(mu, sigma);
  }
  mu ~ normal(mu_mn, mu_sd);
  sigma ~ inv_wishart(sig_df, sig_scale);
}

R code

library(rstan)

set.seed(234234)

nobs <- 1000
ndim <- 10
sigma <- crossprod(matrix(rnorm(2*ndim^2),ncol = ndim))
cholsig <- chol(sigma)
mu <- rnorm(ndim)
y <- matrix(0, nobs, ndim)
for(i in 1:nobs){
  y[i,] <- mu + crossprod(cholsig, rnorm(ndim))
}

standat <- list(y = y, mu_mn = 0, mu_sd = 10, sig_df= ndim + 1, sig_scale= diag(1, ndim))

model <- stan_model("invwish.stan")

fit <- stan("invwish.stan", data = standat, chains = 1, iter = 1)

postmode1 <- optimizing(model, data = standat, as_vector = FALSE)

## iter = 10 to get a different parameter
postmode2 <- optimizing(model, data = standat, as_vector = FALSE, iter = 10)

lp1 <- log_prob(fit, unconstrain_pars(fit, postmode1$par), adjust_transform = FALSE)
lp2 <- log_prob(fit, unconstrain_pars(fit, postmode2$par), adjust_transform = FALSE)

lprobs <- cbind(c(lp1, postmode1$value, lp1 - postmode1$value), c(lp2, postmode2$value, lp2 - postmode2$value))
rownames(lprobs) <- c("log_prob", "optimizing", "diff")
colnames(lprobs) <- c("par1", "par2")

lprobs

Current Output:

 lprobs
                 par1       par2
log_prob   -136468.17  -241054.5
optimizing  -18389.39 -2462650.2
diff       -118078.78  2221595.7

Expected Output:

 lprobs
           par1         par2
log_prob     X            Y
optimizing   X            Y
diff         0            0

or at least

 lprobs
           par1         par2
log_prob     X            Y
optimizing   X + C        Y + C
diff         C            C

RStan Version:

2.14.1

R Version:

R version 3.3.2 (2016-10-31)

Operating System:

Windows 10 Education Version 1607 OS Build 14393.693

bob-carpenter commented 7 years ago

That's expected. The MCMC output includes the Jacobian adjustment (log determinant of inverse transform from unconstrained basis to covariance matrix) which ensures the distribution over covariance matrices is uniform. The Jacobian is excluded under the MLE. There's some more detail in the following case study and in the Stan manual:

http://mc-stan.org/documentation/case-studies/mle-params.html

bob-carpenter commented 7 years ago

OK, I read that more carefully and am not sure what RStan's trying to do. So I'll let Ben and Jonah resolve this. There's no way to make them the same within CmdStan or from within the Stan model object compiled from C++, but RStan may be doing something fancy under the hood.

schrimpf commented 7 years ago

I encountered a similar problem. I suspect it is related to a bug that has constrain_pars and unconstrain_pars not being inverses of one another for a cov_matrix parameter. For example, after simpsonm's code, add

postmode1$par$mu - constrain_pars(fit, unconstrain_pars(fit,postmode1$par))$mu
postmode1$par$sigma - constrain_pars(fit, unconstrain_pars(fit,postmode1$par))$sigma

which outputs

postmode1$par$mu - constrain_pars(fit, unconstrain_pars(fit,postmode1$par))$mu
 [1] 0 0 0 0 0 0 0 0 0 0
> postmode1$par$sigma - constrain_pars(fit, unconstrain_pars(fit,postmode1$par))$sigma
           [,1]        [,2]       [,3]       [,4]       [,5]      [,6]       [,7]        [,8]
 [1,]  8.761881   3.3439679 -2.9459641  2.6813260 -2.8749787 -2.348193 -2.4893775  -9.0638638
 [2,]  3.343968  14.0795050 -0.7505034 -0.4568367  0.6535365 -3.364598  0.8739206 -11.2970760
 [3,] -2.945964  -0.7505034 22.5116468 -1.4260157  2.1421806 -1.692382  0.2008540   0.5534397
 [4,]  2.681326  -0.4568367 -1.4260157  5.5890772 -0.4571735 -1.865107  2.0444638  -6.1001166
 [5,] -2.874979   0.6535365  2.1421806 -0.4571735 11.3015172  4.868203  2.8007451   4.7500982
 [6,] -2.348193  -3.3645979 -1.6923816 -1.8651069  4.8682034 23.577790  1.0402273   5.9889160
 [7,] -2.489377   0.8739206  0.2008540  2.0444638  2.8007451  1.040227 14.9391214   4.2988728
 [8,] -9.063864 -11.2970760  0.5534397 -6.1001166  4.7500982  5.988916  4.2988728  41.5068145
 [9,]  2.403844   5.4485100 -6.5614202  3.0897043 -2.2061611 -3.847017  2.1177429 -12.9128668
[10,] -3.989275   0.6387864  2.0253973 -3.7070790  2.8747581 -9.162390 -1.6017010   9.2767591
            [,9]       [,10]
 [1,]   2.403844  -3.9892746
 [2,]   5.448510   0.6387864
 [3,]  -6.561420   2.0253973
 [4,]   3.089704  -3.7070790
 [5,]  -2.206161   2.8747581
 [6,]  -3.847017  -9.1623899
 [7,]   2.117743  -1.6017010
 [8,] -12.912867   9.2767591
 [9,]  15.467615  -2.2811199
[10,]  -2.281120 -18.2657697

It should return an array of zeroes.

Output above is from RStan 2.15.1, R version 3.4.0 (2017-04-21), on Linux 4.10.13-1-ARCH.

bob-carpenter commented 7 years ago

log_prob() is going to contain the Jacobian adjustment for the constraining transform for any constrained parameters and may also contain all the constants depending on how it's called, wheras optimizing won't contain the Jacobians or the constants. So I'm not sure the actual issue originally filed is really an issue.

The second issue brought up of the constraining and unconstraining transforms not matching is something that one of the RStan devs is going to have to look at.

schrimpf commented 7 years ago

My understanding from the documentation is that log_prob(,,adjust_transform=FALSE) should not contain the Jacobian. I was unaware of the constants, but I suspect they are not the source of the difference.

I came across this issue by attempting to initialize optimizing from parameters returned by a previous call to optimizing. I noticed that the initial log probability was very different from the value returned by the first call. For example,

> postmode1 <- optimizing(model, data = standat, as_vector = FALSE)                                    
> postmode1$value                                                                                      
 [1] -18389.39                                                                                          
> postmode3 <- optimizing(model, data = standat, as_vector = FALSE,                                    
 +                         iter=1, init=postmode1$par, verbose=TRUE)                                    
 Initial log joint probability = -136472                                                                
     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes             
        1       -115838       46.7049       3226.84   0.0003292       0.001        3                    
 Optimization terminated normally:                                                                      
   Maximum number of iterations hit, may not be at an optima                                            
> log_prob(fit, unconstrain_pars(fit, postmode1$par),                                                  
 +          adjust_transform = FALSE)                                                                   
 [1] -136471.8                                                                                          

I expect the "Initial log join probability = -136472" to be -18389.39, but it is not. The call to log_prob() returns -13671.8, which happens to be the same as the initial log joint probability. I'm guessing that both have the same (lack of) constants, but both the second call to optimizing() and log_prob() are affected the by the constrain and unconstrain transforms not matching.

A possible work around for this issue is to modify the stan model to use cholesky_factor_cov as a parameter instead of cov_matrix. In my testing, cholesky_factor_cov is unaffected by the constrain / unconstrain problem.

bob-carpenter commented 7 years ago

Thanks --- that indeed sounds like a bug in something going on with transforms in R. @bgoodri --- can you verify?

The issue may be coming from the transforms in Stan or may be an issue with RStan.

That adjust_transform=FALSE looks like it should skip the Jacobian.

bob-carpenter commented 7 years ago

Also, for what it's worth, it's better to use Cholesky factors if you can---there's less transforming back and forth and they're much easier to solve when used as covariance parameters, so they're both more efficient and more arithmetically stable.

simpsonm commented 7 years ago

I just wanted to note that, as @schrimpf said, it's not the constants, because if it were then we would see a constant difference in my tables in the original post.

Also, +1 to what @bob-carpenter just said. I can't remember what exactly I was doing with the inverse wishart when I discovered this issue, but I do remember it was just supposed to be quick-and-dirty to check something!

bob-carpenter commented 7 years ago

As a sanity check, I verified that our covariance matrix constrain/free code is not broken in the Stan math lib.

  Eigen::MatrixXd L(3,3);
  L << 1, 0, 0,
    -2, 2, 0,
    -5, 8, 4;
  Eigen::MatrixXd Sigma = L * L.transpose();
  std::cout << "Sigma=" << std::endl << Sigma << std::endl;
  Eigen::VectorXd SigmaF = stan::math::cov_matrix_free(Sigma);
  Eigen::MatrixXd SigmaRT = stan::math::cov_matrix_constrain(SigmaF, 3);
  std::cout << "SigmaRT=" << std::endl << SigmaRT << std::endl;

prints

Sigma=
  1  -2  -5
 -2   8  26
 -5  26 105
SigmaRT=
  1  -2  -5
 -2   8  26
 -5  26 105
bgoodri commented 7 years ago

Anyone have any idea on this bug? The unconstrain_pars function just calls the transform_inits method https://github.com/stan-dev/rstan/blob/develop/rstan/rstan/inst/include/rstan/stan_fit.hpp#L1033 And I have verified that sigma is passed correctly via print statements. First, as a std::vector<double> inside unconstrain_pars

sigma = 
i = 0 0.885836
i = 1 -1.03357
i = 2 -1.14612
i = 3 1.12714
i = 4 1.48227
i = 5 1.69006
i = 6 1.23279
i = 7 1.50629
i = 8 0.758087
i = 9 -0.487205
i = 10 -1.03357
i = 11 4.03433
i = 12 4.04428
i = 13 -3.29228
i = 14 0.301641
i = 15 -3.86529
i = 16 -3.72179
i = 17 -4.80908
i = 18 -0.207935
i = 19 -1.54076
i = 20 -1.14612
i = 21 4.04428
i = 22 13.4446
i = 23 -6.26562
i = 24 5.25574
i = 25 -3.24293
i = 26 -1.05292
i = 27 -9.78028
i = 28 -1.2836
i = 29 -0.761396
i = 30 1.12714
i = 31 -3.29228
i = 32 -6.26562
i = 33 12.3402
i = 34 -0.614488
i = 35 8.42763
i = 36 -0.00286811
i = 37 6.64553
i = 38 6.36562
i = 39 -0.755601
i = 40 1.48227
i = 41 0.301641
i = 42 5.25574
i = 43 -0.614488
i = 44 7.36819
i = 45 3.14246
i = 46 1.49189
i = 47 -1.96
i = 48 2.16932
i = 49 -0.699594
i = 50 1.69006
i = 51 -3.86529
i = 52 -3.24293
i = 53 8.42763
i = 54 3.14246
i = 55 10.004
i = 56 3.33234
i = 57 6.98054
i = 58 5.16176
i = 59 3.04339
i = 60 1.23279
i = 61 -3.72179
i = 62 -1.05292
i = 63 -0.00286811
i = 64 1.49189
i = 65 3.33234
i = 66 8.02374
i = 67 5.13997
i = 68 -2.13051
i = 69 2.58619
i = 70 1.50629
i = 71 -4.80908
i = 72 -9.78028
i = 73 6.64553
i = 74 -1.96
i = 75 6.98054
i = 76 5.13997
i = 77 16.9369
i = 78 -2.7612
i = 79 6.71211
i = 80 0.758087
i = 81 -0.207935
i = 82 -1.2836
i = 83 6.36562
i = 84 2.16932
i = 85 5.16176
i = 86 -2.13051
i = 87 -2.7612
i = 88 9.38552
i = 89 -2.22441
i = 90 -0.487205
i = 91 -1.54076
i = 92 -0.761396
i = 93 -0.755601
i = 94 -0.699594
i = 95 3.04339
i = 96 2.58619
i = 97 6.71211
i = 98 -2.22441
i = 99 11.0157

and then inside transform_inits as the Eigen::MatrixXd

sigma = 
   0.885836    -1.03357    -1.14612     1.12714     1.48227     1.69006     1.23279     1.50629    0.758087   -0.487205
   -1.03357     4.03433     4.04428    -3.29228    0.301641    -3.86529    -3.72179    -4.80908   -0.207935    -1.54076
   -1.14612     4.04428     13.4446    -6.26562     5.25574    -3.24293    -1.05292    -9.78028     -1.2836   -0.761396
    1.12714    -3.29228    -6.26562     12.3402   -0.614488     8.42763 -0.00286811     6.64553     6.36562   -0.755601
    1.48227    0.301641     5.25574   -0.614488     7.36819     3.14246     1.49189       -1.96     2.16932   -0.699594
    1.69006    -3.86529    -3.24293     8.42763     3.14246      10.004     3.33234     6.98054     5.16176     3.04339
    1.23279    -3.72179    -1.05292 -0.00286811     1.49189     3.33234     8.02374     5.13997    -2.13051     2.58619
    1.50629    -4.80908    -9.78028     6.64553       -1.96     6.98054     5.13997     16.9369     -2.7612     6.71211
   0.758087   -0.207935     -1.2836     6.36562     2.16932     5.16176    -2.13051     -2.7612     9.38552    -2.22441
  -0.487205    -1.54076   -0.761396   -0.755601   -0.699594     3.04339     2.58619     6.71211    -2.22441     11.0157

which matches what is is in R

            [,1]       [,2]       [,3]         [,4]       [,5]      [,6]         [,7]      [,8]       [,9]      [,10]
 [1,]  0.8858360 -1.0335656 -1.1461179  1.127141203  1.4822674  1.690063  1.232787074  1.506286  0.7580874 -0.4872055
 [2,] -1.0335656  4.0343310  4.0442769 -3.292276194  0.3016413 -3.865287 -3.721791392 -4.809080 -0.2079354 -1.5407615
 [3,] -1.1461179  4.0442769 13.4445792 -6.265619532  5.2557425 -3.242935 -1.052920984 -9.780276 -1.2835982 -0.7613959
 [4,]  1.1271412 -3.2922762 -6.2656195 12.340175407 -0.6144879  8.427632 -0.002868115  6.645534  6.3656209 -0.7556006
 [5,]  1.4822674  0.3016413  5.2557425 -0.614487923  7.3681856  3.142462  1.491892481 -1.960000  2.1693191 -0.6995940
 [6,]  1.6900628 -3.8652867 -3.2429349  8.427631777  3.1424619 10.003985  3.332336509  6.980537  5.1617590  3.0433922
 [7,]  1.2327871 -3.7217914 -1.0529210 -0.002868115  1.4918925  3.332337  8.023737578  5.139975 -2.1305082  2.5861947
 [8,]  1.5062859 -4.8090802 -9.7802756  6.645533744 -1.9599999  6.980537  5.139974596 16.936855 -2.7611989  6.7121051
 [9,]  0.7580874 -0.2079354 -1.2835982  6.365620945  2.1693191  5.161759 -2.130508169 -2.761199  9.3855185 -2.2244146
[10,] -0.4872055 -1.5407615 -0.7613959 -0.755600554 -0.6995940  3.043392  2.586194688  6.712105 -2.2244146 11.0156642
bob-carpenter commented 7 years ago

Any idea how to replicate the issue in Stan itself or in CmdStan? I'm not going to be much use trying to track this down through RStan.

bgoodri commented 7 years ago

I think only RStan provides direct access to unconstrain and constrain methods. The stan_fit.hpp file linked previously is header only so you can edit it on disk. Not sure where it lives on a Mac but you can find it from R by executing: system.file ("include", "rstan", "stan_fit.hpp", package = "rstan")

On Jun 26, 2017 6:12 PM, "Bob Carpenter" notifications@github.com wrote:

Any idea how to replicate the issue in Stan itself or in CmdStan? I'm not going to be much use trying to track this down through RStan.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstan/issues/383#issuecomment-311196841, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqtLP8El2waz8aJMlGLvQ5y-qPqlKks5sICy8gaJpZM4Lt-iD .

bgoodri commented 7 years ago

The problem seems to be that the cov_matrix_unconstrain method in src/stan/io/writer.hpp does not call the cov_matrix_free function in Stan Math and the two implementations differ.