rmcelreath / rethinking

Statistical Rethinking course and book package
2.14k stars 600 forks source link

Covariance Matrix not Fitting Correctly in Varying Slopes Model #34

Open mrdevlar opened 8 years ago

mrdevlar commented 8 years ago

Hi!

I've been running the code in chapter 13 of your book and currently the code does not extract out the correct Rho value after fitting the model.

The code in question is the following:


## R code 13.1
a <- 3.5            # average morning wait time
b <- (-1)           # average difference afternoon wait time
sigma_a <- 1        # std dev in intercepts
sigma_b <- 0.5      # std dev in slopes
rho <- (-0.7)       # correlation between intercepts and slopes

## R code 13.2
Mu <- c( a , b )

## R code 13.3
cov_ab <- sigma_a*sigma_b*rho
Sigma <- matrix( c(sigma_a^2,cov_ab,cov_ab,sigma_b^2) , ncol=2 )

## R code 13.4
matrix( c(1,2,3,4) , nrow=2 , ncol=2 )

## R code 13.5
sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix

# now matrix multiply to get covariance matrix
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

## R code 13.6
N_cafes <- 20

## R code 13.7
library(MASS)
set.seed(5) # used to replicate example
vary_effects <- mvrnorm( N_cafes , Mu , Sigma )

## R code 13.8
a_cafe <- vary_effects[,1]
b_cafe <- vary_effects[,2]

## R code 13.9
plot( a_cafe , b_cafe , col=rangi2 ,
    xlab="intercepts (a_cafe)" , ylab="slopes (b_cafe)" )

# overlay population distribution
library(ellipse)
for ( l in c(0.1,0.3,0.5,0.8,0.99) )
    lines(ellipse(Sigma,centre=Mu,level=l),col=col.alpha("black",0.2))

## R code 13.10
N_visits <- 10
afternoon <- rep(0:1,N_visits*N_cafes/2)
cafe_id <- rep( 1:N_cafes , each=N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id]*afternoon
sigma <- 0.5  # std dev within cafes
wait <- rnorm( N_visits*N_cafes , mu , sigma )
d <- data.frame( cafe=cafe_id , afternoon=afternoon , wait=wait )

## R code 13.11
R <- rlkjcorr( 1e4 , K=2 , eta=2 )
dens( R[,1,2] , xlab="correlation" )

## R code 13.12
m13.1 <- map2stan(
    alist(
        wait ~ dnorm( mu , sigma ),
        mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon,
        c(a_cafe,b_cafe)[cafe] ~ dmvnorm2(c(a,b),sigma_cafe,Rho),
        a ~ dnorm(0,10),
        b ~ dnorm(0,10),
        sigma_cafe ~ dcauchy(0,2),
        sigma ~ dcauchy(0,2),
        Rho ~ dlkjcorr(2)
    ) ,
    data=d ,
    iter=5000 , warmup=2000 , chains=2 )

## R code 13.13
post <- extract.samples(m13.1)
dens( post$Rho[,1,2] )

You do not get a plot similar to that in Figure 13.4 when running your code.

You instead get the following: 5000 iter

If you run 1000000 reps with 4000 warmup you get the following, which doesn't include the big spike on zero:

100000 iter

Either way, it is centered at -0.19, not at -0.7 at it should be by design.

There is also a few errors that pop up on each chain, besides the one addressed by the overthinking box in the chapter:

[1] "The following numerical problems occured the indicated number of times after warmup on chain 2"
                                                                                                     count
validate transformed params: SRS_sigma_cafeRho is not symmetric. SRS_sigma_cafeRho[1,2] = inf, but S     1
[1] "When a numerical problem occurs, the Metropolis proposal gets rejected."
[1] "However, by design Metropolis proposals sometimes get rejected  even when there are no numerical problems."
[1] "Thus, if the number in the 'count' column is small,  do not ask about this message on stan-users."

Anyway, I am not sure what is going on, but from the looks of things, the sampling seems to be getting stuck on 0 in covariance estimation, any ideas on how to fix this?

Thanks!

rmcelreath commented 8 years ago

Seems like Stan changed something about how warmup and/or sampling works. If you set control=list(adapt_delta=0.99), the spike goes away with modest samples. So that suggests a change to warmup algorithm.

I don't understand why the results should differ, however, once the adaptation is adjusted.

I can't get the same random data from the provided seed, in any event. I can't imagine anything changed with the way the seed is used. So I must have used a different seed for the example shown in 13.4 and 13.5. I'll see if I figure it out.

mrdevlar commented 8 years ago

Thanks the adaptive_delta seems to have eliminated the spike. I initially expected that the mass at 0 was shifting the distribution's MAP, but it seems with the adaptive_delta it is still at:

               Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
Rho[1,1]       1.00   0.00       1.00       1.00  6000  NaN
Rho[1,2]      -0.18   0.33      -0.71       0.33  2211 1.00
Rho[2,1]      -0.18   0.33      -0.71       0.33  2211 1.00
Rho[2,2]       1.00   0.00       1.00       1.00  6000 1.00

Rather bizarre. Just to make sure I am understanding this correctly, I should expect Rho[1,2], Rho[2,1] to be -0.7 yes? As the original Rho matrix we defined earlier:

sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix

Returns:

     [,1] [,2]
[1,]  1.0 -0.7
[2,] -0.7  1.0

Right? Or am I missing something?

rmcelreath commented 8 years ago

I think the provided seed in the text won't reproduce Figure 13.4. That's an issue. I should figure out which random seed will reproduce that figure.

But we shouldn't expect any particular simulation to recover the "true" correlation of -0.7. Many simulated sets of data will produce a milder empirical correlation than that, and then the finite evidence will lead to regularization towards a less strong correlation.

paul-buerkner commented 8 years ago

I think the Stan team recently changed the default of adapt_delta from 0.95 to 0.8, which may explain the convergence problems. Apart from that, adapt_delta = 0.8 should work ok when using the non-centered parameterizations.

When trying to get reliable estimates of random effects correlations, n = 20 (for the cafes) will likely not suffice. Also lkjcorr(2) pulls the estimates somewhat towards zero. As long as rho = -0.7 is in the 95% credible interval (when using Rho ~ lkjcorr(1)) I wouldn't worry too much.

lexmart commented 3 years ago

set.seed(4) instead of set.seed(5) gives me roughly the same plot as figure 13.4 (1st edition)