rmcelreath / rethinking

Statistical Rethinking course and book package
2.1k stars 596 forks source link

Posterior Predictions from Multiple Equations #15

Closed jebyrnes closed 8 years ago

jebyrnes commented 8 years ago

So, I'm working with rethinking in a Structural Equation Modeling framework. It works beautifully, but, I'm having some issues in generating posterior predictions. Namely, effects of linked relationships don't seem to translate. Consider the following model

x1 -> y1 -> y2

where each coefficient is a slope of 2.

I can create a nice fake data frame

set.seed(31415)
ndf <- data.frame(x1=runif(100,0,100))
ndf$y1 <- rnorm(100, ndf$x1*2, 40)
ndf$y2 <- rnorm(100, ndf$y1*2, 80)

And then a model that fits nicely using map2stan

nf <- alist(
  y1 ~ dnorm(yhat1, sigma1),
  y2 ~ dnorm(yhat2, sigma2),

  yhat1 <-  b1*x1,
  yhat2 <- b2*y1,

  b1 ~ dnorm( 0 , 1000 ),
  b2 ~ dnorm( 0 , 1000 ),

  sigma1 ~ dcauchy( 0 , 1 ),
  sigma2 ~ dcauchy( 0 , 1 )

)

n.fit <- map2stan( 
  nf , 
  data=ndf , 
  start=list(a1=1, a2=1, b1=1, b2=1, sigma1=10, sigma2=10)
)

here

> coef(n.fit)
        b1         b2     sigma1     sigma2 
  1.930119   1.965780  37.934121  86.538586 

Great! But, when I try and do posterior prediction with x starting as 3, which should produce roughly 12 for y2...

> link(n.fit, data=list(x1=3)) -> a
[ 1000 / 1000 ] 200 / 1000 ]

> sapply(a, mean)
   yhat1    yhat2 
5.790357 1.965780 

`

Huh....why is yhat incorrect here?

rmcelreath commented 8 years ago

The function link() just substitutes values into any linear models in the model definition. You provided a value for x1, but not for y1. It isn't going to propagate the prediction yhat1 into the linear model for yhat2, because the model isn't written that way.

But you can just put a dummy linear model in the model definition to get what you are after.

nf <- alist(
  y1 ~ dnorm(yhat1, sigma1),
  y2 ~ dnorm(yhat2, sigma2),

  yhat2pred <- b2*yhat1,
  yhat1 <- b1*x1,
  yhat2 <- b2*y1,

  b1 ~ dnorm( 0 , 10 ),
  b2 ~ dnorm( 0 , 10 ),

  sigma1 ~ dcauchy( 0 , 1 ),
  sigma2 ~ dcauchy( 0 , 1 )

)

n.fit <- map2stan( 
  nf , 
  data=ndf , 
  start=list(b1=1, b2=1, sigma1=10, sigma2=10)
)

a <- link(n.fit, data=list(x1=3))
sapply(a,mean)

The output should be:

yhat2pred     yhat1     yhat2 
13.027757  6.586785  1.977875 

I think the yhat2 value is coming from default values for y1 that are used, since you didn't pass in a value to link.