stan-dev / example-models

Example models for Stan
http://mc-stan.org/
771 stars 479 forks source link

Request for factorial ANOVA with generated means and sds #61

Open wdonald1985 opened 8 years ago

wdonald1985 commented 8 years ago

Hello: I very much want to start using RSTAN for all of my analyses and have thus far figured out how to implement very basic models. My field commonly uses ANOVA, however, where there are interactions between categorical variables. I was hoping someone could share with me code for a factorial ANOVA (3 x 2, for example), with generated quantities for all means and also estimates for the standard deviations. I have attempted to translate Kruschke's ANOVA, which was implemented in RJAGS, to RSTAN but have run into difficulties.

bob-carpenter commented 8 years ago

Is there a pointer to the JAGS model you're trying to translate?

Or a definition of the prior and likelihood?

wdonald1985 commented 8 years ago

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] }

nu ~ dgamma(5.83,0.0483) # mode 100, sd 50

nu ~ dexp(1/30.0) 
#
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa )
  # Prevent from dropping too close to zero:
  ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 )
} }
sigmaSh <- 1 + sigmaMode \* sigmaRa
sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4_sigmaSD^2 ) ) /(2_sigmaSD^2)
sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) 
sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2]) 
#
a0 ~ dnorm( yMean , 1/(ySD*5)^2 ) 
#
for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) }
a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) 
#
for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) }
a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) 
#
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 )
} }
a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy)
# Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] :
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means 
} }
b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] )
for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 }
for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 }
for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
  b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )  
} }

} (Kruschke, 2015)

andrewgelman commented 8 years ago

Hi, you should include the Stan model, not the Bugs model! A

On Apr 9, 2016, at 6:19 PM, wdonald1985 notifications@github.com wrote:

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] }

nu ~ dgamma(5.83,0.0483) # mode 100, sd 50

nu ~ dexp(1/30.0) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa )

Prevent from dropping too close to zero:

ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 ) } } sigmaSh <- 1 + sigmaMode * sigmaRa sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4sigmaSD^2 ) ) /(2sigmaSD^2) sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2]) # a0 ~ dnorm( yMean , 1/(ySD*5)^2 ) # for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) } a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) } a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 ) } } a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy)

Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] :

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means } } b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] ) for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 } for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 } for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )

} } } (Kruschke, 2015)

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/stan-dev/example-models/issues/61#issuecomment-207865464

bob-carpenter commented 8 years ago

Is that the right attached program? I don't see any categorical variables.

Consistent indentation on code makes it much easier to read. I may just be missing something.

On Apr 9, 2016, at 6:19 PM, wdonald1985 notifications@github.com wrote:

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] }

nu ~ dgamma(5.83,0.0483) # mode 100, sd 50

nu ~ dexp(1/30.0) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa )

Prevent from dropping too close to zero:

ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 ) } } sigmaSh <- 1 + sigmaMode * sigmaRa sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4sigmaSD^2 ) ) /(2sigmaSD^2) sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2]) # a0 ~ dnorm( yMean , 1/(ySD*5)^2 ) # for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) } a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) } a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 ) } } a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy)

Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] :

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means } } b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] ) for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 } for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 } for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )

} } } (Kruschke, 2015)

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

wdonald1985 commented 8 years ago

Through trial and many errors my RSTAN model is not looking very good. I'll post it after I clean it up a bit. Thanks!

bob-carpenter commented 8 years ago

Was there a JAGS model you're trying to translate? The one you sent didn't have any categorical variables I could find.

P.S. I'd strongly recommend version control to recover clean old versions of things. Where possible, I'd also recommend starting with the simplest thing that works and building out rather than trying to build a complicated model all at once.

On Apr 10, 2016, at 1:38 PM, wdonald1985 notifications@github.com wrote:

Through trial and many errors my RSTAN model is not looking very good. I'll post it after I clean it up a bit. Thanks!

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

wdonald1985 commented 8 years ago

Hello: Apologies on taking so long to provide the script to my model. Since you are both very well know statisticians, I wanted to ensure my code was up to par. This is a 3 * 2 interaction with categorical predictors. Since I am only getting one estimate for sigma, I think my model is assuming homogeneous variances. I want a model that does not make this assumption, thereby providing 6 estimates for sigma. Additionally, I would like a way to get the lower level estimates, for example, estimates for VS and GEAR without having to build another model.

myData <- mtcars X <- model.matrix(~ vs * gear, data = myData) K <- 6 // number of columns in matrix N <- length(mtcars$mpg) standat= list(X = X,K =K, N =N, y =y) stanmodelcode = ' data { int N;
vector[N] y;
int K;
matrix [N, K]X ;
} parameters { real sigma; vector [K] beta; } transformed parameters { } model { y ~ normal(X * beta, sigma); } generated quantities { real mu0_3; real mu1_3; real mu0_4; real mu1_4; real mu0_5; real mu1_5; mu0_3 <- beta[1]; mu1_3 <- beta[1] + beta[2]; mu0_4 <- beta[1] + beta[3]; mu1_4 <- beta[1] + beta[2] + beta[3] + beta[5]; mu0_5 <- beta[1] + beta [4]; mu1_5 <- beta[1] + beta[2] + beta[4] + beta[6]; } ' fit = stan(model_code=stanmodelcode, data=standat, iter=1200, warmup=10, seed = 1, chains=2, thin=2)