rmcelreath / glmer2stan

Define Stan models using glmer-style (lme4) formulas
55 stars 11 forks source link

Disappearing Random Effects (w/ fix) #6

Open cgonza12 opened 9 years ago

cgonza12 commented 9 years ago

Issue: Disappearing Random effects

While I was using glmer2stan to run my dissertation I noticed that some of my more complex models woud have myseriously disappearing random effects. I've replicated the bug below and my solution. Not sure if anyone else has experienced this, but I figured its worth documenting

library(lme4)
data(sleepstudy)
sleepstudy$Days2 = rnorm(length(sleepstudy$Days)) # just noise
sleepstudy$Days3 = rnorm(length(sleepstudy$Days)) # just noise
sleepstudy$Days4 = rnorm(length(sleepstudy$Days)) # just noise

sleepstudy$subject_index <- as.integer(as.factor(sleepstudy$Subject)) 

# basic MCMC parameters
niter = 500
nwarm = 100
chains = 4

#glmer2stan
example1 <- lmer2stan( Reaction ~ Days+Days2+Days3+Days4
                                 +Days:Days2
                                 +Days:Days2
                                 +Days:Days2:Days3:Days4
                       + (1 | subject_index), data=sleepstudy,
                     calcWAIC=T,
                     warmup=nwarm, #burn-in period
                     iter = niter, # number of steps per chain
                     chains=chains,sample=F) #number of chains
cat(example1$model)

The above code seems to shit the bed when there's a mix of high and low level interactions specified with ':' instead of fully with '*'. Specifically when this mix is present, the model does not contain any random effects (but will run just fine which is scary).

data{
    int N;
    real Reaction[N];
    real Days[N];
    real Days2[N];
    real Days3[N];
    real Days4[N];
    real subject_index[N]; //okay the index is obviously in the data...
    real Days_X_Days2[N];
    real Days_X_Days2_X_Days3_X_Days4[N];
}

parameters{
    real Intercept;
    real beta_Days;
    real beta_Days2;
    real beta_Days3;
    real beta_Days4;
    real beta_Days_X_Days2;
    real beta_Days_X_Days2_X_Days3_X_Days4;
    real<lower=0> sigma;
}

model{
    real vary[N];
    real glm[N];
    // Priors
    Intercept ~ normal( 0 , 100 );
    beta_Days ~ normal( 0 , 100 );
    beta_Days2 ~ normal( 0 , 100 );
    beta_Days3 ~ normal( 0 , 100 );
    beta_Days4 ~ normal( 0 , 100 );
    beta_Days_X_Days2 ~ normal( 0 , 100 );
    beta_Days_X_Days2_X_Days3_X_Days4 ~ normal( 0 , 100 );
    sigma ~ uniform( 0 , 100 );
    // WTF where are my random effects??

    // Fixed effects
    for ( i in 1:N ) {
        glm[i] <- Intercept
                + beta_Days * Days[i]
                + beta_Days2 * Days2[i]
                + beta_Days3 * Days3[i]
                + beta_Days4 * Days4[i]
                + beta_Days_X_Days2 * Days_X_Days2[i]
                + beta_Days_X_Days2_X_Days3_X_Days4 * Days_X_Days2_X_Days3_X_Days4[i];
    }
    Reaction ~ normal( glm , sigma );
}

generated quantities{
    real dev;
    real vary[N];
    real glm[N];
    dev <- 0;
    for ( i in 1:N ) {
        glm[i] <- Intercept
                + beta_Days * Days[i]
                + beta_Days2 * Days2[i]
                + beta_Days3 * Days3[i]
                + beta_Days4 * Days4[i]
                + beta_Days_X_Days2 * Days_X_Days2[i]
                + beta_Days_X_Days2_X_Days3_X_Days4 * Days_X_Days2_X_Days3_X_Days4[i];
        dev <- dev + (-2) * normal_log( Reaction[i] , glm[i] , sigma );
    }
}

The source of the issue is the parse_formula function. basically, lines 66 - 69

if (formula == nobars(formula)) {
     ranef <- list()
   }

kill the random effects for some reason. I understand they are there in case the user is actually running a model with no random effects, e.g. a bivariate regression or something simple, but for some reason the inclusion of a mix of ':' throws things off.

In order to run my analyses, which had a mix of high and low-level interactions I tweaked your code slightly and wrote myglmer2stan. I included a function arguement 'Ranef' which specifies explicitly wehter or not there are random effects present. If there are, parse formula is declared without lines 66-69. If there are no randome effects than the existing parse formula function is specified. my function can be found here.

example2 <- myglmer2stan( Reaction ~ Days+Days2+Days3+Days4
                       +Days:Days2
                       +Days:Days2
                       +Days:Days2:Days3:Days4
                       + (1 | subject_index), data=sleepstudy,
                       calcWAIC=T,
                       warmup=nwarm, #burn-in period
                       iter = niter, # number of steps per chain
                       chains=chains,sample=F) #number of chains
cat(example2$model)

no more issue!

data{
    int N;
    real Reaction[N];
    real Days[N];
    real Days2[N];
    real Days3[N];
    real Days4[N];
    int subject_index[N];
    real Days_X_Days2[N];
    real Days_X_Days2_X_Days3_X_Days4[N];
    int N_subject_index;
}

parameters{
    real Intercept;
    real beta_Days;
    real beta_Days2;
    real beta_Days3;
    real beta_Days4;
    real beta_Days_X_Days2;
    real beta_Days_X_Days2_X_Days3_X_Days4;
    real<lower=0> sigma;
    real vary_subject_index[N_subject_index];
    real<lower=0> sigma_subject_index;
}

model{
    real vary[N];
    real glm[N];
    // Priors
    Intercept ~ normal( 0 , 100 );
    beta_Days ~ normal( 0 , 100 );
    beta_Days2 ~ normal( 0 , 100 );
    beta_Days3 ~ normal( 0 , 100 );
    beta_Days4 ~ normal( 0 , 100 );
    beta_Days_X_Days2 ~ normal( 0 , 100 );
    beta_Days_X_Days2_X_Days3_X_Days4 ~ normal( 0 , 100 );
    sigma_subject_index ~ uniform( 0 , 100 );
    sigma ~ uniform( 0 , 100 );
    // Varying effects THEYRE STILL HERE!!
    for ( j in 1:N_subject_index ) vary_subject_index[j] ~ normal( 0 , sigma_subject_index );
    // Fixed effects
    for ( i in 1:N ) {
        vary[i] <- vary_subject_index[subject_index[i]];
        glm[i] <- vary[i] + Intercept
                + beta_Days * Days[i]
                + beta_Days2 * Days2[i]
                + beta_Days3 * Days3[i]
                + beta_Days4 * Days4[i]
                + beta_Days_X_Days2 * Days_X_Days2[i]
                + beta_Days_X_Days2_X_Days3_X_Days4 * Days_X_Days2_X_Days3_X_Days4[i];
    }
    Reaction ~ normal( glm , sigma );
}

generated quantities{
    real dev;
    real vary[N];
    real glm[N];
    dev <- 0;
    for ( i in 1:N ) {
        vary[i] <- vary_subject_index[subject_index[i]];
        glm[i] <- vary[i] + Intercept
                + beta_Days * Days[i]
                + beta_Days2 * Days2[i]
                + beta_Days3 * Days3[i]
                + beta_Days4 * Days4[i]
                + beta_Days_X_Days2 * Days_X_Days2[i]
                + beta_Days_X_Days2_X_Days3_X_Days4 * Days_X_Days2_X_Days3_X_Days4[i];
        dev <- dev + (-2) * normal_log( Reaction[i] , glm[i] , sigma );
    }
}
rmcelreath commented 9 years ago

Thanks for the example. This package is low priority for me right now, but I have a colleague who is thinking of rewriting the parsing, which would likely fix these issues (as well as initially introduce new ones, no doubt).

Something that sometimes fixes it is to put the random effects first in the formula: (1|group) + x instead of x + (1|group).

cgonza12 commented 9 years ago

Thanks Richard, I saw that you're thinking of discontinuing the package (it worked really well for me!). Never tried the reordering...looking forward to whatever's coming next!

Christian

On Tue, May 12, 2015 at 11:08 AM, Richard McElreath < notifications@github.com> wrote:

Thanks for the example. This package is low priority for me right now, but I have a colleague who is thinking of rewriting the parsing, which would likely fix these issues (as well as initially introduce new ones, no doubt).

Something that sometimes fixes it is to put the random effects first in the formula: (1|group) + x instead of x + (1|group).

— Reply to this email directly or view it on GitHub https://github.com/rmcelreath/glmer2stan/issues/6#issuecomment-101312510 .

stevencarlislewalker commented 9 years ago

I've never looked in detail at glmer2stan but from an lme4 perspective, these lines look reasonable:

if (formula == nobars(formula)) {
     ranef <- list()
   }

The point of nobars is to remove random effects terms from a mixed model formula. Therefore, in this example, random effects are only 'killed' if there are no random effects in the formula, which seems reasonable. From this I might guess that the real problem is actually upstream of these lines, perhaps another call to nobars?