Rank deficiency handling in rstanarm

felixthoemmes commented 5 years ago

Dear Russell, Hope you are having a great summer. Maybe my following question should be addressed to the authors of Rstan / rstanarm, but I will post it regardless, in case it is helpful.

I noticed that a rank-deficient model (in which lm, lmer, or stan_lmer drop terms automatically), works properly in emmeans for lm and lmer models, but not stan_lmer models.

Below is a reproducible example:

#demonstrate the rank-deficient model works fine in emmeans 
## when estimated with lm, lmer, but not stan_lmer

library(lme4)
library(rstanarm)
library(emmeans)

id <- rep(1:10,each=10)

x <- data.frame(id = factor(sample(10, 100, replace = TRUE)))
x$y <- rnorm(nrow(x))
x$x1 <- 1
x$x2 <- ifelse(x$y<0, rnorm(nrow(x), mean=1), rnorm(nrow(x), mean=-1))
m1 <- lm(y ~ x1 + x2, data=x)
m2 <- lmer(y ~ x1 + x2 + (1 | id), data=x)
m3 <- stan_lmer(y ~ x1 + x2 + (1 | id), data=x)

predict(m1)
predict(m2)
posterior_predict(m3)

emmeans(m1,"x1") #works
emmeans(m2,"x1") #works
emmeans(m3,"x1") #does not work, gives errors about rank deficiency not handled as expected

This is obviously a tiny issue, as in most cases, users won't have rank-deficient models. The only reason that I stumbled over it was that I on purpose created a rank-deficient model, because I was too lazy to omit predictors.

Best, Felix

rvlenth commented 5 years ago

I will take a look at this, but the key words are "not handled as expected." I have experience with some packages that sort of sweep rank deficiencies under the rug in a way that it is difficult/impossible to tell which predictors were thrown out. In those cases, we get a different number of columns in the model matrix than is expected, and we can't decide whichn columns to keep or to do the needed estimability computations. In summary, "throwing a predictor out" and "setting a coefficient equal to zero" have the same numerical result in the regression equation, but my setup requires the latter in order to work correctly.

rvlenth commented 5 years ago

It has been a very long time for me to address this, but I now have it working, I think:

> emmeans(m3, "x1", at = list(x1 = c(.5,1,2)))
  x1 emmean lower.HPD upper.HPD
 0.5 nonEst        NA        NA
 1.0 0.0628     -0.12     0.233
 2.0 nonEst        NA        NA

HPD interval probability: 0.95

I needed to go back and actually figure out how I wanted to support non-estimability issues for Bayesian models (which I had neglected to consider for some reason -- well, the reason was laziness). Then I had to take care of some technical details -- not the least of which that the standard tools in coda don't accommodate a sample of all NAs

felixthoemmes commented 5 years ago

Thanks so much for looking into this, Russell. Your continued support and care for the emmeans package is greatly appreciated. I wouldn't know how to teach my grad classes without it!

On Fri, Aug 2, 2019, 2:57 AM Russell V. Lenth notifications@github.com wrote:

It has been a very long time for me to address this, but I now have it working, I think:

emmeans(m3, "x1", at = list(x1 = c(.5,1,2))) x1 emmean lower.HPD upper.HPD 0.5 nonEst NA NA 1.0 0.0628 -0.12 0.233 2.0 nonEst NA NA HPD interval probability: 0.95

I needed to go back and actually figure out how I wanted to support non-estimability issues for Bayesian models (which I had neglected to consider for some reason -- well, the reason was laziness). Then I had to take care of some technical details -- not the least of which that the standard tools in coda don't accommodate a sample of all NAs

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/rvlenth/emmeans/issues/114?email_source=notifications&email_token=AECIYYWPDU2YU36PQ6JLF6DQCOH75A5CNFSM4HY3KE62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3MLGPA#issuecomment-517518140, or mute the thread https://github.com/notifications/unsubscribe-auth/AECIYYT5QO5CSMLHLE52PWTQCOH75ANCNFSM4HY3KE6Q .

rvlenth commented 5 years ago

Thanks. I'm closing as I think this is resolved. Let me know though if you see further issues or have a request

rvlenth / emmeans

Rank deficiency handling in rstanarm #114