rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
341 stars 30 forks source link

Cannot produce estimates from GLM fit with brms when quadratic term is included in predictors #432

Closed qdread closed 11 months ago

qdread commented 1 year ago

Here's yet another issue for you! I want to produce emmeans comparing several levels of a treatment at fixed levels of a continuous covariate. In the model, both the first-order and second-order terms of the covariate interact with the treatment.

I get the expected result from emmeans() for a model of class glm fit with glm(), but not if a similar model is fit with brm(), with class brmsfit. I get the error Error in poly(variable, 2) : 'degree' must be less than number of unique points

I'm not sure what is going on under the hood there. Any help would be greatly appreciated.

Reproducible example

Here's a quick example fitting a very inappropriate model to mtcars.

library(emmeans)
library(brms)

mtcars$cyl <- factor(mtcars$cyl)

glmfit <- glm(vs ~ cyl * poly(hp, 2), data = mtcars, family = binomial)

emmeans(glmfit, ~ cyl | hp, at = list(hp = c(60, 90, 120)), type = 'response')

brmfit <- brm(vs ~ cyl * poly(hp, 2), data = mtcars, family = binomial,
              chains = 1, iter = 200, warmup = 100)

emmeans(brmfit, ~ cyl | hp, at = list(hp = c(60, 90, 120)), type = 'response')

As mentioned above, emmeans(glmfit) works as expected but emmeans(brmfit) returns an error in poly(hp,2).

rvlenth commented 1 year ago

brms has its own emmeans support. What I see is:

> attr(recover_data(brmfit), "terms")   # runs brms:::recover_data()
~++1 + cyl + poly(hp, 2) + cyl:poly(hp, 2) + cyl + hp + cyl + 
    hp
attr(,"variables")
list(cyl, poly(hp, 2), hp)
attr(,"factors")
            cyl poly(hp, 2) hp cyl:poly(hp, 2)
cyl           1           0  0               1
poly(hp, 2)   0           1  0               1
hp            0           0  1               0
attr(,"term.labels")
[1] "cyl"             "poly(hp, 2)"     "hp"              "cyl:poly(hp, 2)"
attr(,"order")
[1] 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: R_GlobalEnv>

compare this with

> attr(recover_data(glmfit), "terms")
~cyl * poly(hp, 2)
attr(,"variables")
list(cyl, poly(hp, 2))
attr(,"factors")
            cyl poly(hp, 2) cyl:poly(hp, 2)
cyl           1           0               1
poly(hp, 2)   0           1               1
attr(,"term.labels")
[1] "cyl"             "poly(hp, 2)"     "cyl:poly(hp, 2)"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(cyl, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048
), norm2 = c(1, 32, 145726.875, 977168457.086595))))
attr(,"dataClasses")
         vs         cyl poly(hp, 2) 
  "numeric"    "factor" "nmatrix.2" 

This one has a "predvars" attribute that includes the needed coefficients for the orthogonal polynomial basis.

So this is a problem in brms. I suggest reporting it and citing this issue.

I can also mention that if I do the following, I do get that needed attribute.

> bt = brms::brmsterms(formula(brmfit))
> mt = attr(model.frame(bt$dpars$mu$fe, data = brmfit$data), "terms")
> mt
~1 + cyl + poly(hp, 2) + cyl:poly(hp, 2)
attr(,"variables")
list(cyl, poly(hp, 2))
attr(,"factors")
            cyl poly(hp, 2) cyl:poly(hp, 2)
cyl           1           0               1
poly(hp, 2)   0           1               1
attr(,"term.labels")
[1] "cyl"             "poly(hp, 2)"     "cyl:poly(hp, 2)"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: 0x000002afe919e3e8>
attr(,"predvars")
list(cyl, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048
), norm2 = c(1, 32, 145726.875, 977168457.086595))))
attr(,"dataClasses")
        cyl poly(hp, 2) 
   "factor" "nmatrix.2" 
qdread commented 1 year ago

Thanks for getting back to me so quickly on these things!!!

rvlenth commented 1 year ago

For what it's worth, you can override brms's internal method if you load the following function in your workspace:

recover_data.brmsfit = function(object, data, ...) {
    message("Running outboard recover_data.brmsfit")
    bt = brms::brmsterms(formula(object))
    if (!inherits(bt, "brmsterms"))
        stop("This model is currently not supported.")
    mt = attr(model.frame(bt$dpars$mu$fe, data = object$data), "terms")
    recover_data(call("brms"), mt, "na.omit", data = object$data, ...)
}

Your example:

> emmeans(brmfit, ~ cyl | hp, at = list(hp = c(60, 90, 120)), type = 'response')
Running outboard recover_data.brmsfit
Using the maximum response value as the number of trials.
Using the maximum response value as the number of trials.
hp =  60:
 cyl      prob lower.HPD upper.HPD
 4   1.0000000   0.99947   1.00000
 6   0.0000000   0.00000   0.96090
 8   0.0000000   0.00000   1.00000

hp =  90:
 cyl      prob lower.HPD upper.HPD
 4   0.7352590   0.20589   0.99772
 6   0.0005918   0.00000   0.86143
 8   0.0000000   0.00000   1.00000

hp = 120:
 cyl      prob lower.HPD upper.HPD
 4   1.0000000   0.99988   1.00000
 6   0.8776416   0.50483   0.99976
 8   0.0000000   0.00000   0.99986

Point estimate displayed: median 
Results are back-transformed from the logit scale 
HPD interval probability: 0.95 
Warning messages:
1: Using 'binomial' families without specifying 'trials' on the left-hand side of the model formula is deprecated. 
2: Using 'binomial' families without specifying 'trials' on the left-hand side of the model formula is deprecated. 
rvlenth commented 11 months ago

Closing, as I think it's resolved, at least at emmeans's end