rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

rstanarm support with family = mgcv::betar #196

Closed singmann closed 4 years ago

singmann commented 4 years ago

It seems as if emmeans support for rstanarm models does not work with beta regression family, family = mgcv::betar.

library("rstanarm")
library("emmeans")
data("GasolineYield", package = "betareg")
model_gy <- stan_glm(
  formula           = yield ~ batch + temp,
  data              = GasolineYield,
  chains            = 1,
  iter              = 1000,
  warmup            = 500,
  family            = mgcv::betar
)
emmeans(model_gy, "batch")
# Error in ref_grid(object, ...) : 
#   Non-conformable elements in reference grid.
#  Probably due to rank deficiency not handled as expected.

However, same data and model works without problem with other estimation approaches:

gy_logit <- betareg::betareg(yield ~ batch + temp, data = GasolineYield)
emmeans(gy_logit, "batch")
#  batch emmean      SE  df asymp.LCL asymp.UCL
#  1     0.3122 0.01201 Inf    0.2887    0.3357
#  2     0.2324 0.01386 Inf    0.2052    0.2595
#  3     0.2798 0.01490 Inf    0.2506    0.3091
#  4     0.1888 0.00946 Inf    0.1703    0.2073
#  5     0.2004 0.01046 Inf    0.1799    0.2209
#  6     0.1858 0.01042 Inf    0.1654    0.2063
#  7     0.1220 0.00783 Inf    0.1066    0.1373
#  8     0.1169 0.00803 Inf    0.1012    0.1327
#  9     0.1060 0.00866 Inf    0.0891    0.1230
#  10    0.0746 0.00584 Inf    0.0632    0.0861
# 
# Confidence level used: 0.95 

mgam <- mgcv::gam(yield ~ batch + temp, data = GasolineYield, family = mgcv::betar)
emmeans(mgam, "batch", type = "response")
#  batch response      SE df lower.CL upper.CL
#  1        0.313 0.01475 21   0.2827   0.3440
#  2        0.233 0.01702 21   0.1992   0.2700
#  3        0.280 0.01829 21   0.2437   0.3196
#  4        0.189 0.01162 21   0.1663   0.2146
#  5        0.201 0.01286 21   0.1754   0.2289
#  6        0.186 0.01281 21   0.1611   0.2144
#  7        0.122 0.00962 21   0.1038   0.1439
#  8        0.117 0.00987 21   0.0984   0.1395
#  9        0.106 0.01065 21   0.0863   0.1307
#  10       0.075 0.00719 21   0.0614   0.0914
# 
# Confidence level used: 0.95 
# Intervals are back-transformed from the logit scale 
rvlenth commented 4 years ago

Thanks for reporting this. There was an extra parameter (phi) in this model that is not associated with the linear predictor. I modified the code so it uses only the parameters for the linear predictor.

> emmeans(model_gy, "batch", type = "response")
 batch response lower.HPD upper.HPD
 1        0.324    0.1843     0.474
 2        0.235    0.0955     0.392
 3        0.280    0.1459     0.436
 4        0.206    0.0897     0.327
 5        0.213    0.0993     0.332
 6        0.206    0.0896     0.335
 7        0.143    0.0560     0.241
 8        0.138    0.0620     0.242
 9        0.119    0.0426     0.265
 10       0.108    0.0561     0.202

Point estimate displayed: median 
Results are back-transformed from the logit scale 
HPD interval probability: 0.95 

I think I "probably" should change the error message too, to allow for other possibilities tjhan rank deficiency...

singmann commented 4 years ago

Hi Russ, thanks for the (as usual) super fast fix.

And a more neutral error message might have helped a bit. I tried with a few different data sets to make sure it is not related to rank deficiency.

singmann commented 4 years ago

We have checked and the fix works for our use case now. Thus, I am closing this issue.