paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Incomplete info in documentation regarding the combinations for `<group>=NA` and `re_formula=NULL` #1652

Open mattansb opened 1 month ago

mattansb commented 1 month ago

Currently, the docs for prepare_predictions() read:

The newdata argument seems to suggest that setting newdata = data.frame(..., group = NA) should have the same effect as re_formula = NA since in both cases the group-specific coefficients are set to 0.

But this is not the case.

Instead, it seems that

prepare_predictions(
  newdata = data.frame(..., group = NA), 
  re_formula = NULL, # default
  allow_new_levels = FALSE # default
)

is closer to

prepare_predictions(
  newdata = data.frame(..., group = "<NEW>"), 
  re_formula = NULL, # default
  allow_new_levels = TRUE
)

(even though new levels throw an error when allow_new_levels = FALSE).

It is also not clear which of sample_new_levels = c("uncertainty", "gaussian") is used in this case.
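One rough way to probe this empirically is to compare the draws from each variant side by side. This is only a sketch: it assumes the epilepsy data shipped with brms and an intercept-only model chosen purely for illustration, and the object names (ep_na, ep_unc, ep_gauss, ep_fixed, qs) are made up here. Because draws for new levels are random, only summaries can be compared, not individual values.

```r
library(brms)

# Intercept-only multilevel model, used only for illustration
fit <- brm(count ~ 1 + (1 | patient),
           data = epilepsy, family = poisson())

# (a) NA in the grouping factor, all other arguments left at their defaults
ep_na <- posterior_epred(fit, newdata = data.frame(patient = NA))

# (b) an explicit new level, once per sampling scheme
ep_unc <- posterior_epred(fit, newdata = data.frame(patient = "<NEW>"),
                          allow_new_levels = TRUE,
                          sample_new_levels = "uncertainty")
ep_gauss <- posterior_epred(fit, newdata = data.frame(patient = "<NEW>"),
                            allow_new_levels = TRUE,
                            sample_new_levels = "gaussian")

# (c) group-level effects dropped entirely
ep_fixed <- posterior_epred(fit, newdata = data.frame(patient = NA),
                            re_formula = NA)

# Summarise each set of draws by a few quantiles
qs <- function(x) quantile(x, probs = c(0.025, 0.5, 0.975))
rbind(na_default  = qs(ep_na),
      uncertainty = qs(ep_unc),
      gaussian    = qs(ep_gauss),
      re_none     = qs(ep_fixed))
```

If the na_default row resembles one of the two new-level rows much more than the re_none row, that would hint at which scheme is applied, though a comparison of quantiles like this is suggestive rather than conclusive.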

paul-buerkner commented 1 month ago

newdata = data.frame(..., group = NA) just defines a new grouping level, which does not affect any dummy variables, since random effects don't have dummy variables. Such variables only apply for fixed effects. How can we make this clearer?

mattansb commented 1 month ago

I was expecting newdata = data.frame(..., group = NA) to be the same as re_formula = NA because I interpreted "NA values within factors are interpreted as if all dummy variables of this factor are zero." to mean that in a mixed model

$$ y = bX + uZ + e $$

Then all $Z$ are set to 0, similar to how if group was a fixed effect all $X$ would be set to 0.
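To spell out the two readings in the same notation (the symbols $u_{\text{new}}$ and $Z_{\text{new}}$ are introduced here only for illustration and do not come from the brms docs): under the reading I expected, setting $Z = 0$ removes the group-level term, which is what re_formula = NA gives,

$$ y = bX + u \cdot 0 + e = bX + e $$

whereas under the "new level" reading the group-level term stays in the model, with coefficients for a level that was not in the fitted data,

$$ y = bX + u_{\text{new}} Z_{\text{new}} + e $$

where $u_{\text{new}}$ would presumably be drawn according to sample_new_levels.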

But if newdata = data.frame(..., group = NA) is just another "new" level, then it should also give an error when allow_new_levels is not set:

library(brms)

fit <- brm(count ~ 1 + (1|patient),
           data = epilepsy, family = poisson())

posterior_epred(fit,
  newdata = data.frame(patient = "<NEW>")
)
#> Error: Levels '<NEW>' of grouping factor 'patient' cannot be found in the 
#> fitted model. Consider setting argument 'allow_new_levels' to TRUE.

# Does not throw an error...
posterior_epred(fit,
  newdata = data.frame(patient = NA)
)
#>           [,1]
#> [1,]  1.772992
#> [2,]  4.682992
#> [3,] 11.606553
#> [4,]  2.182194
#> [5,]  1.660112
#> [6,]  2.234523
#> .....

If this is the intended behavior, it should also require setting allow_new_levels = TRUE, and maybe the docs should read:

NA values within fixed factors are interpreted as if all dummy variables of this factor are zero. NA values within random factors are treated as a new level.

paul-buerkner commented 1 month ago

good points. let me check in more detail.
