Open mattansb opened 1 month ago
newdata = data.frame(..., group = NA) just defines a new grouping level, which does not affect any dummy variables, since random effects don't have dummy variables. Such variables only apply for fixed effects. How can we make this clearer?
I was expecting newdata = data.frame(..., group = NA) to be the same as re_formula = NA because I interpreted "NA values within factors are interpreted as if all dummy variables of this factor are zero." to mean that in a mixed model
$$ y = bX + uZ + e $$
then all $Z$ are set to 0, similar to how, if group were a fixed effect, all $X$ would be set to 0.
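To spell out the two readings in the notation of the equation above (the subscripts are introduced here only for illustration), the prediction would be either

$$ \hat{y}_{Z=0} = bX \qquad \text{or} \qquad \hat{y}_{\text{new}} = bX + u_{\text{new}}Z $$

where $u_{\text{new}}$ stands for coefficients of a level that was not in the fitted model. Only the first form would match re_formula = NA, which the docs describe as including no group-level effects.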
But if newdata = data.frame(..., group = NA) is just another "new" level, then it should also give an error when allow_new_levels is not set:
library(brms)

fit <- brm(count ~ 1 + (1 | patient),
           data = epilepsy, family = poisson())

posterior_epred(fit, newdata = data.frame(patient = "<NEW>"))
#> Error: Levels '<NEW>' of grouping factor 'patient' cannot be found in the
#> fitted model. Consider setting argument 'allow_new_levels' to TRUE.

# Does not throw an error...
posterior_epred(fit, newdata = data.frame(patient = NA))
#> [,1]
#> [1,] 1.772992
#> [2,] 4.682992
#> [3,] 11.606553
#> [4,] 2.182194
#> [5,] 1.660112
#> [6,] 2.234523
#> .....
If this is the intended behavior, it should also require setting allow_new_levels = TRUE, and maybe the docs should read:
NA values within fixed factors are interpreted as if all dummy variables of this factor are zero. NA values within random factors are treated as a new level.
good points. let me check in more detail.
On Wed, 15 May 2024 at 07:06, Mattan S. Ben-Shachar @.***> wrote:
[...]
Currently, the docs for prepare_predictions() read:
newdata [...] NA values within factors are interpreted as if all dummy variables of this factor are zero.
re_formula [...] If NULL (default), include all group-level effects; if NA, include no group-level effects.
The newdata argument seems to suggest that setting newdata = data.frame(..., group = NA) should have the same effect as re_formula = NA, since in both cases the group-specific coefficients are set to 0. But this is not the case.
Instead, it seems that
is closer to
(even though new levels throw an error when allow_new_levels = FALSE). It is not clear which of sample_new_levels = c("uncertainty", "gaussian") is used in this case.
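As a rough illustration of why dropping the group-level term and predicting for a new level can give different results on the response scale, here is a toy simulation in base R. It does not call brms. The intercept and group-level SD draws are made-up stand-ins for posterior draws, and drawing the new-level effect from Normal(0, sd) is an assumption loosely modeled on the "gaussian" option, not a statement of what brms actually does for NA.

```r
# Toy illustration in base R; all numbers are hypothetical, not from the epilepsy fit.
set.seed(1)
n_draws <- 4000

# Stand-ins for posterior draws of a Poisson (log-link) random-intercept model
b0   <- rnorm(n_draws, mean = 1.5, sd = 0.05)       # intercept draws
sd_u <- abs(rnorm(n_draws, mean = 0.9, sd = 0.05))  # group-level SD draws

# Reading A: group-level term dropped (u = 0), as with re_formula = NA
epred_zero <- exp(b0)

# Reading B (assumed): one new-level effect per draw, u_new ~ Normal(0, sd_u)
u_new     <- rnorm(n_draws, mean = 0, sd = sd_u)
epred_new <- exp(b0 + u_new)

# Compare the two sets of expected-value draws
summary_tab <- rbind(
  zeroed    = c(mean = mean(epred_zero), sd = sd(epred_zero)),
  new_level = c(mean = mean(epred_new),  sd = sd(epred_new))
)
print(round(summary_tab, 2))
```

Under these assumptions the new-level draws are both larger on average and far more spread out than the zeroed ones, which is the kind of draw-to-draw variation seen in the posterior_epred() output for patient = NA above.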