Closed xfim closed 3 years ago
Thanks for noting this @xfim. This is really more of an mgcv issue, because bam/gam doesn't store the factor labels for the coefficients in the first place, and extract_random_effects
only works with what's provided. To demonstrate:
> coef(ga_model)
(Intercept) PeriodBefore s(Subject).1 s(Subject).2 s(Subject).3 s(Subject).4 s(Subject).5
319.2629074 -51.8875394 28.7228264 -59.4375205 -48.6033931 5.7081877 9.8441102
s(Subject).6 s(Subject).7 s(Subject).8 s(Subject).9 s(Subject).10 s(Subject).11 s(Subject).12
5.5448207 13.2677564 -2.7385305 -32.2469930 54.6258130 -17.1704188 8.5988755
s(Subject).13 s(Subject).14 s(Subject).15 s(Subject).16 s(Subject).17 s(Subject).18 s(Period,Subject).1
-4.8073688 28.0961886 5.0391359 -7.2675004 -1.5965285 14.4205392 31.5049932
s(Period,Subject).2 s(Period,Subject).3 s(Period,Subject).4 s(Period,Subject).5 s(Period,Subject).6 s(Period,Subject).7 s(Period,Subject).8
-17.2344852 -30.1831305 0.6524847 -21.7357192 -2.4121519 -13.5209369 16.3569649
s(Period,Subject).9 s(Period,Subject).10 s(Period,Subject).11 s(Period,Subject).12 s(Period,Subject).13 s(Period,Subject).14 s(Period,Subject).15
-9.3206750 14.2115743 7.9437793 -5.1889178 2.0457063 4.5461806 1.7642156
s(Period,Subject).16 s(Period,Subject).17 s(Period,Subject).18 s(Period,Subject).19 s(Period,Subject).20 s(Period,Subject).21 s(Period,Subject).22
-3.1248136 -32.6665167 16.6450791 31.1379207 -3.9979001 -1.8034280 -6.7274385
s(Period,Subject).23 s(Period,Subject).24 s(Period,Subject).25 s(Period,Subject).26 s(Period,Subject).27 s(Period,Subject).28 s(Period,Subject).29
19.6033070 -15.3310840 -10.8151421 8.4266726 12.0150820 1.9440903 4.8922833
s(Period,Subject).30 s(Period,Subject).31 s(Period,Subject).32 s(Period,Subject).33 s(Period,Subject).34 s(Period,Subject).35 s(Period,Subject).36
-2.3886638 13.1895036 -16.8002528 -7.2459180 6.4527066 3.1946754 3.9699545
When it is a standard intercept/slope setting, matching the names is straight forward, but here there is nothing to note that there are multiple levels of Period. I could maybe do some sort of check on whether it's a factor, and if so, do something accordingly, but I'm not sure how well it would generalize. Honestly it would be better if mgcv provided the names in the first place to avoid confusion (including for Subject), and having to assume a specific order has been maintained, so I'm inclined to leave it on their end, at least for now.
There is a silver lining here though, a path to get exactly what you want. When you're doing categorical random effects like this, you can just create a single 'interaction' random effect to get the same result (I have a post here with more detail). It probably doesn't matter as much with mgcv, but at least with lme4, specifying the model this way is actually more likely to converge (and more quickly), so it's a good approach to have in your toolbox in general. The basic idea is just to create a separate single factor of the Period/Subject combination, and use it for a random effect.
library(lme4)
#> Loading required package: Matrix
library(mgcv)
#> Loading required package: nlme
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#>
#> lmList
#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:nlme':
#>
#> collapse
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(mixedup)
ss <- sleepstudy %>%
mutate(Period = as.factor(ifelse(Days < 4, "Before", "After")))
ga_model = gam(
Reaction ~ Period + s(Subject, bs = 're') + s(Period, Subject, bs = 're'),
data = ss,
method = 'REML'
)
# extract_random_effects(ga_model) %>%
# data.frame()
ss = ss %>%
mutate(
Period = as.factor(ifelse(Days < 4, "Before", "After")),
Period2 = interaction(Subject, Period)
)
ga_model2 = gam(
Reaction ~ Period + s(Subject, bs = 're') + s(Period2, bs = 're'), # note that subject is not needed again in the second effect
data = ss,
method = 'REML'
)
period_effects1 = extract_random_effects(ga_model) %>%
filter(effect == 'Period')
period_effects2 = extract_random_effects(ga_model2) %>%
filter(grepl(group, pattern = 'Before|After')) %>%
arrange(group)
bind_cols(period_effects1, period_effects2) %>%
select(matches('group|effect|value'))
#> New names:
#> * group_var -> group_var...1
#> * effect -> effect...2
#> * group -> group...3
#> * value -> value...4
#> * se -> se...5
#> * ...
# A tibble: 36 x 8
group_var...1 effect...2 group...3 value...4 group_var...8 effect...9 group...10 value...11
<chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Subject Period 308 31.5 Period2 Intercept 308.After 31.5
2 Subject Period 308 -17.2 Period2 Intercept 308.Before -17.2
3 Subject Period 309 -30.2 Period2 Intercept 309.After -30.2
4 Subject Period 309 0.652 Period2 Intercept 309.Before 0.652
5 Subject Period 310 -21.7 Period2 Intercept 310.After -21.7
6 Subject Period 310 -2.41 Period2 Intercept 310.Before -2.41
7 Subject Period 330 -13.5 Period2 Intercept 330.After -13.5
8 Subject Period 330 16.4 Period2 Intercept 330.Before 16.4
9 Subject Period 331 -9.32 Period2 Intercept 331.After -9.32
10 Subject Period 331 14.2 Period2 Intercept 331.Before 14.2
# … with 26 more rows
As you can see, the results are the same, and the new label is clear. Model summaries likewise are the same.
This does the trick. Thank you @m-clark .
When I try to get the random effects of a model using bam(), the effect is not fully labelled, leaving necessary factor labels not passed.
For instance, using the example in [https://m-clark.github.io/posts/2019-10-20-big-mixed-models/]:
I would expect that in the column with the effect the "Period" would instead read as "PeriodBefore" and "PeriodAfter", instead of only "Period", as this does not allow to recognize whether it is one or the other.
Thank you very much for the package.