Closed vincentarelbundock closed 1 year ago
Are there updates on this development? Is the package now able to work with nlme::lme() models?
@MrJerryTAO there has been no progress because the problem has not been resolved upstream in the insight
package. Here is a reproducible example in case you want to look at the insight
code to help fix this:
library(nlme)
mod <- lme(distance ~ age, data = Orthodont, random = ~ Sex)
fm1 <- lme(distance ~ age, data = Orthodont)
fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
fm3 <- lme(distance ~ age, data = Orthodont, random = ~ Sex)
# missing Subject
insight::find_predictors(fm1)
# $conditional
# [1] "age"
# missing Subject
insight::find_predictors(fm2)
# $conditional
# [1] "age" "Sex"
# missing Subject and Sex
insight::find_predictors(fm3)
# $conditional
# [1] "age"
# error
insight::find_variables(fm3)
# Error in gregexpr(pattern = "\\|", re)[[1]]: subscript out of bounds
@MrJerryTAO there has been no progress because the problem has not been resolved upstream in the
insight
package. Here is a reproducible example in case you want to look at theinsight
code to help fix this:library(nlme) mod <- lme(distance ~ age, data = Orthodont, random = ~ Sex) fm1 <- lme(distance ~ age, data = Orthodont) fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1) fm3 <- lme(distance ~ age, data = Orthodont, random = ~ Sex) # missing Subject insight::find_predictors(fm1) # $conditional # [1] "age" # missing Subject insight::find_predictors(fm2) # $conditional # [1] "age" "Sex" # missing Subject and Sex insight::find_predictors(fm3) # $conditional # [1] "age" # error insight::find_variables(fm3) # Error in gregexpr(pattern = "\\|", re)[[1]]: subscript out of bounds
Hi @vincentarelbundock, in my test of insight::find_, both works almost okay. insight::find_variables(fm3) had a problem in your case because the way specifying random effects were not recognized by {insight}, but using a complete formula for random effects solves the problem as following. Does it look promising to you?
However, what is missing in both are terms specified in the weights = argument in lme() that models heteroscedasticity (how the residual standard deviation varies by a function of variables). But if the lme() model does not have a weights = {} component, I guess these insight retrieval functions should be fine.
summary(mod <- lme(
distance ~ age, data = Orthodont,
random = ~ 1 | Subject,
weights = varIdent(form = ~ 1 | Sex)))
insight::find_predictors(mod)
'$conditional
[1] "age"'
insight::find_variables(mod)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
names(get_data(mod))
'[1] "distance" "age" "Subject"'
It appears that insight::find_variables() returns more variable names than insight::findpredictors(). In my modeling as shown below, "milk" as an explanatory variable in weights = {} is not selected by either of the find() functions. However, all other variables were retained by find_variables() and get_data()
summary(Model_Temp[[1]] <- lme(
y ~ pma0 + hc0,
random = list(~ 1 | id),
weights = varIdent(form = ~ 1 | milk), # sex*milk
correlation = corAR1(form = ~ dayab | id), # corCAR1()
data = Data_Clean,
control = lmeControl(opt = "nlminb", msMaxIter = 500), method = "REML")
)
insight::find_predictors(Model_Temp[[1]])
'$conditional
[1] "pma0" "hc0"
$correlation
[1] "dayab" "id" '
insight::find_variables(Model_Temp[[1]])
'$response
[1] "y"
$conditional
[1] "pma0" "hc0"
$random
[1] "id"
$correlation
[1] "dayab" "id" '
names(get_data(Model_Temp[[1]]))
'"y" "pma0" "hc0" "id" "dayab"'
No, we need full support, unfortunately.
No, we need full support, unfortunately.
@vincentarelbundock, I just updated my example script output above with some new findings.
If find_variables() returns all variable names relevant in a model, will it solve the problem? It does not seem so, as a lme() model without weights = {} argument still has NA as inferential statistics in predictions() which only provides point estimates.
Or is it more about get_data() returning a data frame with all variables? It does not seem so either, as this function gives a data frame with the variables in find_variables().
Or is it because {marginaleffects} calls find_predictors() only and therefore miss many other variables? I had a hard time exploring how these insight::: functions are used in {marginaleffects}.
We need all of those insight
functions to work with all valid nlme
models, regardless of the inputs supplied by the user:
find_predictors find_variables find_weights get_data
Unfortunately I don't know what to do in this situation. I explored these {insight} functions, but I could not find out how {marginaleffects} uses them. It might be too complex for me to offer help, as I am not familiar with {insight} and how {marginaleffects} depends on it. I turned to {ggeffects} for nlme models.
To be clear, the interaction between insight
and marginaleffects
does not matter. All that matters is that the 4 functions above in insight
work fully. Once that is done, I can add support here.
Glad to hear that ggeffects
is working for you.
Hi @vincentarelbundock, I reported the variable-left-out issue in {insight} https://github.com/easystats/insight/issues/727. It appears to me that the only misbehaving function in {insights} of lme() objects is find_weights() as I illustrated in the issue above. Do you agree?
However, when I fixed the find_weights() function, so it returns the variable name in the lme() weights = varFunc() argument and get_data() returns all required variables, using an earlier version of {marginaleffects} still gets only point estimates but no SE. See the following example
summary(Model <- lme(
distance ~ age, data = Orthodont,
random = ~ 1 | Subject,
weights = varIdent(form = ~ 1 | Sex)))
find_predictors(Model)
'$conditional
[1] "age"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex was left out
'"distance" "age" "Subject" "Sex"'
predictions(Model) # missing SE
predictions(Model, newdata = Orthodont) # same
" Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
25.6 NA NA NA NA NA
26.7 NA NA NA NA NA
27.8 NA NA NA NA NA
28.8 NA NA NA NA NA
21.8 NA NA NA NA NA
--- 98 rows omitted. See ?avg_predictions and ?print.marginaleffects ---"
Do you agree?
This sounds right, but I have not looked at it deeply enough to be 100% sure.
However, when I fixed the find_weights() function, so it returns the variable name in the lme() weights = varFunc() argument and get_data() returns all required variables, using an earlier version of {marginaleffects} still gets only point estimates but no SE. See the following example
This is normal. I need to do other stuff in marginaleffects
to add full support.
This is normal. I need to do other stuff in
marginaleffects
to add full support.
Would be great to see the updates in future versions! Please follow up on https://github.com/easystats/insight/issues/727 if you need additional support in {insight}.
Glad to hear that ggeffects is working for you.
I prefer {marginaleffects} because the SE and the estimate are consistently (as far as I have found) on the same scale. In contrast with {ggeffects}, SE is usually on the linear predictor scale, no matter what estimate stands for. This remained a myth until I compared results between {marginaleffects} and {ggeffects} on a Poisson regression.
The side effect of {ggeffects} depending on linear-predictor SEs is that {ggeffects} can give asymmetrical CIs (e.g. Poisson mean > 0, and probability within [0, 1]), whereas {marginaleffects} always gives symmetrical CIs that sometimes go beyond possible limits (e.g. Poisson mean < 0, and probability outside [0, 1]). My solution is to use response estimates (y) and response SEs SE(y) from {marginaleffects} and manually transform them onto the linear-predictor scale due to the link function y = g(x), recognizing that SE(y) = g'(x)SE(x), before building a CI based on SE(x). I do not use linear-predictor estimates (x) and SEs SE(x) from {marginaleffects} to find estimate and CI of mean response because the mean response does not equal to a nonlinear link function of mean linear predictor in glm() models, contrary to what you once suggested.
It would be nice if {marginaleffects} produces asymmetrical, range-preserving CIs when necessary, using the method above.
In my GitHub of some customized R functions, I proposed efficient dplyr programming to generate repeated-measurements-, variance-, and difference-adjusted, range-preserving CIs for means and proportions from individual scores and group estimates, which are robust to imbalanced factorial designs, unequal group variances, different group sizes, and missing observations. I used {marginaleffects} summary results as inputs of group estimates to my function, which produces difference-adjusted, range-preserving CIs. Though, the range-preserving technique here is based on functions like log(y - a)
, log(b - y)
, and log((y - a)/(b - y))
, where a and b are minimum and maximum possible values. You could incorporate my methods in {marginaleffects} summary outputs by having arguments like (diffadjusted = TRUE, minimum = a, maximum = b).
Thanks for the link and suggestions, but I am unlikely to implement that. You should feel free to publish an extension package.
@MrJerryTAO there is now experimental support for nlme::lme()
. Let me know if you try and run into issues:
library(nlme)
library(marginaleffects)
mod <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
avg_slopes(mod)
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Sex Female - Male -2.32 0.7614 -3.05 0.0023 -3.813 -0.829
# age dY/dX 0.66 0.0616 10.72 <0.001 0.539 0.781
#
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
avg_comparisons(mod)
#
# Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Sex Female - Male -2.32 0.7614 -3.05 0.0023 -3.813 -0.829
# age +1 0.66 0.0616 10.72 <0.001 0.539 0.781
#
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high
avg_predictions(mod, by = "Sex")
#
# Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
# Male 25.0 0.486 51.4 <0.001 24.0 25.9
# Female 22.6 0.586 38.6 <0.001 21.5 23.8
#
# Columns: Sex, estimate, std.error, statistic, p.value, conf.low, conf.high
Excellent, @vincentarelbundock. Did not expect it to be so prompt.
The current lme support in marginaleffect works mostly okay except
I also tested {ggeffects} given {insight} update. It has another series of errors and mistakes that I reported at https://github.com/strengejacke/ggeffects/issues/295. Nevertheless, it does differentiate between population SE and individual SE by type = "fixed" and "random".
Model set up. {insight} find and get seem to work well.
library(nlme)
library(insight)
library(marginaleffects)
data("Orthodont", package = "nlme")
ggplot( # spaghetti plot with LOESS group mean
data = Orthodont,
aes(x = age, y = distance, group = Subject, color = Sex, fill = Sex)) +
geom_smooth(aes(
group = Sex), alpha = 0.6, method = "loess", span = 1, show.legend = F) +
geom_line(size = 0.2, alpha = 0.6) +
scale_color_manual(
breaks = c("Female", "Male"),
values = c("#D14343", "#2197DF")) +
scale_fill_manual(
breaks = c("Female", "Male"),
values = c("#D14343", "#2197DF")) +
scale_x_continuous(breaks = c(8, 10, 12, 14), minor_breaks = NULL) +
labs(x = "Age (year)", y = "Distance") +
theme_minimal() +
theme(
strip.placement = "outside",
strip.text = element_text(size = rel(1)),
legend.title = element_blank(),
legend.position = c(0.1, 0.7), # "bottom"
legend.direction = "vertical", legend.box = "vertical",
legend.margin = margin(0, 0, 0, 0),
legend.spacing = unit(0, "line"),
legend.background = element_blank(),
legend.text = element_text(size = rel(0.8)),
legend.key = element_blank(),
legend.key.size = unit(0.8, "line"),
legend.key.width = unit(1.2, "line"),
panel.grid.major.y = element_line(colour = "grey90", size = 0.2),
panel.grid.minor.y = element_line(colour = "grey90", size = 0.2),
panel.grid.major.x = element_line(colour = "grey90", size = 0.2),
panel.grid.minor.x = element_line(colour = "grey90", size = 0.2)
)
summary(Model <- lme( # using all formula components
distance ~ age, data = Orthodont, # fixed effect
random = list(~ 1 + age | Subject), # correlated random int + age
correlation = corAR1( # error correlation over time within Subject
form = ~ as.integer(as.factor(age)) | Subject), # 1:4 as time index
weights = varIdent(form = ~ 1 | Sex))) # heterosc: M/F has diff var
" AIC BIC logLik
436.6792 457.9867 -210.3396
Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 2.3152914 (Intr)
age 0.2402225 -0.619
Residual 1.5037380
Correlation Structure: AR(1)
Formula: ~as.integer(as.factor(age)) | Subject
Parameter estimate(s):
Phi
-0.2325323
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Sex
Parameter estimates:
Male Female
1.000000 0.401668
Fixed effects: distance ~ age
Value Std.Error DF t-value p-value
(Intercept) 17.115492 0.6384924 80 26.8061 0
age 0.624408 0.0617834 80 10.1064 0"
vcov(Model) # only among fixed effect coefficients
" (Intercept) age
(Intercept) 0.4599040 -0.024615698
age -0.0246157 0.002237791"
find_predictors(Model)
'$conditional
[1] "age"
$correlation
[1] "age" "Subject"'
find_predictors(Model, effects = "all", component = "all")
'$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age" "Subject"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age" "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex left out before fix
'"distance" "age" "Subject" "Sex"'
predict.lme() as a reference. It has a level = c() argument to include or not random effects by clustering level.
# Prediction using nlme
Orthodont[Orthodont$Subject %in% c("M01", "F01"), ]
"Grouped Data: distance ~ age | Subject
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
65 21.0 8 F01 Female
66 20.0 10 F01 Female
67 21.5 12 F01 Female
68 23.0 14 F01 Female"
predict(Model, newdata = expand.grid(
age = c(8, 10, 12, 14), Sex = c("Male", "Female")),
level = 0) # mean of population. No SE or CI option.
# Male has the same prediction as Female: Sex is only in weights
"[1] 22.11075 23.35957 24.60839 25.85720 22.11075 23.35957 24.60839
[8] 25.85720"
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0) # same result as above
predict(Model, newdata = expand.grid(
age = c(8, 10, 12, 14), Sex = c("Male", "Female")),
level = 1) # must supply cluster variables
"Error in predict.lme:
cannot evaluate groups for desired levels on 'newdata'"
predict(Model, newdata = expand.grid( # level = max() by default
age = c(8, 10, 12, 14), Sex = c("Male", "Female"))) # same error
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 1) # different when including random effects
" M01 M01 M01 M01 F01 F01 F01
24.77898 26.53948 28.29998 30.06047 20.10894 20.92936 21.74977
F01
22.57018"
predict(Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0:1) # level can be a vector, 0 for population
" Subject predict.fixed predict.Subject
1 M01 22.11075 24.77898
2 M01 23.35957 26.53948
3 M01 24.60839 28.29998
4 M01 25.85720 30.06047
65 F01 22.11075 20.10894
66 F01 23.35957 20.92936
67 F01 24.60839 21.74977
68 F01 25.85720 22.57018"
{marginaleffect} results. See comments inside.
# Marginal effects using marginaleffects
predictions( # includes random effects (level=1) by default
Model, newdata = Orthodont[ # individual mean but SE constant by Subject
Orthodont$Subject %in% c("M01", "F01"), ]) # incorrect SE!!!
"Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % age Subject Sex
24.8 0.394 63.0 <0.001 24.0 25.6 8 M01 Male
26.5 0.410 64.7 <0.001 25.7 27.3 10 M01 Male
28.3 0.460 61.5 <0.001 27.4 29.2 12 M01 Male
30.1 0.535 56.2 <0.001 29.0 31.1 14 M01 Male
20.1 0.394 51.1 <0.001 19.3 20.9 8 F01 Female
20.9 0.410 51.1 <0.001 20.1 21.7 10 F01 Female
21.7 0.460 47.3 <0.001 20.8 22.7 12 F01 Female
22.6 0.535 42.2 <0.001 21.5 23.6 14 F01 Female"
predictions( # cannot predict individual mean without cluster variable
Model, newdata = expand.grid( # this should be desired and expected
age = c(8, 10, 12, 14), Sex = c("Male", "Female")))
"Error: Unable to compute predicted values with this model. You can try
to supply a different dataset to the `newdata` argument. This
error was also raised:
cannot evaluate groups for desired levels on 'newdata'"
predictions( # warning for using level =, but argument worked
Model, newdata = Orthodont[
Orthodont$Subject %in% c("M01", "F01"), ],
level = 0) # gives population mean and ?perhaps population SE
"Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % age Subject Sex
22.1 0.394 56.2 <0.001 21.3 22.9 8 M01 Male
23.4 0.410 57.0 <0.001 22.6 24.2 10 M01 Male
24.6 0.460 53.5 <0.001 23.7 25.5 12 M01 Male
25.9 0.535 48.4 <0.001 24.8 26.9 14 M01 Male
22.1 0.394 56.2 <0.001 21.3 22.9 8 F01 Female
23.4 0.410 57.0 <0.001 22.6 24.2 10 F01 Female
24.6 0.460 53.5 <0.001 23.7 25.5 12 F01 Female
25.9 0.535 48.4 <0.001 24.8 26.9 14 F01 Female
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_predictions(Model) # random effect should average out; incorrect SE???
" Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
24 0.431 55.6 <0.001 23.1 24.8"
avg_predictions(Model, level = 0) # same result as above; but warning
avg_predictions( # different means; but why? possibly an individual mean
Model, by = "Sex") # with different random effects across ind by Sex
" Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
Male 24.9 0.431 57.7 <0.001 24.0 25.7
Female 22.7 0.431 52.5 <0.001 21.8 23.5"
avg_predictions( # same population mean by Sex
Model, by = "Sex", level = 0) # but SE does not vary by Sex!!!
" Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
Male 24 0.431 55.6 <0.001 23.1 24.8
Female 24 0.431 55.6 <0.001 23.1 24.8
Warning message:
These arguments are not supported for models of class `lme`: level."
plot_predictions(Model, condition = "Sex") # equal CI by Sex. Incorrect!!!
comparisons(Model, by = T) # Sex in weights not included?
# correct to do so: does not affect mean
" Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age +1 0.624 0.0618 10.1 <0.001 0.503 0.746"
avg_slopes(Model) # same result as above
comparisons( # two rows different due to random age effects differ by Sex
Model, by = "Sex") # but equal SE by Sex. Incorrect!!!
" Term Contrast Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age mean(+1) Male 0.713 0.0618 11.53 <0.001 0.592 0.834
age mean(+1) Female 0.496 0.0618 8.03 <0.001 0.375 0.617"
avg_slopes(Model, by = "Sex") # same result as above
comparisons( # same issue: equal SE by Sex. Incorrect!!!
Model, by = "Sex", level = 0) # level = worked but warning
"Term Contrast Sex Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age mean(+1) Male 0.624 0.0618 10.1 <0.001 0.503 0.746
age mean(+1) Female 0.624 0.0618 10.1 <0.001 0.503 0.746
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_slopes(Model, by = "Sex", level = 0) # same as above
plot_comparisons( # plots comparisons(Model, by = "Sex"); incorrect equal CI
Model, variables = c("age"), by = c("Sex"))
plot_comparisons( # mean at ~0.88; neither population of individual
Model, variables = c("age"), condition = c("Sex")) # what is it???
marginal_means(Model) # individual means; strange ordering
" Term Value Mean Std. Error z Pr(>|z|) 2.5 % 97.5 %
Subject M16 23.0 0.431 53.4 <0.001 22.2 23.9
Subject M05 23.1 0.431 53.6 <0.001 22.3 24.0
...
Subject F11 26.4 0.431 61.1 <0.001 25.5 27.2"
marginal_means(Model, level = 1) # same result as above with warning
# same SE for all Subject? Does not look right!!!
marginal_means(Model, level = 0) # equal population mean with warning
" Term Value Mean Std. Error z Pr(>|z|) 2.5 % 97.5 %
Subject M16 24 0.431 55.6 <0.001 23.1 24.8
Subject M05 24 0.431 55.6 <0.001 23.1 24.8
....
Subject F11 24 0.431 55.6 <0.001 23.1 24.8"
hypotheses(Model) # no test on other parameters, like correlation structure
# coefficient and heteroscedasticity in variance function
" Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
b1 17.115 0.6385 26.8 <0.001 15.864 18.367
b2 0.624 0.0618 10.1 <0.001 0.503 0.746"
hypotheses(Model, "age = 1") # worked
" Term Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
age = 1 -0.376 0.0618 -6.08 <0.001 -0.497 -0.254"
hypotheses(Model, "(Intercept) = 1") # could not refer to intercept
"object 'Intercept' not found"
hypotheses( # Error: Assertion on 'df' failed: Must have length 1
Model, df = insight::get_df(Model)) # df can be coef-specific
Thanks for the reports.
level
warning.df
bug.by=TRUE
marginaleffects
does not aggregate by weights. This would be inappropriate and counter-intuitive in the vast majority of applications You can always do by="Sex"
if that's what you want.hypotheses(Model, "`(Intercept)` = 1")
Thank you for the update.
Only fixed effects are taken into account when computing standard errors.
I don't think this is theoretically true. It depends one whether the prediction is for a population mean (fixed effect SD) or cluster mean (both fixed and random effect SD), corresponding to a confidence interval vs. prediction interval.
I don't understand why you expect the standard errors to be different if the weights are different but all fixed effects are identical (given the previous point). Weights are not (and probably should not) be taken into account when computing standard errors. By default, when by=TRUE marginaleffects does not aggregate by weights.
Weights in nlme::gls() and lme() may be defined very differently from those in other packages: both model the heteroscedasticity form to allow population SE changes with some predictors, and estimate coefficients in the variance effect; therefore, both confidence interval and prediction interval should change in width according to the variance function in weight = varFunc() if the predictor changes, not just in observed sample, but also in counterfactual predictions. See examples in lme() https://github.com/easystats/insight/issues/727#issuecomment-1466386234.
In contrast, ggeffect() takes an incorrect approach to allow population SE to vary by some predictor as in the very last execution in https://github.com/strengejacke/ggeffects/issues/295#issuecomment-1466028273 where CI widens with age, but age has only a linear fixed and random effect on distance. Not sure how these age-varying SE and CI were calculated, but they look incorrect: weight = varFunc() coefficient estimates say that population SE and CI width should reduce with age in a power function.
Thanks for the clarification.
I don't know these model objects well enough to implement a whole new infrastructure to support all this.
In your opinion, would it be sufficient to:
- Document that the estimates are population means. Since you allow level = , point estimates can be either population means (level = 0) or cluster means (level = 1, 2, ...)
- Raise an error when weights are used because they are not supported.
Possibly, but probably better to issue a warning instead because the SE the current infrastructure already got should be quite helpful. See predict.lm() help file with weights argument. It allows user to specify a numeric vector or formula to apply on new data. Warning is printed in certain cases.
I don't know these model objects well enough to implement a whole new infrastructure to support all this.
The most useful resource I have found so far is https://fw8051statistics4ecologists.netlify.app/gls.html#Sockeye It appear what you have now are most likely population SE (fe_se) and CI of population mean, which depend only on the vcov() of fixed-effect coefficients and fixed-effect variables in new data.
To make a CI with random effects, so it measures the CI of cluster means (wider than that of the population mean), enlarge the SE to be re_se = sqrt(fe_se^2 + re_sd^2)
; where re_sd should be derived from Model$modelStruct$reStruct, but I don't know how. Random effects at different levels and of different predictors can be separated, so there could be multiple versions of re_sd to select from.
Within each cluster, the error term is allowed to correlate through correlation = corStruct
. For example, in longitudinal data where cluster is each Subject, correlated errors between multiple measurements of the same Subject can be via corAR1(). Although it only gives a correlation coefficient, it matters to SE or CI calculation as in time series analysis. I don't know how this should affect SE and CI at which level: I suspect that fixed-effect SE/CI is not affected, but I am not sure about random-effect SE/CI.
To make a prediction interval with observation-level uncertainty, so it measures the CI of possible Y (random-effects, cluster mean CI), further enlarge the SE to be obs_se = sqrt(fe_se^2 + re_sd^2 + (sigma*varFunc())^2)
. Model$sigma has the residual SE directly, but there are many types of weights = varFunc()
using different link functions as shown https://fw8051statistics4ecologists.netlify.app/gls.html#variance-increasing-with-x_i. Model$modelStruct$varStruct and Model$apVar contain the information for varFunc but may have strange parametrization. At observation level, within-cluster error correlation definitely matters, but I don't know how SE and CI are a function of error correlation coefficients.
Awwwww, I think I understand now!
You are not saying that the standard errors are statistically incorrect. What you are saying is that these are not the quantities you want.
To be clear:
marginaleffects
estimates confidence intervals and not prediction intervals (except for brms
, where it can do both).lme
models should be given a "population" interpretation because they are constructed only based on (a) the variance-covariance matrix of the fixed effects and (b) a jacobian of derivatives taken with respect to fixed effects coefficients.So yes, you are right that the intervals do not account for the random effects structure. This is known.
In fact, you should know that this is the same approach that we use everywhere in the package, including for lme4
models.
For now, prediction intervals are out of scope, as are confidence intervals that account for the covariance structure of REs.
marginaleffects estimates confidence intervals and not prediction intervals In fact, you should know that this is the same approach that we use everywhere in the package, including for lme4 models.
I have not tried marginaleffects with lmer() models. What do you think it indicates if I use population mean standard errors for cluster mean predictions? i.e., in predictions(lme(), level = 1, 2, ...) for cluster mean estimates with SE equal to those of population mean estimates. The strategy ggeffects take in ggemmeans() and ggeffect()
type = "random" still returns population-level predictions, however, unlike type = "fixed", intervals also consider the uncertainty in the variance parameters (the mean random effect variance, see Johnson et al. 2014 for details) and hence can be considered as prediction intervals... To get predicted values for each level of the random effects groups, add the name of the related random effect term to the terms-argument (for more details, see this vignette).
Regarding comparisons() and slopes() with level = 0 vs level = 1, 2, ... ggeffect() for marginal effects currently do not have a type = "fixed" "random" argument. Does that mean marginal effects should not need bother the distinction between those of fixed effects and those including random effects? What does it indicate if I use population standard errors for cluster mean comparisons and slopes with level = 1?
If you want to really nail down the correct interpretation, I recommend you talk to a local lme
expert and/or statistician.
Not supported by
insight
: https://github.com/easystats/insight/issues/66User request: https://twitter.com/aosmith16/status/1444733963151544321?s=21
Working
get_coef
get_vcov
get_predict
set_coef
Not working
insight::get_data
predict()
function.Example