vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

Support: `lme` #99

Closed: vincentarelbundock closed this issue 1 year ago

vincentarelbundock commented 2 years ago

Not supported by insight: https://github.com/easystats/insight/issues/66

User request: https://twitter.com/aosmith16/status/1444733963151544321?s=21

Working

#' @rdname set_coef
#' @export
set_coef.lme <- function(model, coefs, ...) {
    model[["coefficients"]][["fixed"]][names(coefs)] <- coefs
    model
}

Not working

Example:

library(nlme)
model <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
DrJerryTAO commented 1 year ago

Are there updates on this development? Is the package now able to work with nlme::lme() models?

vincentarelbundock commented 1 year ago

@DrJerryTAO there has been no progress because the problem has not been resolved upstream in the insight package. Here is a reproducible example in case you want to look at the insight code to help fix this:

library(nlme)
mod <- lme(distance ~ age, data = Orthodont, random = ~ Sex)

fm1 <- lme(distance ~ age, data = Orthodont)
fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)
fm3 <- lme(distance ~ age, data = Orthodont, random = ~ Sex)

# missing Subject
insight::find_predictors(fm1)
# $conditional
# [1] "age"

# missing Subject
insight::find_predictors(fm2)
# $conditional
# [1] "age" "Sex"

# missing Subject and Sex
insight::find_predictors(fm3)
# $conditional
# [1] "age"

# error
insight::find_variables(fm3)
# Error in gregexpr(pattern = "\\|", re)[[1]]: subscript out of bounds
DrJerryTAO commented 1 year ago

Hi @vincentarelbundock, in my test of the insight::find_*() functions, both work almost okay. insight::find_variables(fm3) had a problem in your case because the way the random effects were specified was not recognized by {insight}; using a complete formula for the random effects solves the problem, as shown below. Does it look promising to you?

However, what is missing from both is any term specified in the weights = argument of lme(), which models heteroscedasticity (how the residual standard deviation varies as a function of some variables). But if the lme() model does not have a weights = component, I guess these insight retrieval functions should be fine.

summary(mod <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ 1 | Subject, 
  weights = varIdent(form = ~ 1 | Sex)))
insight::find_predictors(mod)
'$conditional
[1] "age"'
insight::find_variables(mod)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
names(insight::get_data(mod))
'[1] "distance" "age"      "Subject"'

It appears that insight::find_variables() returns more variable names than insight::find_predictors(). In my model shown below, "milk", an explanatory variable in weights =, is not selected by either of the find_*() functions. However, all other variables were retained by find_variables() and get_data().

summary(Model_Temp[[1]] <- lme(
  y ~ pma0 + hc0, 
  random = list(~ 1 | id), 
  weights = varIdent(form = ~ 1 | milk), # sex*milk
  correlation = corAR1(form = ~ dayab | id), # corCAR1()
  data = Data_Clean, 
  control = lmeControl(opt = "nlminb", msMaxIter = 500), method = "REML") 
)
insight::find_predictors(Model_Temp[[1]])
'$conditional
[1] "pma0" "hc0" 
$correlation
[1] "dayab" "id" '
insight::find_variables(Model_Temp[[1]])
'$response
[1] "y"
$conditional
[1] "pma0" "hc0" 
$random
[1] "id"
$correlation
[1] "dayab" "id" '
names(insight::get_data(Model_Temp[[1]]))
'"y"     "pma0"  "hc0"   "id"    "dayab"'
vincentarelbundock commented 1 year ago

No, we need full support, unfortunately.

DrJerryTAO commented 1 year ago

No, we need full support, unfortunately.

@vincentarelbundock, I just updated my example script output above with some new findings.

If find_variables() returns all relevant variable names in a model, will that solve the problem? It does not seem so, as an lme() model without a weights = argument still has NA for the inferential statistics in predictions(), which then provides only point estimates.

Or is it more about get_data() returning a data frame with all variables? It does not seem so either, as this function already gives a data frame with the variables in find_variables().

Or is it because {marginaleffects} calls only find_predictors() and therefore misses many other variables? I had a hard time exploring how these insight:: functions are used in {marginaleffects}.

vincentarelbundock commented 1 year ago

We need all of those insight functions to work with all valid nlme models, regardless of the inputs supplied by the user:

find_predictors(), find_variables(), find_weights(), get_data()

DrJerryTAO commented 1 year ago

Unfortunately I don't know what to do in this situation. I explored these {insight} functions, but I could not find out how {marginaleffects} uses them. It might be too complex for me to offer help, as I am not familiar with {insight} and how {marginaleffects} depends on it. I turned to {ggeffects} for nlme models.

vincentarelbundock commented 1 year ago

To be clear, the interaction between insight and marginaleffects does not matter. All that matters is that the 4 functions above in insight work fully. Once that is done, I can add support here.

Glad to hear that ggeffects is working for you.

DrJerryTAO commented 1 year ago

Hi @vincentarelbundock, I reported the variable-left-out issue in {insight}: https://github.com/easystats/insight/issues/727. It appears to me that the only misbehaving {insight} function for lme() objects is find_weights(), as I illustrated in the issue above. Do you agree?

However, after I fixed the find_weights() function so that it returns the variable name in the lme() weights = varFunc() argument and get_data() returns all required variables, an earlier version of {marginaleffects} still gives only point estimates but no SE. See the following example:


library(nlme)
library(insight)
library(marginaleffects)
summary(Model <- lme(
  distance ~ age, data = Orthodont, 
  random = ~ 1 | Subject, 
  weights = varIdent(form = ~ 1 | Sex)))
find_predictors(Model)
'$conditional
[1] "age"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex was left out before fix
'"distance" "age"      "Subject"  "Sex"'
predictions(Model) # missing SE
predictions(Model, newdata = Orthodont) # same
" Estimate Std. Error  z Pr(>|z|) 2.5 % 97.5 %
     25.6         NA NA       NA    NA     NA
     26.7         NA NA       NA    NA     NA
     27.8         NA NA       NA    NA     NA
     28.8         NA NA       NA    NA     NA
     21.8         NA NA       NA    NA     NA
--- 98 rows omitted. See ?avg_predictions and ?print.marginaleffects ---"
vincentarelbundock commented 1 year ago

Do you agree?

This sounds right, but I have not looked at it deeply enough to be 100% sure.

However, after I fixed the find_weights() function so that it returns the variable name in the lme() weights = varFunc() argument and get_data() returns all required variables, an earlier version of {marginaleffects} still gives only point estimates but no SE. See the following example

This is normal. I need to do other stuff in marginaleffects to add full support.

DrJerryTAO commented 1 year ago

This is normal. I need to do other stuff in marginaleffects to add full support.

Would be great to see the updates in future versions! Please follow up on https://github.com/easystats/insight/issues/727 if you need additional support in {insight}.

Glad to hear that ggeffects is working for you.

I prefer {marginaleffects} because the SE and the estimate are consistently (as far as I have found) on the same scale. In {ggeffects}, by contrast, the SE is usually on the linear-predictor scale, regardless of what the estimate stands for. This remained a mystery to me until I compared results between {marginaleffects} and {ggeffects} on a Poisson regression.

A side effect of {ggeffects} relying on linear-predictor SEs is that {ggeffects} can give asymmetrical CIs that respect the range of the outcome (e.g. Poisson mean > 0, probability within [0, 1]), whereas {marginaleffects} always gives symmetrical CIs that sometimes go beyond possible limits (e.g. Poisson mean < 0, probability outside [0, 1]). My solution is to take the response-scale estimates (y) and response-scale SEs SE(y) from {marginaleffects}, transform them onto the linear-predictor scale via the link function y = g(x), recognizing that SE(y) = g'(x) SE(x), and build the CI based on SE(x) before back-transforming. I do not use the linear-predictor estimates (x) and SEs SE(x) from {marginaleffects} to find the estimate and CI of the mean response, because in glm() models the mean response does not equal a nonlinear link function of the mean linear predictor, contrary to what you once suggested.

It would be nice if {marginaleffects} produced asymmetrical, range-preserving CIs when necessary, using the method above.
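To illustrate, here is a minimal sketch of that back-transformation for a Poisson regression with a log link, where y = exp(x) so g'(x) = y and SE(x) ≈ SE(y)/y. The dataset and column names (estimate, std.error, as in {marginaleffects} output) are just an example, not part of the discussion above:

library(marginaleffects)
mod <- glm(count ~ spray, data = InsectSprays, family = poisson)
p <- avg_predictions(mod, by = "spray")   # response-scale estimates and SEs

# Delta method in reverse for the log link: SE(x) ~ SE(y) / y
eta    <- log(p$estimate)
se_eta <- p$std.error / p$estimate

# Symmetric CI on the link scale, back-transformed to an asymmetric,
# range-preserving CI on the response scale (always > 0)
data.frame(spray     = p$spray,
           estimate  = p$estimate,
           conf.low  = exp(eta - 1.96 * se_eta),
           conf.high = exp(eta + 1.96 * se_eta))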

In my GitHub repository of customized R functions, I proposed efficient dplyr programming to generate repeated-measurements-, variance-, and difference-adjusted, range-preserving CIs for means and proportions from individual scores and group estimates, which are robust to imbalanced factorial designs, unequal group variances, different group sizes, and missing observations. I used {marginaleffects} summary results as the group-estimate inputs to my function, which produces difference-adjusted, range-preserving CIs. The range-preserving technique there, though, is based on transformations like log(y - a), log(b - y), and log((y - a)/(b - y)), where a and b are the minimum and maximum possible values. You could incorporate my methods into the {marginaleffects} summary outputs through arguments like (diffadjusted = TRUE, minimum = a, maximum = b).

vincentarelbundock commented 1 year ago

Thanks for the link and suggestions, but I am unlikely to implement that. You should feel free to publish an extension package.

vincentarelbundock commented 1 year ago

@DrJerryTAO there is now experimental support for nlme::lme(). Let me know if you try it and run into issues:

library(nlme)
library(marginaleffects)
mod <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)

avg_slopes(mod)
# 
#  Term      Contrast Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#   Sex Female - Male    -2.32     0.7614 -3.05   0.0023 -3.813 -0.829
#   age dY/dX             0.66     0.0616 10.72   <0.001  0.539  0.781
# 
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

avg_comparisons(mod)
# 
#  Term      Contrast Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
#   Sex Female - Male    -2.32     0.7614 -3.05   0.0023 -3.813 -0.829
#   age +1                0.66     0.0616 10.72   <0.001  0.539  0.781
# 
# Columns: term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

avg_predictions(mod, by = "Sex")
# 
#     Sex Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
#  Male       25.0      0.486 51.4   <0.001  24.0   25.9
#  Female     22.6      0.586 38.6   <0.001  21.5   23.8
# 
# Columns: Sex, estimate, std.error, statistic, p.value, conf.low, conf.high
DrJerryTAO commented 1 year ago

Excellent, @vincentarelbundock. Did not expect it to be so prompt.

The current lme support in {marginaleffects} works mostly okay, except for the following issues:

  1. Missing level = argument to pass to predict(). This affects whether the prediction is a population mean or an individual (cluster) mean. Adding the argument works (the point estimate is correct) but gives a warning. This could be exposed through a type = "fixed"/"random" argument.
  2. Incorrect SE calculation: predictions() and all other functions do not differentiate between population SEs and individual SEs. It appears all SEs are population SEs.
  3. Incorrect treatment of weights = ~: predictions() and all other functions do not account for the population variance changing with some predictors. The SE calculation does not include the heteroscedasticity estimates.
  4. A strange point estimate in plot_comparisons() when both variables = c() and condition = c() are specified, which does not match any comparisons() result.
  5. hypotheses() can only test fixed-effect parameters, not others such as the coefficients in the correlation structure and those in the variance function.
  6. hypotheses() cannot accept df as a vector, such as the one produced by insight::get_df(), which is common in multilevel modeling. The df of fixed-effect coefficients can differ from one another, depending on whether the variable varies within or between individuals. (A possible per-coefficient workaround is sketched at the end of this comment.)

I also tested {ggeffects} given the {insight} update. It has another series of errors and mistakes that I reported at https://github.com/strengejacke/ggeffects/issues/295. Nevertheless, it does differentiate between population SEs and individual SEs via type = "fixed" and "random".

Model setup. The {insight} find_*() and get_*() functions seem to work well.

library(nlme)
library(insight)
library(ggplot2)
library(marginaleffects)
data("Orthodont", package = "nlme")
ggplot( # spaghetti plot with LOESS group mean
  data = Orthodont, 
  aes(x = age, y = distance, group = Subject, color = Sex, fill = Sex)) + 
  geom_smooth(aes(
    group = Sex), alpha = 0.6, method = "loess", span = 1, show.legend = F) + 
  geom_line(size = 0.2, alpha = 0.6) +
  scale_color_manual(
    breaks = c("Female", "Male"), 
    values = c("#D14343", "#2197DF")) +
  scale_fill_manual(
    breaks = c("Female", "Male"), 
    values = c("#D14343", "#2197DF")) +
  scale_x_continuous(breaks = c(8, 10, 12, 14), minor_breaks = NULL) + 
  labs(x = "Age (year)", y = "Distance") + 
  theme_minimal() + 
  theme(
    strip.placement = "outside", 
    strip.text = element_text(size = rel(1)), 
    legend.title = element_blank(), 
    legend.position = c(0.1, 0.7), # "bottom"
    legend.direction = "vertical", legend.box = "vertical", 
    legend.margin = margin(0, 0, 0, 0), 
    legend.spacing = unit(0, "line"), 
    legend.background = element_blank(),
    legend.text = element_text(size = rel(0.8)),
    legend.key = element_blank(), 
    legend.key.size = unit(0.8, "line"),
    legend.key.width = unit(1.2, "line"),
    panel.grid.major.y = element_line(colour = "grey90", size = 0.2), 
    panel.grid.minor.y = element_line(colour = "grey90", size = 0.2), 
    panel.grid.major.x = element_line(colour = "grey90", size = 0.2), 
    panel.grid.minor.x = element_line(colour = "grey90", size = 0.2)
  )
summary(Model <- lme( # using all formula components
  distance ~ age, data = Orthodont, # fixed effect
  random = list(~ 1 + age | Subject), # correlated random int + age
  correlation = corAR1( # error correlation over time within Subject
    form = ~ as.integer(as.factor(age)) | Subject), # 1:4 as time index
  weights = varIdent(form = ~ 1 | Sex))) # heterosc: M/F has diff var
"       AIC      BIC    logLik
  436.6792 457.9867 -210.3396

Random effects:
 Formula: ~1 + age | Subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 2.3152914 (Intr)
age         0.2402225 -0.619
Residual    1.5037380       

Correlation Structure: AR(1)
 Formula: ~as.integer(as.factor(age)) | Subject 
 Parameter estimate(s):
       Phi 
-0.2325323 

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Sex 
 Parameter estimates:
    Male   Female 
1.000000 0.401668 

Fixed effects:  distance ~ age 
                Value Std.Error DF t-value p-value
(Intercept) 17.115492 0.6384924 80 26.8061       0
age          0.624408 0.0617834 80 10.1064       0"
vcov(Model) # only among fixed effect coefficients
"            (Intercept)          age
(Intercept)   0.4599040 -0.024615698
age          -0.0246157  0.002237791"
find_predictors(Model)
'$conditional
[1] "age"
$correlation
[1] "age"     "Subject"'
find_predictors(Model, effects = "all", component = "all")
'$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age"     "Subject"'
find_variables(Model)
'$response
[1] "distance"
$conditional
[1] "age"
$random
[1] "Subject"
$correlation
[1] "age"     "Subject"'
find_weights(Model) # "varIdent(form = ~1 | Sex)" before fix
'"Sex"'
class(find_weights(Model)) # "character"
names(get_data(Model)) # Sex left out before fix
'"distance" "age"      "Subject"  "Sex"'

predict.lme() as a reference. It has a level = argument to include or exclude random effects by clustering level.

# Prediction using nlme
Orthodont[Orthodont$Subject %in% c("M01", "F01"), ]
"Grouped Data: distance ~ age | Subject
   distance age Subject    Sex
1      26.0   8     M01   Male
2      25.0  10     M01   Male
3      29.0  12     M01   Male
4      31.0  14     M01   Male
65     21.0   8     F01 Female
66     20.0  10     F01 Female
67     21.5  12     F01 Female
68     23.0  14     F01 Female"
predict(Model, newdata = expand.grid(
  age = c(8, 10, 12, 14), Sex = c("Male", "Female")), 
  level = 0) # mean of population. No SE or CI option.
# Male has the same prediction as Female: Sex is only in weights
"[1] 22.11075 23.35957 24.60839 25.85720 22.11075 23.35957 24.60839
[8] 25.85720"
predict(Model, newdata = Orthodont[
  Orthodont$Subject %in% c("M01", "F01"), ], 
  level = 0) # same result as above
predict(Model, newdata = expand.grid(
  age = c(8, 10, 12, 14), Sex = c("Male", "Female")), 
  level = 1) # must supply cluster variables
"Error in predict.lme: 
cannot evaluate groups for desired levels on 'newdata'"
predict(Model, newdata = expand.grid( # level = max() by default
  age = c(8, 10, 12, 14), Sex = c("Male", "Female"))) # same error
predict(Model, newdata = Orthodont[
  Orthodont$Subject %in% c("M01", "F01"), ], 
  level = 1) # different when including random effects
"     M01      M01      M01      M01      F01      F01      F01 
24.77898 26.53948 28.29998 30.06047 20.10894 20.92936 21.74977 
     F01 
22.57018"
predict(Model, newdata = Orthodont[
  Orthodont$Subject %in% c("M01", "F01"), ], 
  level = 0:1) # level can be a vector, 0 for population
"   Subject predict.fixed predict.Subject
1      M01      22.11075        24.77898
2      M01      23.35957        26.53948
3      M01      24.60839        28.29998
4      M01      25.85720        30.06047
65     F01      22.11075        20.10894
66     F01      23.35957        20.92936
67     F01      24.60839        21.74977
68     F01      25.85720        22.57018"

{marginaleffects} results. See the comments inside.

# Marginal effects using marginaleffects
predictions( # includes random effects (level=1) by default
  Model, newdata = Orthodont[ # individual mean but SE constant by Subject
  Orthodont$Subject %in% c("M01", "F01"), ]) # incorrect SE!!!
"Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 % age Subject    Sex
     24.8      0.394 63.0   <0.001  24.0   25.6   8     M01 Male  
     26.5      0.410 64.7   <0.001  25.7   27.3  10     M01 Male  
     28.3      0.460 61.5   <0.001  27.4   29.2  12     M01 Male  
     30.1      0.535 56.2   <0.001  29.0   31.1  14     M01 Male  
     20.1      0.394 51.1   <0.001  19.3   20.9   8     F01 Female
     20.9      0.410 51.1   <0.001  20.1   21.7  10     F01 Female
     21.7      0.460 47.3   <0.001  20.8   22.7  12     F01 Female
     22.6      0.535 42.2   <0.001  21.5   23.6  14     F01 Female"
predictions( # cannot predict individual mean without cluster variable
  Model, newdata = expand.grid( # this should be desired and expected
    age = c(8, 10, 12, 14), Sex = c("Male", "Female")))
"Error: Unable to compute predicted values with this model. You can try
  to supply a different dataset to the `newdata` argument. This
  error was also raised:
  cannot evaluate groups for desired levels on 'newdata'"
predictions( # warning for using level =, but argument worked
  Model, newdata = Orthodont[
    Orthodont$Subject %in% c("M01", "F01"), ], 
  level = 0) # gives population mean and ?perhaps population SE
"Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 % age Subject    Sex
     22.1      0.394 56.2   <0.001  21.3   22.9   8     M01 Male  
     23.4      0.410 57.0   <0.001  22.6   24.2  10     M01 Male  
     24.6      0.460 53.5   <0.001  23.7   25.5  12     M01 Male  
     25.9      0.535 48.4   <0.001  24.8   26.9  14     M01 Male  
     22.1      0.394 56.2   <0.001  21.3   22.9   8     F01 Female
     23.4      0.410 57.0   <0.001  22.6   24.2  10     F01 Female
     24.6      0.460 53.5   <0.001  23.7   25.5  12     F01 Female
     25.9      0.535 48.4   <0.001  24.8   26.9  14     F01 Female
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_predictions(Model) # random effect should average out; incorrect SE???
" Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
       24      0.431 55.6   <0.001  23.1   24.8"
avg_predictions(Model, level = 0) # same result as above; but warning
avg_predictions( # different means; but why? possibly an individual mean
  Model, by = "Sex") # with different random effects across ind by Sex
"    Sex Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
 Male       24.9      0.431 57.7   <0.001  24.0   25.7
 Female     22.7      0.431 52.5   <0.001  21.8   23.5"
avg_predictions( # same population mean by Sex
  Model, by = "Sex", level = 0) # but SE does not vary by Sex!!!
"    Sex Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
 Male         24      0.431 55.6   <0.001  23.1   24.8
 Female       24      0.431 55.6   <0.001  23.1   24.8
Warning message:
These arguments are not supported for models of class `lme`: level."
plot_predictions(Model, condition = "Sex") # equal CI by Sex. Incorrect!!!

comparisons(Model, by = T) # Sex in weights not included?
# correct to do so: does not affect mean
" Term Contrast Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
  age       +1    0.624     0.0618 10.1   <0.001 0.503  0.746"
avg_slopes(Model) # same result as above
comparisons( # two rows different due to random age effects differ by Sex
  Model, by = "Sex") # but equal SE by Sex. Incorrect!!!
" Term Contrast    Sex Estimate Std. Error     z Pr(>|z|) 2.5 % 97.5 %
  age mean(+1) Male      0.713     0.0618 11.53   <0.001 0.592  0.834
  age mean(+1) Female    0.496     0.0618  8.03   <0.001 0.375  0.617"
avg_slopes(Model, by = "Sex") # same result as above
comparisons( # same issue: equal SE by Sex. Incorrect!!!
  Model, by = "Sex", level = 0) # level = worked but warning
"Term Contrast    Sex Estimate Std. Error    z Pr(>|z|) 2.5 % 97.5 %
  age mean(+1) Male      0.624     0.0618 10.1   <0.001 0.503  0.746
  age mean(+1) Female    0.624     0.0618 10.1   <0.001 0.503  0.746
Warning message:
These arguments are not supported for models of class `lme`: level. "
avg_slopes(Model, by = "Sex", level = 0) # same as above
plot_comparisons( # plots comparisons(Model, by = "Sex"); incorrect equal CI 
  Model, variables = c("age"), by = c("Sex"))
plot_comparisons( # mean at ~0.88; neither population of individual
  Model, variables = c("age"), condition = c("Sex")) # what is it???

marginal_means(Model) # individual means; strange ordering
"    Term Value Mean Std. Error    z Pr(>|z|) 2.5 % 97.5 %
 Subject   M16 23.0      0.431 53.4   <0.001  22.2   23.9
 Subject   M05 23.1      0.431 53.6   <0.001  22.3   24.0
 ...
 Subject   F11 26.4      0.431 61.1   <0.001  25.5   27.2"
marginal_means(Model, level = 1) # same result as above with warning
# same SE for all Subject? Does not look right!!!
marginal_means(Model, level = 0) # equal population mean with warning
"    Term Value Mean Std. Error    z Pr(>|z|) 2.5 % 97.5 %
 Subject   M16   24      0.431 55.6   <0.001  23.1   24.8
 Subject   M05   24      0.431 55.6   <0.001  23.1   24.8
 ....
 Subject   F11   24      0.431 55.6   <0.001  23.1   24.8"

hypotheses(Model) # no test on other parameters, like correlation structure
# coefficient and heteroscedasticity in variance function
" Term Estimate Std. Error    z Pr(>|z|)  2.5 % 97.5 %
   b1   17.115     0.6385 26.8   <0.001 15.864 18.367
   b2    0.624     0.0618 10.1   <0.001  0.503  0.746"
hypotheses(Model, "age = 1") # worked
"    Term Estimate Std. Error     z Pr(>|z|)  2.5 % 97.5 %
 age = 1   -0.376     0.0618 -6.08   <0.001 -0.497 -0.254"
hypotheses(Model, "(Intercept) = 1") # could not refer to intercept
"object 'Intercept' not found"
hypotheses( # Error: Assertion on 'df' failed: Must have length 1
  Model, df = insight::get_df(Model)) # df can be coef-specific
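As a possible workaround for point 6 above, hypotheses() could be called once per coefficient with a scalar df. This is only a sketch: it assumes insight::get_df(Model) returns one degrees-of-freedom value per fixed-effect coefficient (as in my test), and the index used below is illustrative.

dfs <- insight::get_df(Model)             # assumed: one df per fixed-effect coefficient
hypotheses(Model, "age = 1", df = dfs[2]) # pass the scalar df matching the age coefficient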
vincentarelbundock commented 1 year ago

Thanks for the reports.

For the intercept, wrap the term name in backticks:

hypotheses(Model, "`(Intercept)` = 1") 
DrJerryTAO commented 1 year ago

Thank you for the update.

Only fixed effects are taken into account when computing standard errors.

I don't think this is theoretically true. It depends on whether the prediction is for a population mean (fixed-effect SD only) or a cluster mean (both fixed- and random-effect SD), corresponding to a confidence interval vs. a prediction interval.

I don't understand why you expect the standard errors to be different if the weights are different but all fixed effects are identical (given the previous point). Weights are not (and probably should not be) taken into account when computing standard errors. By default, when by=TRUE, marginaleffects does not aggregate by weights.

Weights in nlme::gls() and lme() may be defined very differently from those in other packages: both model the form of the heteroscedasticity to allow the population SE to change with some predictors, and estimate coefficients of the variance function; therefore, both the confidence interval and the prediction interval should change in width according to the variance function in weights = varFunc() when the predictor changes, not just in the observed sample but also in counterfactual predictions. See examples for lme() at https://github.com/easystats/insight/issues/727#issuecomment-1466386234.
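To make this concrete, here is a minimal sketch of how the weights = varIdent(form = ~ 1 | Sex) component of the Orthodont model fit earlier scales the residual SD by stratum. It assumes the standard nlme coef() method for varFunc objects; the ratios should correspond to the variance-function estimates printed in the model summary above.

vs  <- Model$modelStruct$varStruct
rat <- coef(vs, unconstrained = FALSE, allCoef = TRUE)  # Male = 1, Female ~ 0.40
Model$sigma * rat                                       # per-stratum residual SDs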

In contrast, ggeffect() takes an incorrect approach to allowing the population SE to vary with some predictor, as in the very last call in https://github.com/strengejacke/ggeffects/issues/295#issuecomment-1466028273, where the CI widens with age even though age has only a linear fixed and random effect on distance. I am not sure how these age-varying SEs and CIs were calculated, but they look incorrect: the weights = varFunc() coefficient estimates say that the population SE and CI width should shrink with age according to a power function.

vincentarelbundock commented 1 year ago

Thanks for the clarification.

I don't know these model objects well enough to implement a whole new infrastructure to support all this.

In your opinion, would it be sufficient to:

  1. Document that the estimates are population means.
  2. Raise an error when weights are used because they are not supported.
DrJerryTAO commented 1 year ago
  1. Document that the estimates are population means.

Since you allow level =, the point estimates can be either population means (level = 0) or cluster means (level = 1, 2, ...).

  2. Raise an error when weights are used because they are not supported.

Possibly, but it would probably be better to issue a warning instead, because the SEs the current infrastructure already computes should be quite helpful. See the predict.lm() help file for its weights argument: it allows the user to specify a numeric vector or a formula to apply to new data, and a warning is printed in certain cases.
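For reference, a minimal illustration of that predict.lm() behavior (the data and variables here are just an example): weights for new data can be given as a one-sided formula, and omitting them for a weighted fit triggers the constant-prediction-variance warning.

fit <- lm(dist ~ speed, data = cars, weights = speed)    # a weighted lm fit
predict(fit, newdata = data.frame(speed = c(10, 20)),
        interval = "prediction", weights = ~ speed)      # weights as a formula evaluated in newdata
predict(fit, newdata = data.frame(speed = c(10, 20)),
        interval = "prediction")                         # warns: assumes constant prediction variance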

I don't know these model objects well enough to implement a whole new infrastructure to support all this.

The most useful resource I have found so far is https://fw8051statistics4ecologists.netlify.app/gls.html#Sockeye. It appears that what you have now are most likely population SEs (fe_se) and CIs of the population mean, which depend only on the vcov() of the fixed-effect coefficients and the fixed-effect variables in the new data.

To make a CI that includes random effects, so that it measures the CI of cluster means (wider than that of the population mean), enlarge the SE to re_se = sqrt(fe_se^2 + re_sd^2), where re_sd should be derived from Model$modelStruct$reStruct, but I don't know how. Random effects at different levels and for different predictors can be separated, so there could be multiple versions of re_sd to select from.

Within each cluster, the error term is allowed to be correlated through correlation = corStruct. For example, in longitudinal data where the cluster is each Subject, correlated errors between multiple measurements of the same Subject can be modeled via corAR1(). Although it only gives a correlation coefficient, it matters to the SE and CI calculation, as in time-series analysis. I don't know how this should affect the SE and CI at which level: I suspect that the fixed-effect SE/CI is not affected, but I am not sure about the random-effect SE/CI.

To make a prediction interval with observation-level uncertainty, so that it covers possible values of Y (beyond the random-effects, cluster-mean CI), further enlarge the SE to obs_se = sqrt(fe_se^2 + re_sd^2 + (sigma * varFunc())^2). Model$sigma contains the residual SE directly, but there are many types of weights = varFunc() using different link functions, as shown at https://fw8051statistics4ecologists.netlify.app/gls.html#variance-increasing-with-x_i. Model$modelStruct$varStruct and Model$apVar contain the information for varFunc but may have an unusual parametrization. At the observation level, the within-cluster error correlation definitely matters, but I don't know how the SE and CI are a function of the error correlation coefficients.
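A minimal sketch of the re_se idea above for the Orthodont model fit earlier (random intercept and age slope by Subject): it assumes getVarCov() returns the random-effects covariance matrix G, so the cluster-level variance contribution at a given age is z' G z with z = c(1, age). The newdata values are illustrative, and the within-cluster correlation and varFunc pieces are ignored here.

G  <- getVarCov(Model)                  # random-effects covariance: (Intercept), age
nd <- data.frame(age = c(8, 10, 12, 14), Subject = "M01", Sex = "Male")
p  <- predictions(Model, newdata = nd)  # std.error holds the fixed-effect (population) SE
z  <- cbind(1, nd$age)
re_var <- rowSums((z %*% G) * z)        # z' G z for each row of newdata
re_se  <- sqrt(p$std.error^2 + re_var)  # re_se = sqrt(fe_se^2 + re_sd^2)
cbind(nd, fe_se = p$std.error, re_se = re_se)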

vincentarelbundock commented 1 year ago

Awwwww, I think I understand now!

You are not saying that the standard errors are statistically incorrect. What you are saying is that these are not the quantities you want.

To be clear:

  1. marginaleffects estimates confidence intervals and not prediction intervals (except for brms, where it can do both).
  2. The CIs for lme models should be given a "population" interpretation because they are constructed based only on (a) the variance-covariance matrix of the fixed effects and (b) a Jacobian of derivatives taken with respect to the fixed-effect coefficients.

So yes, you are right that the intervals do not account for the random effects structure. This is known.

In fact, you should know that this is the same approach that we use everywhere in the package, including for lme4 models.

For now, prediction intervals are out of scope, as are confidence intervals that account for the covariance structure of REs.

DrJerryTAO commented 1 year ago

marginaleffects estimates confidence intervals and not prediction intervals

In fact, you should know that this is the same approach that we use everywhere in the package, including for lme4 models.

I have not tried marginaleffects with lmer() models. What do you think it indicates if I use population-mean standard errors for cluster-mean predictions, i.e., predictions(lme(), level = 1, 2, ...) for cluster-mean estimates with SEs equal to those of the population-mean estimates? The strategy {ggeffects} takes in ggemmeans() and ggeffect() is:

type = "random" still returns population-level predictions, however, unlike type = "fixed", intervals also consider the uncertainty in the variance parameters (the mean random effect variance, see Johnson et al. 2014 for details) and hence can be considered as prediction intervals... To get predicted values for each level of the random effects groups, add the name of the related random effect term to the terms-argument (for more details, see this vignette).

Regarding comparisons() and slopes() with level = 0 vs. level = 1, 2, ...: ggeffect() for marginal effects currently does not have a type = "fixed"/"random" argument. Does that mean marginal effects should not need to bother with the distinction between fixed-effects-only quantities and those including random effects? What does it indicate if I use population standard errors for cluster-mean comparisons and slopes with level = 1?

vincentarelbundock commented 1 year ago

If you want to really nail down the correct interpretation, I recommend you talk to a local lme expert and/or statistician.