rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/

Would it be possible to allow the "df" parameter to accept a vector of values rather than just a single scalar? #461

Closed — Generalized closed this issue 6 months ago

Generalized commented 6 months ago

Let me illustrate the question with robust lme4 (robustlmm) and emmeans.

Below I fit the model, and it tells me that there are 91 observations and 44 clusters. Unfortunately, it won't tell me the number of degrees of freedom. Therefore, emmeans assumes an infinite number, using the normal approximation to the t distribution.

In a case like this, one could take one of several common approaches, such as borrowing the Satterthwaite degrees of freedom from a parallel non-robust fit.

And some people actually did this already: Geniole, Shawn N., Proietti, Valentina, Bird, Brian M., Ortiz, Triana L., Bonin, Pierre L., Goldfarb, Bernard, Watson, Neil V. and Carré, Justin M. 2019. Testosterone reduces the threat premium in competitive resource division. Proceedings of the Royal Society B 286. http://doi.org/10.1098/rspb.2019.0720

Robust mixed-level models were used, including random intercepts and random slopes for the highest order interactions and/or the main effects not captured by these interactions. Participant and stimulus ID were grouping factors. Significance was determined using Satterthwaite approximations of degrees of freedom, limiting Type I error inflation but maintaining power.

And it seems a quite reasonable approach (better than residual DF or infinity). But the Satterthwaite dfs can vary from group to group, while the "df" argument of emmeans takes only a single (the first) value. Would it be possible to allow for providing a vector of dfs?

Let's quickly demonstrate it on some data:

> summary(m_rlmm <- robustlmm::rlmer(formula = TiffeneauPostDiff ~ Timepoint + (1|PatientId), data=d1, control = lmerControl(check.nobs.vs.nRE = "warning")))
Robust linear mixed model fit by DAStau 
Formula: TiffeneauPostDiff ~ Timepoint + (1 | PatientId) 
   Data: d1 
Control: lmerControl(check.nobs.vs.nRE = "warning") 

......cut......

then:

> emmeans(m_rlmm , specs = ~Timepoint)
 Timepoint    emmean     SE  df asymp.LCL asymp.UCL
 Week 8    -0.000403 0.0116 Inf   -0.0231    0.0223
 Week 16    0.012469 0.0116 Inf   -0.0103    0.0352
 Week 56    0.011229 0.0127 Inf   -0.0136    0.0361

Confidence level used: 0.95 
> emmeans(m_rlmm , specs = ~Timepoint, df=44)
 Timepoint    emmean     SE df lower.CL upper.CL
 Week 8    -0.000403 0.0116 44  -0.0238   0.0230
 Week 16    0.012469 0.0116 44  -0.0109   0.0359
 Week 56    0.011229 0.0127 44  -0.0143   0.0368

Degrees-of-freedom method: user-specified 
Confidence level used: 0.95 

OK, now let's employ lme4:

> m_lmm <- lme4::lmer(formula = TiffeneauPostDiff ~ Timepoint + (1|PatientId), data=d1, control = lmerControl(check.nobs.vs.nRE = "warning"))
> emmeans(m_lmm , specs = ~Timepoint, mode = "Satt")
 Timepoint    emmean     SE   df lower.CL upper.CL
 Week 8    -0.000075 0.0262 45.9  -0.0528   0.0526
 Week 16    0.016333 0.0262 45.9  -0.0364   0.0690
 Week 56    0.010448 0.0267 49.3  -0.0432   0.0641

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

but when I want to reuse the dfs, only the first value is taken:

> emmeans(m_rlmm , specs = ~Timepoint, df=as.data.frame(emmeans(m_lmm , specs = ~Timepoint, mode = "Satt"))$df)
 Timepoint    emmean     SE   df lower.CL upper.CL
 Week 8    -0.000403 0.0116 45.9  -0.0237   0.0229
 Week 16    0.012469 0.0116 45.9  -0.0109   0.0358
 Week 56    0.011229 0.0127 45.9  -0.0143   0.0368

Degrees-of-freedom method: user-specified 
Confidence level used: 0.95 
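A post-hoc workaround (my sketch, not an emmeans feature) is to take the asymptotic summary from the robust fit as a data frame and recompute the t-based limits row by row, using the per-row Satterthwaite dfs borrowed from the lmer fit. This assumes that keeping the robust estimates and SEs while swapping in the borrowed dfs is acceptable:

```r
# Sketch only: combine robust estimates/SEs with per-row Satterthwaite dfs.
# m_rlmm and m_lmm are the robustlmm and lme4 fits from above.
es  <- as.data.frame(emmeans(m_rlmm, specs = ~Timepoint))
dfs <- as.data.frame(emmeans(m_lmm, specs = ~Timepoint, mode = "satterthwaite"))$df

es$df       <- dfs
es$lower.CL <- es$emmean - qt(0.975, dfs) * es$SE   # 95% limits, per-row df
es$upper.CL <- es$emmean + qt(0.975, dfs) * es$SE
es
```

This sidesteps emmeans entirely for the interval step, so none of the emmeans adjustment machinery (e.g. multiplicity adjustments) applies to the recomputed limits.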
d1 <- structure(list(PatientId = c("XXY1YY2", "XXY1YY2", "XXY1YY3", 
"XXY1YY3", "XXY1YY3", "XXY1YY5", "XXY1YY5", "XXY1YY5", "XXY1YY7", 
"XXY1YY7", "XXY1YY7", "XXY1YY9", "XXY1YY9", "XXY1YY9", "XXY2YY1", 
"XXY2YY1", "XXY2YY2", "XXY2YY2", "XXY2YY2", "XXY2YY5", "XXY2YY6", 
"XXY2YY8", "XXY2YY8", "XXY2Y1Y", "XXY2Y11", "XXY2Y11", "XXY2Y14", 
"XXY2Y15", "XXY2Y17", "XXY2Y19", "XXY2Y19", "XXY2Y2Y", "XXY2Y22", 
"XXY2Y22", "XXY2Y23", "XXY2Y25", "XXY2Y25", "XXY4YY1", "XXY4YY1", 
"XXY4YY2", "XXY6YY1", "XXY6YY1", "XXY6YY1", "XXY6YY2", "XXY6YY3", 
"XXY6YY3", "XXY6YY3", "XXY6YY4", "XXY6YY4", "XXY6YY4", "XXY6YY5", 
"XXY6YY5", "XXY6YY8", "XXY6YY8", "XXY8YY1", "XXY8YY1", "XXY8YY1", 
"XXY8YY2", "XXY8YY2", "XXY8YY4", "XXY8YY4", "XXY8YY4", "XXY8YY5", 
"XXY8YY5", "XXY8YY5", "XXY8YY6", "XXY8YY6", "XXY8YY6", "XXY8YY7", 
"XXY8YY8", "XXY8YY8", "XXY8YY8", "XXY9YY1", "XXY9YY1", "XXY9YY2", 
"XXY9YY2", "XXY9YY3", "XXY9YY3", "XXY9YY3", "XXY9YY4", "XXY9YY4", 
"XXY9YY4", "XX1YYY3", "XX1YYY3", "XX13YY1", "XX13YY1", "XX13YY3", 
"XX13YY5", "XX13YY5", "XX17YY1", "XX17YY1"), TiffeneauPostDiff = c(0.019, 
0.0452, -0.0033, 0.0154, -0.0174, 0.0284, 0.0802, 0.0036, -0.7336, 
-0.6917, -0.736, 0.0092, 0.0581, -0.0113, 0.0138, 0.0144, -0.0499, 
0.024, -0.0299, -0.0895, 0.0541, 0.0925, 0.0379, 0.0139, 0.0171, 
0.0393, -0.1249, -0.022, -0.0639, 0.0134, -0.0739, -0.0711, 0.1844, 
0.0563, -0.0012, 0.0909, 0.1099, -0.0186, 0.0578, 0.0184, -0.0258, 
-0.041, -0.0347, -0.0263, -0.0462, 0.0241, -0.0734, -0.0139, 
-0.0342, 0.039, 0.1898, 0.2037, -0.0297, 0.0076, 0.0316, 0.0301, 
0.0893, -0.039, -0.0976, 0.0062, 0.03, 0.0323, -0.0457, -0.0463, 
0.0014, -0.0591, -0.0545, -0.0133, 0.0427, 0.1439, 0.083, 0.1465, 
-1e-04, 0.0135, -0.0425, -0.0235, 0.0811, 0.0524, 0.0596, 0.088, 
0.1404, 0.1102, -0.1419, -0.0881, 0.0023, 0.1131, 0.7371, 0.0254, 
0.0394, -0.0389, -0.0518), Timepoint = structure(c(1L, 2L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 
3L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 3L, 1L, 
2L, 3L, 1L, 3L, 1L, 1L, 2L, 3L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 1L, 3L, 1L, 1L, 2L, 2L, 3L), levels = c("Week 8", "Week 16", 
"Week 56"), class = "factor")), row.names = c(NA, -91L), Version = 5L, Date = structure(1698663319.543, tzone = "UTC", class = c("POSIXct", 
"POSIXt")), class = "data.frame")
rvlenth commented 6 months ago

[I have revised this comment since originally writing it.]

First, I'll try to answer the question in the headline...

Would it be possible to allow the "df" parameter to accept a vector of values rather than just a single scalar?

My answer is that it is possible, but not advisable. Believe me, I have thought about this before. The df argument is passed to ref_grid(), so if you were allowed to give it a vector of numbers, that vector would have to make sense at the time the reference grid is constructed, which could very likely differ from what the user has in mind when calling, say, emmeans(). And even if the user understood that, they would have to correctly anticipate the order of the reference-grid rows; so allowing this would be a pretty fragile user interface.

Second, you are incorrect in characterizing emmeans as the one making the distributional assumption. It is the emm_basis method for the model class that determines how d.f. calculations are made, via the @dffun and @dfargs slots. This question is particular to a model class supported by the robustlmm package -- not by emmeans. It seems appropriate to ask the robustlmm developer to consider improving the d.f. support that is provided.

Generalized commented 6 months ago

Thank you very much for the explanation!

I thought that the vector of df could actually be a data frame constructed to reflect the combinations of factors in the grid, so the user would provide a data.frame (for instance) like this:

X    Y    df
X1   Y1   10
X1   Y2   15.5
X2   Y1   16
X2   Y2   30
...

Whenever something is missing, "Inf" could be imputed with a warning, or an error thrown: "missing df for the following elements of the grid". Whether it's valid or not would be up to the user. I think that people advanced enough to specify their own df will be aware of this and take care of it by checking whether the output matches their specification.
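To illustrate the idea (a hypothetical sketch of the proposed behavior, not existing emmeans functionality), the lookup-and-impute step could work like this:

```r
# Hypothetical: merge a user-supplied table of dfs onto the grid of
# factor combinations; impute Inf with a warning where a row is missing.
user_df <- data.frame(X  = c("X1", "X1", "X2", "X2"),
                      Y  = c("Y1", "Y2", "Y1", "Y2"),
                      df = c(10, 15.5, 16, 30))
grid <- expand.grid(X = c("X1", "X2", "X3"), Y = c("Y1", "Y2"))
grid <- merge(grid, user_df, by = c("X", "Y"), all.x = TRUE)
if (anyNA(grid$df)) {
    warning("missing df for the following elements of the grid: ",
            paste(grid$X[is.na(grid$df)], grid$Y[is.na(grid$df)], collapse = "; "))
    grid$df[is.na(grid$df)] <- Inf
}
```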

I don't think the authors of robustlmm will do that, because it's a non-standard approach. I find it sensible, and my more advanced colleagues at work (with whom I consulted on this issue) would follow it as well, but it is not something that is "formally supported", even if it sounds reasonable in certain cases.

OK, I will try either to "intercept" the code in robustlmm which returns the df (Inf) and "inject" mine, or to take the test statistic and do the pt() calculation with the specified df on my own, overwriting the output of emmeans.

Thank you very much for all your efforts and let me wish you all the best in 2024!

rvlenth commented 6 months ago

Thanks for the new year wishes -- and the same back to you!

Regarding d.f., I don't think it's anything like you describe. What's needed for the @dffun slot is a function that will determine the d.f. for estimating a linear function $\hat\theta = \sum c_j \hat \beta_j$, where $\hat\theta$ is one of the EMMs or a contrast thereof, the $c_j$ are given constants, and the $\hat\beta_j$ are the regression coefficients for the model. The Satterthwaite d.f., for example, is based on the relationship between $Var(\hat\theta)$ and $E(\hat\theta^2)$, and the "containment" method used for lme models is the d.f. associated with the coarsest grouping associated with nonzero $c_j$ values. In neither case are we doing bookkeeping on the observations in the dataset; rather, what matters is the role of the groupings on the $\hat\beta_j$ and the weights given them in estimating $\theta$.
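For reference (my addition, the standard form of the Satterthwaite approximation, consistent with the relationship described above): the approximate d.f. for a variance estimate $\widehat{Var}(\hat\theta)$ is

$$\hat\nu \;=\; \frac{2\,\left[\widehat{Var}(\hat\theta)\right]^2}{\widehat{Var}\!\left(\widehat{Var}(\hat\theta)\right)}$$

so the more uncertain the variance estimate itself is, the smaller the d.f.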

Generalized commented 6 months ago

Dear @rvlenth, thank you for the response!

I could obtain Satterthwaite dfs "for free" from lme4 and provide them here. Yes, I do know that both sets of estimates (coefficients, variances) will be different, as the estimation was different and weighting was employed, so anything based on that will be off (a little or a lot). In my current case I think they won't be that far from the original ones (except maybe in edge cases). Above N=30 the difference between z and t isn't that big, so the df itself shouldn't be critical unless it drops below about 20, and I rarely deal with cases below that. It could be (as in my case) somewhere between 40 and 50, and that would be acceptable.

I remember from one vignette (about effect size) that you took a similar approach, mentioning that it's a non-critical thing, and provided some range.

Actually I could even live with Inf, as the differences won't be too big, but my reviewers won't accept it. They will say: "OK, they are close, but don't naively make the df bigger; make it closer to a reasonable number". And yes, I could use just the average of the dfs (still better than Inf), but they would complain that it's the same value everywhere... even if it doesn't actually matter too much.

And there is little or nothing I can do about that.

PS: I found that sometimes emmeans itself assigns infinite df, for example in emmeans:::emm_basis.merMod:

mode = match.arg(tolower(mode), c("satterthwaite", "kenward-roger", "asymptotic"))

if (!is.null(options$df) && (mode != "kenward-roger"))
    mode = "asymptotic"

..... and later ....

if (mode == "asymptotic") {
    dffun = function(k, dfargs) Inf
}

Which makes sense, as robustlmm itself doesn't provide the df and p-values:

> summary(m_mmrm)
Robust linear mixed model fit by DAStau 
Formula: frm 
   Data: data 
Control: lmerControl(check.nobs.vs.nRE = "warning") 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.82765 -0.48266 -0.00967  0.36703  3.00827 

Random effects:
 Groups    Name        Variance Std.Dev.
 PatientId (Intercept) 0.003911 0.06253 
 Residual              0.001147 0.03386 
Number of obs: 91, groups: PatientId, 44

Fixed effects:
                   Estimate Std. Error t value
(Intercept)      -0.0004035  0.0115931  -0.035
TimepointWeek 16  0.0128727  0.0092463   1.392
TimepointWeek 56  0.0116329  0.0104089   1.118

And that's why I thought you set it in some scenarios, as a last-resort option.

By the way, let's check in this very case. First, lmerModLmerTest (Satterthwaite selected):

> coef(summary(m_mmrm1))
                      Estimate  Std. Error       df      t value   Pr(>|t|)
(Intercept)      -7.504107e-05 0.026180493 45.90616 -0.002866297 0.99772545
TimepointWeek 16  1.640811e-02 0.009221525 45.53196  1.779327325 0.08186167
TimepointWeek 56  1.052267e-02 0.010355144 45.41416  1.016177970 0.31492800

Now with infinite df, for robustlmm. There are differences, as there are a few outliers that are weighted down:

> as.data.frame(coef(summary(m_mmrm))) %>% mutate(df = Inf, p = 2*pt(abs(`t value`), df, lower=FALSE))
                      Estimate  Std. Error     t value  df         p
(Intercept)      -0.0004034906 0.011593143 -0.03480424 Inf 0.9722358
TimepointWeek 16  0.0128727105 0.009246332  1.39219640 Inf 0.1638629
TimepointWeek 56  0.0116329400 0.010408877  1.11759804 Inf 0.2637387

Now with df=10:

> as.data.frame(coef(summary(m_mmrm))) %>% mutate(df = 10, p = 2*pt(abs(`t value`), df, lower=FALSE))
                      Estimate  Std. Error     t value df         p
(Intercept)      -0.0004034906 0.011593143 -0.03480424 10 0.9729208
TimepointWeek 16  0.0128727105 0.009246332  1.39219640 10 0.1940460
TimepointWeek 56  0.0116329400 0.010408877  1.11759804 10 0.2898692

Or even with the calculated Satterthwaite dfs:

> as.data.frame(coef(summary(m_mmrm))) %>% mutate(df = coef(summary(m_mmrm1))[,"df"], p = 2*pt(abs(`t value`), df, lower=FALSE))
                      Estimate  Std. Error     t value       df         p
(Intercept)      -0.0004034906 0.011593143 -0.03480424 45.90616 0.9723867
TimepointWeek 16  0.0128727105 0.009246332  1.39219640 45.53196 0.1706257
TimepointWeek 56  0.0116329400 0.010408877  1.11759804 45.41416 0.2696163

How can I pass it through dffun? I tried:

m_mmrm_em@dffun <- function() coef(summary(m_mmrm1))[,"df"]

but it says:

> m_mmrm_em
Error in object@dffun(x, object@dfargs) :
unused arguments (x, object@dfargs)

I'm sorry, I've never played with the object system in R :(
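For what it's worth, the error message itself hints at the fix: the function stored in @dffun is called with two arguments, (k, dfargs). A minimal sketch, assuming a single conservative df value is acceptable (matching per-row dfs to arbitrary linear functions would take more bookkeeping):

```r
# Sketch: @dffun must accept (k, dfargs); k holds the linear-function
# coefficients and dfargs is the list stored in the @dfargs slot.
satt_df <- coef(summary(m_mmrm1))[, "df"]        # Satterthwaite dfs from lmerTest

m_mmrm_em@dffun  <- function(k, dfargs) dfargs$df
m_mmrm_em@dfargs <- list(df = min(satt_df))      # most conservative single value
```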

rvlenth commented 6 months ago

d.f. do matter when they are small, and that can happen easily in mixed models, because it's the numbers of groups and subgroups and such that come into play.

I also point out that reviewers are not always right, and sometimes it's important to talk back to them -- with objective support, of course. I would suggest that it's probably appropriate not to exceed the d.f. for a parallel non-robust model, because robustifying things generally leads to a loss of efficiency. Most robust methods yield a set of case weights as part of their calculations, and using the d.f. that result from refitting the model with those weights via a non-robust method like lmer() seems fairly defensible.

However, I'm not going to wade into this any further. There are a lot of technical details and I want to prioritize other matters. If you want to know more about how the slots are used, you can look at existing code like emmeans:::emm_basis.merMod, at documentation for extending emmeans, and examples for emmobj.

Generalized commented 6 months ago

PS: I think I found a workaround :)

I just fit robustlmm, took the weights and... supplied them back to lme4:

> m_mmrm_lmer_rlmer <- lme4::lmer(formula = frm, data=data, control = lmerControl(check.nobs.vs.nRE = "warning"), weights = getME(m_mmrm_rlmer, "w_e"))

> # lme4 with weights obtained from robustlmm
> coef(summary(m_mmrm_lmer_rlmer))
                      Estimate  Std. Error     t value
(Intercept)      -0.0006501447 0.026083449 -0.02492557
TimepointWeek 16  0.0159041971 0.008826363  1.80189703
TimepointWeek 56  0.0116748979 0.009936630  1.17493532

> # robustlmm itself
> coef(summary(m_mmrm_rlmer))
                      Estimate  Std. Error     t value
(Intercept)      -0.0004034906 0.011593143 -0.03480424
TimepointWeek 16  0.0128727105 0.009246332  1.39219640
TimepointWeek 56  0.0116329400 0.010408877  1.11759804

> # lme4 itself
> coef(summary(m_mmrm_lmer))
                      Estimate  Std. Error      t value
(Intercept)      -7.504107e-05 0.026180493 -0.002866297
TimepointWeek 16  1.640811e-02 0.009221525  1.779327325
TimepointWeek 56  1.052267e-02 0.010355144  1.016177970

AND

> emmeans(m_mmrm_lmer_rlmer, specs = ~Timepoint, mode = "satterthwaite")
 Timepoint   emmean     SE   df lower.CL upper.CL
 Week 8    -0.00065 0.0261 45.6  -0.0532   0.0519
 Week 16    0.01525 0.0261 45.8  -0.0373   0.0678
 Week 56    0.01102 0.0266 48.9  -0.0424   0.0644

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

> emmeans(m_mmrm_lmer, specs = ~Timepoint, mode = "satterthwaite")
 Timepoint    emmean     SE   df lower.CL upper.CL
 Week 8    -0.000075 0.0262 45.9  -0.0528   0.0526
 Week 16    0.016333 0.0262 45.9  -0.0364   0.0690
 Week 56    0.010448 0.0267 49.3  -0.0432   0.0641

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

> emmeans(m_mmrm_rlmer, specs = ~Timepoint)
 Timepoint    emmean     SE  df asymp.LCL asymp.UCL
 Week 8    -0.000403 0.0116 Inf   -0.0231    0.0223
 Week 16    0.012469 0.0116 Inf   -0.0103    0.0352
 Week 56    0.011229 0.0127 Inf   -0.0136    0.0361

Confidence level used: 0.95 

Weights: [image: plot of the robustness weights]

It costs me double fitting, but that's not a big issue. If we assume that these weights are the final product of interest, putting them into lme4 makes sense, and I get all the "pleasures and benefits" of this well-featured package.

The results are somewhat worse for lmer + weights, although the effect of weighting is visible. Weighting helped. [image: diagnostic plot]

For robust lmer it was even more visible: [image: diagnostic plot]

Thank you for all the help! I am going to end this discussion, but I wanted to share the final results in case anyone needs them in the future.