rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
341 stars 30 forks source link

Metafor rma.mv() degree of freedom question #422

Closed mahonmb closed 1 year ago

mahonmb commented 1 year ago

I am currently conducting a mixed effects meta-analysis using the rma.mv() function in the metafor package. Following the fit of the rma.mv(), I am conducting and running cluster-robust tests and confidence intervals with updated variance-covariance matrix using the robust() function in the metafor package with the "CR2" estimator using the clubSandwich package, which uses Satterwaite approximation for degrees of freedom. When I use the qdrg() and emmeans() functions on the output from the robust() function, the df are all set to INF, instead of the approximated dfs, which skews the confidence intervals to be larger than those estimated from the robust() function. This is an issue when conducting post-hoc Tukey's test comparisons.

I have tried to set the df in the qdrg object equal to those from the robust output and even adding an "df.residual" object to the robust output that matches the approximated dfs, but I get an error (Error in format.default(x[-seq_len(2)], width = w, justify = "right") : invalid 'width' argument) when I use the emmeans() function.

Is there a way to force different dfs for each level in an emmGrid object, in order to get the cluster-robust confidence intervals in the output of emmeans? Similar to the df approximation for lmer objects [from your vignette: emmeans(Oats.lmer, "nitro", lmer.df = "satterthwaite")]. This may not be an issue for most levels within my moderator variables, but for several moderators, the DF will be very different among levels and some may be very low (thereby creating artificially low CIs for those levels).

Example code library(metafor); library(clubSandwich)

generate effect size dataset

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

conduct mixed effects meta-analytic model

m1 <- rma.mv(yi, vi, mods = ~ alloc, random = ~ 1 | trial, data=dat)

conduct cluster-robust variance estimation with updated variance-covariance and Satterwaite DF approximation

m1o <- robust(m1, cluster=trial, clubSandwich=TRUE) m1o

Truncated results

Model Results:

. estimate se¹ tval¹ df¹ pval¹ ci.lb¹ ci.ub¹

intrcpt -0.5180 0.2838 -1.8249 1 0.3191 -4.1243 3.0884

allocrandom -0.4478 0.4005 -1.1183 1.82 0.3891 -2.3402 1.4446

allocsystematic 0.0890 0.4547 0.1958 2.28 0.8608 -1.6524 1.8305

---

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

.

. 1) results based on cluster-robust inference (var-cov estimator: CR2,

. approx t/F-tests and confidence intervals, df: Satterthwaite approx)

generate emmgrid object

m1oem <- (qdrg(object = m1o, data = dat))

run emmeans on emmgrid object

emmeans(m1oem, ~ alloc)

. alloc emmean SE df asymp.LCL asymp.UCL

. alternate 0.000 0.000 Inf 0.000 0.000

. random -0.448 0.400 Inf -1.233 0.337

. systematic 0.089 0.455 Inf -0.802 0.980

.

Confidence level used: 0.95

note vastly different CIs between emmeans and robust objects

Attempt to add approximated DF to the robust object

add "df.residuals" object to the robust and generate emmgrid object

m1o2 = m10 m1o2$df.residuals = (m1o2$dfs) m1oem2 <- (qdrg(object = m1o2, data = dat))

try to run emmeans on emmgrid object

emmeans(m1oem2, ~ alloc)

output:

Error in format.default(x[-seq_len(2)], width = w, justify = "right") : invalid 'width' argument

rvlenth commented 1 year ago

It is possible to specify df = in a call to emmeans(), but currently that is restricted to specifying a single value that is used for all the d.f. To have different df for each estimate, you would have to replace the dffun and dfargs slots of your m1oem2 object (see the help for emmGrid-object). It would be tricky, as you would have to do some kind of bookkeeping to keep track of which estimate is being computed. (Either that or actually code the Satterthwaite approximation itself based on the linear function being estimated.) Note that dffun is not expected to be vectorized; the code for summarizing the emmGrid object calls it for each estimate.

Also note that the name of the member that holds degrees of freedom is object$df.residual, not object$df.residuals. But setting object$df.residual to a vector would not work.

rvlenth commented 1 year ago

Just a small additional comment. If all of the df are more that 35 or so, gdtting the df right hardly affects the CI widths or P values. You could specify the minimum of all the df and that would be the conservative route.

mahonmb commented 1 year ago

Thank you for your quick and insightful respnse, @rvlenth. All of my approximated df are above 50. Comparing the CIs from the emmeans output and that of the robust() function, the most that they are off is 0.007, which seems quite negligible. I may need to use the conservative route that you mention if I run into interactions with <30 df.

rvlenth commented 1 year ago

Just a comment. Please note that to compare things on an even keel, we need to use the same contrasts as are implied by the coefficient estimates, and with no multiplicity adjustments.

> m1o

Multivariate Meta-Analysis Model (k = 13; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed  factor 
sigma^2    0.3615  0.6013     13     no   trial 

Test for Residual Heterogeneity:
QE(df = 10) = 132.3676, p-val < .0001

Number of estimates:   13
Number of clusters:    13
Estimates per cluster: 1

Test of Moderators (coefficients 2:3):¹
F(df1 = 2, df2 = 2.4) = 0.6505, p-val = 0.5947

Model Results:

                 estimate      se¹     tval¹    df¹    pval¹    ci.lb¹   ci.ub¹ 
intrcpt           -0.5180  0.2838   -1.8249      1   0.3191   -4.1243   3.0884  
allocrandom       -0.4478  0.4005   -1.1183   1.82   0.3891   -2.3402   1.4446  
allocsystematic    0.0890  0.4547    0.1958   2.28   0.8608   -1.6524   1.8305  

---
1) results based on cluster-robust inference (var-cov estimator: CR2,
   approx t/F-tests and confidence intervals, df: Satterthwaite approx)

Here are the comparable contrasts:

> m1oem <- (qdrg(object = m1o, data = dat))
> m1oemm <- emmeans(m1oem, ~ alloc)
> summary(contrast(m1oemm, "trt.vs.ctrl1"), infer = c(TRUE, TRUE), adjust = "none")
 contrast               estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 random - alternate       -0.448 0.400 Inf    -1.233     0.337  -1.118  0.2634
 systematic - alternate    0.089 0.455 Inf    -0.802     0.980   0.196  0.8448

Confidence level used: 0.95 

Using the minimum df:

> summary(contrast(m1oemm, "trt.vs.ctrl1"), infer = c(TRUE, TRUE), adjust = "none", df = 1.82)
 contrast               estimate    SE   df lower.CL upper.CL t.ratio p.value
 random - alternate       -0.448 0.400 1.82    -2.34     1.45  -1.118  0.3894
 systematic - alternate    0.089 0.455 1.82    -2.06     2.24   0.196  0.8644

Degrees-of-freedom method: user-specified 
Confidence level used: 0.95 

Note that the results above are fairly comparable to the last 2 rows of the m1o summary.

Just copying the df from the robust() summary into the reference grid would not work anyway. We would need to carry out the correct Satterthwaite calculations for each linear function that is estimated. Presumably, the necessary information is incorporated in the m1o object, but it's likely to be a nontrivial undertaking to get it right.