rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
348 stars 31 forks source link

Adding the following function for pooling ready emmeans objects to the emmeans package #502

Closed Generalized closed 1 month ago

Generalized commented 2 months ago

With regard to my other issue, where I described the case with multiply imputed data, I took the code from your package and changed it a little to allow for pooling emmeans rather than analysis objects.

This is especially helpful when we need to do a pooled analysis using some special features of the used modelling package, not handled out of the box by emmeans.

The function - adaptation of your emmeans:::emm_basis.mira code:

pool_emmeans <- function(emmeans_list) {
  bas = emmeans_list[[1]]
  k = length(emmeans_list)
  V = 1/k * bas@V
  allb = cbind(bas@bhat, matrix(0, nrow = length(bas@bhat), ncol = k - 1))

  for (i in 1 + seq_len(k - 1)) {
    basi = emmeans_list[[i]]
    V = V + 1/k * basi@V
    allb[, i] = basi@bhat
  }
  bas@bhat = apply(allb, 1, mean)
  notna = which(!is.na(bas@bhat))
  bas@dfargs$m = k
  bas@dfargs$df1 = bas@dffun
  bas@dfargs$B = cov(t(allb[notna, , drop = FALSE]))
  bas@V = bas@dfargs$T = V + (k + 1)/k * bas@dfargs$B
  bas@dffun = function(a, dfargs) {
    dfcom = dfargs$df1(a, dfargs)
    with(dfargs, {
      b = sum(a * (B %*% a))
      t = sum(a * (T %*% a))
      lambda = (1 + 1/m) * b/t
      dfold = (m - 1)/lambda^2
      dfobs = (dfcom + 1)/(dfcom + 3) * dfcom * (1 - lambda)
      ifelse(is.infinite(dfcom), dfold, dfold * dfobs/(dfold + dfobs))
    })
  }
   return(bas)
}

An exemplary use:

> d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"), 
Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
"V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
"V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L, 
5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L, 
4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L, 
4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")

> library(mice)
> library(glmtoolbox)

> imp <- mice(d, m=5)
> imp$predictorMatrix[1:4, 1:5] <- 0
> imp$predictorMatrix[5, 1] <- 0
> imp <- mice(d, m=5, predictorMatrix = imp$predictorMatrix)
> imp_list <- complete(imp, "all")

> trend_emms <- lapply(imp_list, 
                    function(dat) {
                        m <- glmgee(Painscore ~ Visit * Arm, 
                                    family = gaussian(link = "identity"), 
                                    id = ID,  
                                    data=dat, 
                                    corstr = "unstructured")

                        emmeans(m, 
                                specs = ~ Visit | Arm,
                                adjust="none", 
                                vcov =  vcov(m, type = "bias-corrected"))   # this is not accessible from within emmeans
                      })

> pooled_ems <- pool_emmeans(trend_emms)
> contrast(pooled_ems, "poly", max.degree=2, by = "Arm")

Arm = A:
 contrast  estimate    SE    df t.ratio p.value
 linear        0.40 0.753 16.10   0.531  0.6026
 quadratic     1.03 1.168 17.48   0.881  0.3903

Arm = B:
 contrast  estimate    SE    df t.ratio p.value
 linear        1.40 0.752 13.01   1.861  0.0855
 quadratic    -1.74 1.864  7.54  -0.935  0.3787

Would you consider adding this little "helper" to your package?

rvlenth commented 1 month ago

How about if you post this on Cross Validated. and I will consider including a link to that posting in the "models" vignette.