rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

qdrg with MICE (as.mira) - is there any way to marry them? #446

Closed Generalized closed 1 year ago

Generalized commented 1 year ago

I need to run ordinal logistic regression for repeated observations via GEE. I can use either ordLORgee from the multgee package or repolr from the repolr package. Unfortunately, neither supports emmeans. They can be integrated with emmeans via qdrg, though. This works very well.

Now, I have multiply imputed dataset. I know that emmeans works with via as.mira() - I use it with mmrm and gee - but this seems to work with models that emmeans can recognize.

But how to deal with qdrg this way? I'm afraid it's impossible, because how to pass all those coefficients and variances?

For example (from another question):

list_ords <- lapply(imp_data_long, 
                       function(data)
                         ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
                                             data = data, id = PatientId, repeated = as.numeric(Visit_nOrd), LORstr = "RC"))

but what to do with qdrg?

qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,....?

I thought, that maybe I should create qdrg in each iterated analysis:


ord_gee_imp <- lapply(imp_data_long, 
                       function(ddd) {
                         fitmod <- ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
                                             data = ddd, id = PatientId, repeated = as.numeric(Visit_nOrd), LORstr = "uniform")

                          lord_grid <- qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
                                             data=ddd, 
                                             coef = fitmod$coefficients, 
                                             vcov = fitmod$robust.variance, 
                                             df = Inf, 
                                             ordinal.dim =length(levels(data$ODIPain)), 
                                             link = fitmod$link)
                          })

and somehow pass it to emmeans, but it also doesn't work:

> emmeans(as.mira(ord_gee_imp), specs = ~Arm * Visit_nOrd)
Error in UseMethod("recover_data") : 
  no applicable method for 'recover_data' applied to an object of class "emmGrid"
Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"),  : 
  Perhaps a 'data' or 'params' argument is needed

I desperately try to complete the analysis... I'm about to write the components necessary to implement the emmeans through the vignette, but I thought there is some workaround that can save me this time?

Generalized commented 1 year ago

OK, I think I found a quick-and-(very)dirty workaround.

I looked at the source in this file: https://github.com/rvlenth/emmeans/blob/3957a59d33d35963afebacdb7856ce84ee5be71c/R/multiple-models.R#L139

namely to this function:

emm_basis.mira = function(object, trms, xlev, grid, ...) {
    # In case our method did a "pass it on" with the data, we need to add that attribute
    data = list(...)$misc$data
    if(!is.null(data))
        object$analyses = lapply(object$analyses, function(a) {attr(a, "data") = data; a})
    bas = emm_basis(object$analyses[[1]], trms, xlev, grid, ...)
    k = length(object$analyses)
    # we just average the V and bhat elements...
    V = 1/k * bas$V
    allb = cbind(bas$bhat, matrix(0, nrow = length(bas$bhat), ncol = k - 1))
    for (i in 1 + seq_len(k - 1)) {
        basi = emm_basis(object$analyses[[i]], trms, xlev, grid, ...)
        V = V + 1/k * basi$V
        allb[, i] = basi$bhat
    }
    bas$bhat = apply(allb, 1, mean)  # averaged coef
    notna = which(!is.na(bas$bhat))
    bas$V = V + (k + 1)/k * cov(t(allb[notna, , drop = FALSE]))   # pooled via Rubin's rules
    bas
}

This creates the emm.basis, which is kinda "input" for the emmeans. It prepares the necessary components, like pooled coefficients and (co)variances, passed later to emmeans.

And that's exactly what I need to pass to qdrg: coefficients, var_cov and df.

So I took your code to pool the estimates:

pool_estimates_for_qdrg <- function(k, coefficients, varcov) {

  V <- 1/k * varcov[[1]]
  allb <- cbind(coefficients[[1]], matrix(0, nrow = length(coefficients[[1]]), ncol = k - 1))

  for (i in 1 + seq_len(k - 1)) {
    V <- V + 1/k * varcov[[i]]
    allb[, i] <- coefficients[[i]]
  }

  pooled_coef <- apply(allb, 1, mean)
  notna <- which(!is.na(pooled_coef))
  pooled_VC <- V + (k + 1)/k * cov(t(allb[notna, , drop = FALSE])) 

  return(list(pooled_coef = pooled_coef, pooled_VC = pooled_VC))
}

ord_gee_imp <- lapply(imp_data_long, 
                      function(dat) {
                        ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
                                  data = dat, 
                                  id = PatientId, 
                                  repeated = as.numeric(Visit_nOrd), LORstr = "uniform")
                      })

ord_gee_imp_grid <- with(pool_estimates_for_qdrg(k=20, 
                             coefficients = lapply(ord_gee_imp, function(analysis) analysis$coefficients),
                             varcov =       lapply(ord_gee_imp, function(analysis) analysis$robust.variance)),
                         qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
                              data=imp_data_long[[1]], # fake data in any case 
                              coef = pooled_coef, 
                              vcov = pooled_VC, 
                              df = Inf, 
                              ordinal.dim = 6, 
                              link = "Cumulative logit"))

(tidy(ord_gee_imp_em <- emmeans(ord_gee_imp_grid, specs = ~Arm * Visit_nOrd)))
# A tibble: 6 × 7
  Arm   Visit_nOrd estimate std.error    df statistic   p.value
  <chr> <chr>         <dbl>     <dbl> <dbl>     <dbl>     <dbl>
1 SoC   Month 6      -1.05      0.262   Inf     -4.01 0.0000618
2 HD    Month 6      -1.11      0.545   Inf     -2.04 0.0411   
3 SoC   Month 12     -1.40      0.462   Inf     -3.03 0.00248  
4 HD    Month 12     -0.954     0.553   Inf     -1.73 0.0843   
5 SoC   Month 20     -1.54      0.473   Inf     -3.25 0.00114  
6 HD    Month 20     -0.707     0.550   Inf     -1.29 0.198    

update(contrast(ord_gee_imp_em ,
                list("Month 6  : SoC vs. HD" = c(-1, 1, 0, 0, 0, 0),
                     "Month 12 : SoC vs. HD" = c( 0, 0,-1, 1, 0, 0),
                     "Month 20 : SoC vs. HD" = c( 0, 0, 0, 0,-1, 1))),
       adjust="none", level = 0.95, infer = c(TRUE, TRUE))

 contrast              estimate    SE  df asymp.LCL asymp.UCL z.ratio p.value
 Month 6  : SoC vs. HD  -0.0642 0.332 Inf    -0.715     0.587  -0.193  0.8468
 Month 12 : SoC vs. HD   0.4436 0.376 Inf    -0.293     1.180   1.181  0.2376
 Month 20 : SoC vs. HD   0.8301 0.355 Inf     0.133     1.527   2.335  0.0195

Note: contrasts are still on the Cumulative logit scale 
Confidence level used: 0.95 

I plotted values of selected statistics for each imputed dataset and the pooled ones. The consistency is perfect, so it worked well.

obraz

So now I have a universal way of dealing with unknown models in the framework of multiple imputation. At least as long, as all necessary components will be provided...

Would you consider adding kind of a "helper function" that facilitates these steps in case someone is using the qdrg()? It could help a lot of people! (I guess...)

rvlenth commented 1 year ago

Thanks. I don't want to export that function as it has such a narrow purpose. However, I am providing a link to this issue in Section I of the models vignette, so that people with this situation can find it.

rvlenth commented 1 year ago

qdrg() is designed as something for pretty general use, I think an option like that is too specific to a very narrow class of models, so I'm not interested in adding that option at this time. And it would require essentially repeating the same code, because it can't just call emm_basis.mira without some fancy workarounds.

By the way, link = "Cumulative logit" is not supported. You should specify link = "logit".

Generalized commented 1 year ago

Thank you. Yes, you are right. Anyway, it's great the qdrg() facilitates "bridging" emmeans to models in so many situations. It was a great idea to make it.