ngreifer / cobalt

Covariate Balance Tables and Plots - An R package for assessing covariate balance
https://ngreifer.github.io/cobalt/

Access mean SMD value on longitudinal treatments #61

Closed maellecoursonnais closed 2 years ago

maellecoursonnais commented 2 years ago

Hello,

As far as I know, there is no easy way to access the mean covariate balance across time points; only the maximum is available. I'm guessing that would be an easy thing to add to the function?

Minimal example:

library(cobalt)
data("iptwExWide", package = "twang")
library(WeightIt)
Wmsm <- weightitMSM(list(tx1 ~ use0 + gender + age,
                         tx2 ~ use0 + gender + age + use1 + tx1,
                         tx3 ~ use0 + gender + age + use1 + tx1 + use2 + tx2),
                    data = iptwExWide,
                    method = "ps")
baltab <- bal.tab(Wmsm, un = TRUE)

baltab$Balance.Across.Times
             Times     Type Max.Diff.Un Max.Diff.Adj
prop.score 1, 2, 3 Distance   0.7862446  0.025135867
use0       1, 2, 3  Contin.   0.2667626  0.055835400
gender     1, 2, 3   Binary   0.2944634  0.026293838
age        1, 2, 3  Contin.   0.3798713  0.070253208
use1          2, 3  Contin.   0.1662348  0.031572818
tx1           2, 3   Binary   0.1694514  0.017114709
use2             3  Contin.   0.1086601  0.031463385
tx2              3   Binary   0.2422819  0.008532322

So we get the maximum here, but not the mean. Are you aware of a way to compute the mean values? It seems possible to plot them with love.plot but not to get them directly from bal.tab.

Cheers!

ngreifer commented 2 years ago

Actually, this is not supported, and I programmed it this way for a reason (i.e., it is technically possible, but I don't think it should be done, so I don't allow it). The average SMD is useful when the effect estimate is itself the average of several effect estimates and offsetting biases cancel out, for example, with multiply imputed data or clustered data where the clustering is incidental. With longitudinal treatments, this is not the case. Balance must be achieved at each time point, not in aggregate, so the average SMD across time points is not a valid measure of balance (i.e., it will miss imbalance in some time periods, which does not cancel out with offsetting imbalance in other time periods). I recommend you stick with the maximum balance statistic provided by default or look at balance at individual time points.
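
To make the point concrete, here is a minimal sketch with made-up numbers (the SMD values and the 0.1 cutoff are assumptions chosen only for illustration, not output from the example above). It shows how an average across time points can look acceptable while one time point is clearly imbalanced:

```r
# Hypothetical adjusted SMDs for one covariate at three time points
# (illustrative numbers only).
smd_by_time <- c(time1 = 0.01, time2 = -0.02, time3 = 0.24)
threshold <- 0.1  # assumed cutoff, used here only for illustration

# Summaries of the absolute SMDs across time points
mean_abs_smd <- mean(abs(smd_by_time))
max_abs_smd <- max(abs(smd_by_time))

mean_abs_smd < threshold  # TRUE: the average suggests balance
max_abs_smd < threshold   # FALSE: time3 is imbalanced

# The time point responsible for the largest imbalance
names(which.max(abs(smd_by_time)))
```

The maximum flags the problem at the third time point, whereas the mean would have hidden it.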

If you want to compute average SMDs across time points, I cannot condone this, so you will have to do it manually. Some code to do that might be the following:

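# For each time point, take the adjusted differences named by covariate,
# stack them as rows, then average each covariate over the time points.
colMeans(dplyr::bind_rows(lapply(baltab$Time.Balance, \(x) {
    setNames(x$Balance$Diff.Adj, rownames(x$Balance))
})), na.rm = TRUE)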
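
For anyone who wants to try the same idea without dplyr, the following is a rough base-R sketch. It runs on a hypothetical mock list, `time_balance`, that only imitates the structure used above (one element per time point, each with a `Balance` data frame holding a `Diff.Adj` column and covariates as row names); the numbers are invented. It averages absolute values, on the assumption that the differences may be signed, and computes the maximum alongside for comparison:

```r
# Hypothetical stand-in for baltab$Time.Balance (invented values).
time_balance <- list(
  list(Balance = data.frame(Diff.Adj = c(0.05, -0.02),
                            row.names = c("use0", "gender"))),
  list(Balance = data.frame(Diff.Adj = c(-0.03, 0.01, 0.04),
                            row.names = c("use0", "gender", "use1")))
)

# Stack all time points into one long data frame
long <- do.call(rbind, lapply(seq_along(time_balance), function(i) {
  b <- time_balance[[i]]$Balance
  data.frame(time = i, covariate = rownames(b), diff = b$Diff.Adj)
}))

# Per-covariate mean and maximum of the absolute differences,
# using only the time points at which each covariate appears
abs_by_cov <- split(abs(long$diff), long$covariate)
mean_abs <- sapply(abs_by_cov, mean)
max_abs <- sapply(abs_by_cov, max)

cbind(mean_abs, max_abs)
```

Printing both columns side by side makes it easy to see where the mean understates the worst time point.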