ngreifer / cobalt

Covariate Balance Tables and Plots - An R package for assessing covariate balance
https://ngreifer.github.io/cobalt/

Standard Deviation used in SMDs #24

Closed anddis closed 5 years ago

anddis commented 5 years ago

Hello,

This is not really a bug or issue with cobalt, but rather a question about SMDs that I'd be grateful for your help with.

Following 1:1 nearest-neighbour matching (NNM), some of the treated subjects are left unmatched. When computing the SMD, cobalt (with the option s.d.denom = "treated") uses the SD in all treated subjects, i.e. including those left unmatched. This is consistent with MatchIt's behaviour.

In a similar fashion, cobalt with the option s.d.denom = "pooled" computes the denominator of the SMDs by pooling the SDs of all treated and all untreated subjects (matched and unmatched).

I understand that the denominator of an SMD is, at the end of the day, arbitrary: it's just a value used to standardise the MD (duh!), and in principle we could use the SD of any population.

However, I wonder if you have any reference that supports the use of just those SDs as opposed to the SDs in the subjects (treated and untreated) who are successfully matched.

> m <- bal.tab(trt ~ x,
+              data = s,
+              method = "weighting",
+              s.d.denom = "pooled",
+              weights = s$w,
+              continuous = "std")

# 1:1 matching, weights are either 0 or 1
> with(s, table(w))
w
   0    1 
6468 1120 

# 560 untreated subjects matched to 560 treated subjects
> m
Balance Measures
     Type Diff.Adj
x Contin.    0.005

Effective sample sizes
           Control Treated
Unadjusted    6997     591
Adjusted       560     560

> m$Balance
     Type   M.0.Un   SD.0.Un  M.1.Un   SD.1.Un   Diff.Un M.Threshold.Un V.Ratio.Un V.Threshold.Un KS.Un KS.Threshold.Un  M.0.Adj
x Contin. 1.362974 0.5868113 2.14297 0.9667456 0.9753969             NA         NA             NA    NA              NA 2.050798
  SD.0.Adj  M.1.Adj  SD.1.Adj    Diff.Adj M.Threshold V.Ratio.Adj V.Threshold KS.Adj KS.Threshold
x 0.862962 2.054828 0.7274142 0.005040633          NA          NA          NA     NA           NA

> smd_pooled <- setNames((m$Balance["M.1.Adj"] - m$Balance["M.0.Adj"]) / sqrt(.5*m$Balance["SD.1.Un"]^2 + .5*m$Balance["SD.0.Un"]^2), nm = "SMD") 
> smd_pooled
          SMD
x 0.005040633
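
For reference, the two candidate denominators can be compared directly from the values printed in m$Balance above. This is only a minimal sketch: the numbers are copied by hand from that output (so results agree with cobalt only up to rounding), and the variable names are illustrative, not part of cobalt.

```r
# Values copied by hand from the m$Balance output above (rounded as printed)
m1_adj <- 2.054828; m0_adj <- 2.050798      # adjusted (matched) means
sd1_un <- 0.9667456; sd0_un <- 0.5868113    # SDs in the full, unadjusted sample
sd1_adj <- 0.7274142; sd0_adj <- 0.862962   # SDs in the matched sample

# Pooled SD as in the formula above: square root of the average group variance
pooled_sd <- function(sd1, sd0) sqrt(0.5 * sd1^2 + 0.5 * sd0^2)

# Same adjusted mean difference, standardised two different ways
smd_full_sample_sd    <- (m1_adj - m0_adj) / pooled_sd(sd1_un, sd0_un)
smd_matched_sample_sd <- (m1_adj - m0_adj) / pooled_sd(sd1_adj, sd0_adj)

print(c(full_sample = smd_full_sample_sd, matched_sample = smd_matched_sample_sd))
```

In this particular dataset the two pooled SDs are nearly equal, so both versions come out at roughly 0.005; the choice of denominator matters more when matching changes the group variances substantially.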

ngreifer commented 5 years ago

Yes! See Stuart (2008, p2063) and Stuart (2010, p11) for some examples. The important matter is that the denominator should remain the same before and after adjustment so that the change in mean balance is not conflated with a change in variances (the balance of which is assessed separately). This is also what MatchIt and twang use when assessing balance.

Consider the following example: the mean difference of X is 2 before matching, and the standard deviations in both groups are 2, so the unadjusted SMD is 1. Let's say after matching the mean difference is now 1, but the standard deviations in both groups have shrunk to .9. The adjusted SMD using the unadjusted standard deviation is .5, but the SMD using the adjusted standard deviation is 1.11, indicating balance has worsened, when in reality, the bias in the effect estimate has decreased. This is an extreme example in which the conclusions dramatically reverse, but it's possible for the wrong conclusion to be drawn when comparing your new SMD to a set threshold in less egregious cases.
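
The arithmetic in this example can be checked with a few lines of R. This is just a sketch using the hypothetical numbers from the paragraph above; the variable names are made up for illustration and nothing here calls cobalt itself.

```r
# Hypothetical values from the example above (not real data)
md_unadjusted <- 2    # mean difference before matching
sd_unadjusted <- 2    # SD in both groups before matching
md_adjusted   <- 1    # mean difference after matching
sd_adjusted   <- 0.9  # SD in both groups after matching

# Both groups share the same SD, so the pooled SD equals that SD
smd_unadjusted    <- md_unadjusted / sd_unadjusted  # 1
smd_fixed_denom   <- md_adjusted / sd_unadjusted    # 0.5: denominator held fixed
smd_matched_denom <- md_adjusted / sd_adjusted      # about 1.11: denominator changed too

print(round(c(unadjusted = smd_unadjusted,
              fixed_denominator = smd_fixed_denom,
              matched_denominator = smd_matched_denom), 2))
```

Holding the denominator fixed, the SMD falls from 1 to 0.5, which tracks the halving of the mean difference. Letting the denominator shrink with the matched sample pushes the SMD above its starting value even though the mean difference itself improved.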

anddis commented 5 years ago

Thanks for your prompt and informative response, Noah!