ngreifer / cobalt

Covariate Balance Tables and Plots - An R package for assessing covariate balance
https://ngreifer.github.io/cobalt/

bal.tab after weightitMSM shows Max.Corr.Adj values > 1 #58

Closed giacfalk closed 3 years ago

giacfalk commented 3 years ago

After creating weights with weightitMSM, bal.tab yields values of Max.Corr.Adj that are > 1, which does not make any sense.

What could be the cause? (Only a subset of the covariates is shown in the table.)

Many thanks!

psDRYEXTRA <- weightitMSM(formula.list = FormulaListDRYEXTRA,
                          data = FBS,
                          method = "ps", verbose = TRUE)

bal.tab(psDRYEXTRA, r.threshold = .05, disp.ks = TRUE, which.time = .none)
Balance summary across all time points
                                Times    Type Max.Corr.Adj         R.Threshold Max.KS.Adj
pop                        1, 2, 3, 4 Contin.       0.0500     Balanced, <0.05     0.2977
city_tt                    1, 2, 3, 4 Contin.       1.5959 Not Balanced, >0.05     0.5807
capdist                    1, 2, 3, 4 Contin.       1.1841 Not Balanced, >0.05     0.6695
distnearestcountry         1, 2, 3, 4 Contin.       1.7098 Not Balanced, >0.05     0.6100
distownborders             1, 2, 3, 4 Contin.       3.0464 Not Balanced, >0.05     0.2960

Balance tally for treatment correlations
                    count
Balanced, <0.05         6
Not Balanced, >0.05    36

Variable with the greatest treatment correlation
       Variable Max.Corr.Adj         R.Threshold
 distownborders       3.0464 Not Balanced, >0.05

Effective sample sizes
 - Time 1
             Total
Unadjusted 5000.  
Adjusted      6.57
 - Time 2
             Total
Unadjusted 5000.  
Adjusted      6.57
 - Time 3
             Total
Unadjusted 5000.  
Adjusted      6.57
 - Time 4
             Total
Unadjusted 5000.  
Adjusted      6.57

ngreifer commented 3 years ago

This is not a bug; it is explained at help("col_w_cov"). A correlation is a covariance divided by a standardization factor (the product of the two variables' standard deviations). When the covariance and the standardization factor are computed in the same sample, the correlation is bounded between -1 and 1. But cobalt computes the covariance in the weighted sample and the standardization factor in the unweighted sample, so the resulting correlations are no longer bounded. What happened here is that the standardization factor is much smaller in the unweighted sample than it would be if it were computed in the weighted sample. This is because you have a degenerate weighting solution: the adjusted effective sample size is 6.57 out of 5000 at every time point.
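To make "degenerate" concrete, here is a minimal base R sketch of the usual Kish approximation to the effective sample size. It is assumed, not confirmed in this thread, to be the quantity behind the "Effective sample sizes" table. The weights are made up for illustration and are not taken from the data above.

```r
# Kish's approximate effective sample size:
# (sum of weights)^2 / (sum of squared weights).
ess <- function(w) sum(w)^2 / sum(w^2)

n <- 5000

# Equal weights: every unit contributes, so the ESS equals n.
w_equal <- rep(1, n)

# Hypothetical degenerate weights: six units carry almost all of the weight.
w_degenerate <- c(rep(1e-6, n - 6), rep(1, 6))

ess_equal <- ess(w_equal)
ess_degenerate <- ess(w_degenerate)

print(ess_equal)
print(ess_degenerate)  # only a handful of effective observations out of 5000
```

When a few units dominate like this, any weighted statistic, including the weighted covariance, is driven almost entirely by those units.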

I agree that this can seem weird, but it is exactly the same philosophy as the one used when computing standardized mean differences for binary treatments; the unweighted standardization factor is used there as well. To use the standardization factor computed in the weighted sample instead, you can set s.d.denom = "weighted", in which case the correlations should be bounded between -1 and 1.
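A toy base R illustration of the two choices of standardization factor follows. This is a simplified sketch, not cobalt's internal code: the helper names `w_cov` and `w_sd` and the four data points are hypothetical, chosen so that the weights concentrate on the extreme observations.

```r
# Toy data: the weights put nearly all mass on the two extreme observations.
treat <- c(-10, -1, 1, 10)
covar <- c(-10, 1, -1, 10)
w <- c(100, 1, 1, 100)

# Weighted covariance, with weights normalized to sum to 1.
w_cov <- function(x, y, w) {
  w <- w / sum(w)
  sum(w * (x - sum(w * x)) * (y - sum(w * y)))
}

# Weighted standard deviation from the same weighted covariance.
w_sd <- function(x, w) sqrt(w_cov(x, x, w))

cov_weighted <- w_cov(treat, covar, w)

# Covariance from the weighted sample, standardization from the unweighted sample.
corr_unweighted_denom <- cov_weighted / (sd(treat) * sd(covar))

# Covariance and standardization both from the weighted sample.
corr_weighted_denom <- cov_weighted / (w_sd(treat, w) * w_sd(covar, w))

print(corr_unweighted_denom)  # exceeds 1
print(corr_weighted_denom)    # stays within [-1, 1]
```

The weighted covariance is the same in both lines. Only the denominator changes, and the unweighted standard deviations are too small relative to the spread of the heavily weighted units.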