ngreifer / WeightIt

WeightIt: an R package for propensity score weighting
https://ngreifer.github.io/WeightIt/
102 stars 12 forks

Exported weights when using s.weights #41

Closed statzhero closed 1 year ago

statzhero commented 1 year ago

Can I just check that this is intended behavior? I was expecting the weights value to include/apply the sampling weights, since those are what the summary method prints. An example to illustrate what I mean:

set.seed(123)

dta <- data.frame(
  t = rnorm(10),
  x = rnorm(10),
  s = runif(10) 
)

dta$s <- dta$s / mean(dta$s)

ebal <- WeightIt::weightit(
  t ~ x , 
  data = dta, 
  method = 'ebal',
  s.weights = "s",
  moments = 1,
  d.moments = 1)

summary(ebal)
# Summary of weights
# 
# - Weight ranges:
#   
#   Min                                 Max
# all 0.0008 |---------------------------| 3.583
# 
# - Units with 5 most extreme weights by group:
#   
#   2      9      4     10     5
# all 0.874 0.9354 1.1649 1.6706 3.583
# 
# - Weight statistics:
#   
#   Coef of Var   MAD Entropy # Zeros
# all       1.027 0.684   0.409       0
# 
# - Effective Sample Sizes:
#   
#   Total
# Unweighted  7.36
# Weighted    5.13

summary(ebal, ignore.s.weights = TRUE)
# Summary of weights
# 
# - Weight ranges:
#   
#   Min                                 Max
# all 0.002 |---------------------------| 8.118
# 
# - Units with 5 most extreme weights by group:
#   
#   7      4      9      1     5
# all 0.7448 1.0909 1.2147 1.5557 8.118
# 
# - Weight statistics:
#   
#   Coef of Var   MAD Entropy # Zeros
# all       1.632 0.923   0.756       0
# 
# - Effective Sample Sizes:
#   
#   Total
# Unweighted 10.  
# Weighted    2.94

ebal$weights |> summary()
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# 0.002018 0.414460 0.736496 1.462753 1.183768 8.118013 

c(ebal$weights * ebal$s.weights) |> summary()
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# 0.000811 0.420640 0.758610 1.000000 1.107548 3.582980 
ngreifer commented 1 year ago

It is intended that all diagnostic analyses (e.g., checking the variability of the weights, the ESS, and balance, e.g., when using bal.tab()) use the combined estimated and sampling weights, which is why summary.weightit() and bal.tab() automatically multiply them together. However, for the sake of transparency, the two sets of weights are stored separately in the weightit object. If weightit() estimates a propensity score, I think the user should be able to see exactly how the weights are computed from that propensity score and be able to do it themselves manually. This is also consistent with twang. To estimate the effect, though, the user needs to multiply the two sets of weights together. This helps clarify that the estimated weights work on top of the sampling weights; that is, it is the sampling-weighted sample that is being adjusted by the estimated weights.

I agree that this should be clarified in the documentation.

statzhero commented 1 year ago

Thanks! Of course you will know best what is reasonable. In my example I don't actually have sampling weights; rather, they are weights from exact matching on a variable (where the by/exact argument failed with the "initial value in 'vmmin' is not finite" error).

So one could also use the weights in the objective function via base.weights, but the results are different. Am I correct that weightit() returns no value for out$base.weights?

In this case, would you recommend the base weights?

For an iterative procedure (balance, censor extreme weights, rebalance, etc.), would you recommend base weights?

ebal2 <- WeightIt::weightit(
  t ~ x , 
  data = dta, 
  method = 'ebal',
  base.weights = dta$s,
  moments = 1,
  d.moments = 1)

summary(ebal2)
# Summary of weights
# 
# - Weight ranges:
#   
#   Min                                  Max
# all 0.0006 |---------------------------| 1.6322
# 
# - Units with 5 most extreme weights by group:
#   
#   5      2      4      3     10
# all 1.1977 1.3239 1.5135 1.6041 1.6322
# 
# - Weight statistics:
#   
#   Coef of Var   MAD Entropy # Zeros
# all       0.606 0.485   0.255       0
# 
# - Effective Sample Sizes:
#   
#   Total
# Unweighted 10.  
# Weighted    7.52
ngreifer commented 1 year ago

The point of entropy balancing with base weights is for the entropy balancing to "nudge" the base weights so that they retain some of their original properties while achieving exact balance. An example would be to fit a flexible machine learning propensity score model and compute IPWs that balance the covariates broadly but perhaps not exactly, and then supply those as base weights to entropy balancing to ensure exact balance on a few specific terms (e.g., the means of the covariates). The resulting weights are the final weights, and they are meant to balance the original sample, not the base-weighted sample. They minimize the KL divergence (relative entropy) between themselves and the base weights. This method was used in one of the winning entries in the 2016 ACIC causal inference competition.

With sampling weights, the final weights, computed as the product of the sampling weights and the estimated weights, need to balance the sampling-weighted sample. The weighted means of the covariates computed using the final weights equal the weighted means computed using the sampling weights alone. See the table below for how the two types of weights fit into the algorithm.

|               | s.weights                                            | base.weights                                         |
|---------------|------------------------------------------------------|------------------------------------------------------|
| Target means  | s.weighted means                                     | original means                                       |
| Final weights | product of estimated weights and s.weights           | estimated weights                                    |
| Objective     | KL divergence of estimated weights from unit weights | KL divergence of estimated weights from base.weights |
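To make the objective column concrete, here is a small base-R sketch (not WeightIt internals) of the KL divergence being minimized in each case; kl_div() and the weight vectors are hypothetical, made up for illustration:

```r
# KL divergence of weights w from reference weights q: sum(w * log(w / q)).
# With s.weights, q is the unit weights; with base.weights, q is the
# supplied base weights.
kl_div <- function(w, q) sum(w * log(w / q))

w <- c(0.5, 1.5, 1.0, 1.0)  # made-up estimated weights
b <- c(0.8, 1.2, 1.0, 1.0)  # made-up base weights

kl_div(w, rep(1, length(w)))  # divergence from unit weights (s.weights case)
kl_div(w, b)                  # divergence from base weights (base.weights case)
```

The divergence is zero only when the estimated weights coincide with the reference, which is why entropy balancing keeps the final weights as close as possible to the reference while satisfying the balance constraints.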

weightit() always returns the estimated weights in weights. When using s.weights, the estimated weights need to be multiplied by the s.weights for use in any computation. summary() and bal.tab() do that automatically.

If you've done a round of matching before entropy balancing and you want the entropy balancing weights to balance the covariates in the matched sample, you should supply the matching weights as s.weights. If you are instead using the matching weights as a nonparametric estimate of the IPW (e.g., from full matching) and you want new weights that exactly balance the covariate means while diverging as little as possible from the matching weights, supply them as base.weights.

statzhero commented 1 year ago

Thanks, this is all very clear. I have one final example.

What happens when some s.weights are equal to zero? In principle it shouldn't matter whether or not those observations are present, but I don't think that is the case?

set.seed(123)

dta <- data.frame(
  t = rnorm(1000),
  x = rnorm(1000),
  s = c(runif(950), rep(0, 50))
)

dta$s <- dta$s / mean(dta$s) # standardize

dta_sub <- dta[dta$s != 0, ]

ebal <- WeightIt::weightit(
  t ~ x , 
  data = dta, 
  method = 'ebal',
  s.weights = "s",
  moments = 1,
  d.moments = 1)

# Warning message:
# Some weights were estimated as NA, which means a value was impossible to compute (e.g., Inf). Check for extreme values of the treatment or covariates and try removing them. Non-finite weights will be set to 0. 

summary(ebal)
# Summary of weights
# 
# - Weight ranges:
#   
#   Min                                  Max
# all 0.9131 |---------------------------| 1.0605
# 
# - Units with 5 most extreme weights by group:
#   
#   77    359    735    650    834
# all 1.0525 1.0526 1.0554 1.0587 1.0605
# 
# - Weight statistics:
#   
#   Coef of Var MAD Entropy # Zeros
# all        0.23 0.1       0      50
# 
# - Effective Sample Sizes:
#   
#   Total
# Unweighted 716.4 
# Weighted   949.78

ebal2 <- WeightIt::weightit(
  t ~ x , 
  data = dta_sub, 
  method = 'ebal',
  s.weights = "s",
  moments = 1,
  d.moments = 1)

summary(ebal2)
# Summary of weights
# 
# - Weight ranges:
#   
#   Min                                  Max
# all 0.9481 |---------------------------| 1.0319
# 
# - Units with 5 most extreme weights by group:
#   
#   839    310    834    879     53
# all 1.0271 1.0284 1.0285 1.0288 1.0319
# 
# - Weight statistics:
#   
#   Coef of Var  MAD Entropy # Zeros
# all       0.013 0.01       0       0
# 
# - Effective Sample Sizes:
#   
#   Total
# Unweighted 716.4 
# Weighted   949.84
ngreifer commented 1 year ago

When some s.weights are 0, any estimated weights for those individuals yield the same balance, so their weights are not identified. That's what the warning is about; it is simply a fact of the optimization. I agree that this should be fixed, and I'll do that shortly. For other methods, where I can set constraints on the weights more explicitly, I have found ways to avoid this problem. I think the solution here is simply not to estimate weights for units with a sampling weight of zero.

ngreifer commented 1 year ago

Okay, thank you for this suggestion. I've made changes so that you should get the same estimates whether or not you include observations with sampling weights of 0 in the analysis. The weights may differ by a scaling factor, but their balancing properties and ESS should be identical. These changes apply to entropy balancing, CBPS, and energy balancing.
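The claim that weights differing only by a scaling factor have identical ESS can be checked in base R. This sketch uses Kish's formula, ESS = sum(w)^2 / sum(w^2); ess() is a hypothetical helper, not a WeightIt function:

```r
# Kish's effective sample size is invariant to rescaling the weights:
# ESS(c * w) = (c * sum(w))^2 / (c^2 * sum(w^2)) = ESS(w) for any c > 0.
ess <- function(w) sum(w)^2 / sum(w^2)

set.seed(123)
w <- runif(100, 0.5, 2)  # arbitrary positive weights

all.equal(ess(w), ess(3.7 * w))  # TRUE: scale factor cancels
```

Balance statistics such as weighted means are likewise unchanged by a common scale factor, since the factor cancels in the numerator and denominator of the weighted mean.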