Closed by etchin 1 year ago
This is a great question, and I had to do a bit of digging to find the answer. First, it's worth knowing that balance within the groups is identical regardless of which form of the weights you use. The same holds for stratified effect estimates and for an average marginal effect computed from a model that includes the treat*race interaction (in practice, I get very slightly different estimates, which I attribute to numerical imprecision in CBPS()). So as long as you stratify by race in all analyses that use the weights, there is no difference.
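The invariance claim above can be checked with a small base R sketch. Everything here is simulated and hypothetical (the variables stratum, treat, y, and the weights w are made up, and no CBPS or WeightIt call is involved); it only illustrates that rescaling weights by a constant within each treatment-by-stratum cell leaves the weighted means in those cells unchanged.

```r
# Toy data: a binary treatment, a two-level stratum, and an outcome (all simulated).
set.seed(1)
n <- 200
stratum <- sample(c("a", "b"), n, replace = TRUE)
treat <- rbinom(n, 1, 0.4)
y <- rnorm(n)
w <- runif(n, 0.5, 3)  # arbitrary positive weights standing in for estimated ones

# Rescale the weights by a different constant in every treat-by-stratum cell.
cell <- interaction(treat, stratum)
k <- c(0.2, 5, 13, 0.01)[as.integer(cell)]
w_rescaled <- w * k

# Weighted outcome mean within each treat-by-stratum cell.
cell_means <- function(wts) {
  sapply(split(seq_len(n), cell), function(i) weighted.mean(y[i], wts[i]))
}

# The cell-level weighted means agree up to floating-point tolerance.
all.equal(cell_means(w), cell_means(w_rescaled))
```

Because a weighted mean divides by the sum of the weights, any constant multiplier within a cell cancels, which is why stratified analyses do not depend on how the weights were scaled.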
CBPS() by default standardizes the weights so that they sum to 1 in each treatment group, which means the weighted count of units in each subgroup of treat*race will be the same. When you multiply each set of weights by the size of the corresponding subclass, as you did, you get the nice property that the weighted subclass proportions match their proportions in the original sample. When you don't standardize the weights, the exact balance you observed goes away. When you set standardize = FALSE in the call to CBPS(), which is what weightit() does automatically, you will find the results to be identical to those from weightit() (up to numerical precision). weightit() does no processing of the propensity scores except turning them into weights using the usual formula; this was done to make the weights as transparent as possible.
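As a rough illustration of the difference between raw and standardized weights, here is a minimal base R sketch. It assumes a plain logistic regression as a stand-in for the propensity score model (it is not CBPS), and the data and variable names are simulated and hypothetical.

```r
# Simulated data; the logistic model below is a stand-in for CBPS, not CBPS itself.
set.seed(2)
n <- 500
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x))
ps <- fitted(glm(treat ~ x, family = binomial))

# Usual ATE weights: 1/ps for treated units, 1/(1 - ps) for control units.
w_raw <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Standardized version: rescale so the weights sum to 1 within each treatment group.
w_std <- w_raw / ave(w_raw, treat, FUN = sum)

tapply(w_raw, treat, sum)  # totals generally differ between the groups
tapply(w_std, treat, sum)  # 1 in each group, up to floating point
```

Multiplying w_std by a stratum's size, as in the weights*nrow(d) variants of the reprex, then gives both treatment groups in that stratum the same weighted total, equal to the stratum size.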
Take a look at the reprex below to see how different modifications of the CBPS weights change their properties. Below that is a demonstration that it doesn't matter as long as all analyses stratify on race as well.
library(WeightIt)
library(CBPS)
library(cobalt)
data("lalonde", package = "cobalt")
df <- lalonde
form <- treat ~ age + educ + married + nodegree + re74 + re75

# WeightIt: CBPS weights estimated separately within each level of race
df$weights_wis <- weightit(form, data = df, by = "race", method = "cbps", estimand = "ATE")$weights

# CBPS within each race stratum, default standardization
df$weights_c1 <- NA
split(df$weights_c1, df$race) <- lapply(split(df, df$race), function(d) {
  CBPS(form, data = d, ATT = 0)$weights
})

# Standardized weights multiplied by the stratum size
df$weights_c2 <- NA
split(df$weights_c2, df$race) <- lapply(split(df, df$race), function(d) {
  CBPS(form, data = d, ATT = 0)$weights * nrow(d)
})

# Unstandardized weights
df$weights_c3 <- NA
split(df$weights_c3, df$race) <- lapply(split(df, df$race), function(d) {
  CBPS(form, data = d, ATT = 0, standardize = FALSE)$weights
})

# Unstandardized weights multiplied by the stratum size
df$weights_c4 <- NA
split(df$weights_c4, df$race) <- lapply(split(df, df$race), function(d) {
  CBPS(form, data = d, ATT = 0, standardize = FALSE)$weights * nrow(d)
})

# Balance on race (the stratifying variable) under each set of weights
bal.plot(treat ~ race, "race", data = df,
         weights = c("weights_wis",
                     "weights_c1", "weights_c2",
                     "weights_c3", "weights_c4"))

# Balance on married within the black subgroup
bal.plot(treat ~ married, "married", data = df, subset = df$race == "black",
         weights = c("weights_wis",
                     "weights_c1", "weights_c2",
                     "weights_c3", "weights_c4"))
Created on 2022-10-12 with reprex v2.0.2
Hi @ngreifer,
This is a great explanation. Thank you for taking the time to answer this question.
So to clarify, these weights are only valid if you also stratify on the same variables in the outcome model. If you do not stratify, it would require an additional step to transform the weights from weights_wis to weights_c2, is that correct?
It might be worth adding a warning message to ensure users stratify the outcome model or standardize the weights for downstream analyses.
Thanks again!
No problem.
It's not that the weights are invalid unless you stratify; it's that the weights were estimated with stratification in mind and are invariant to these transformations if you stratify. The weights_wis weights are valid as-is; they come directly from the estimated propensity scores with no transformations, and in that sense they are the most natural weights. The weights_c2 weights are unnatural; they involve one transformation done inside CBPS() and another applied by the user. Those two transformations impart additional properties because they function like applying an additional set of weights on top of the existing weights.
Although it would be useful to stratify in downstream analyses (which is true regardless of how the weights were estimated), this isn't necessary. Again, the weights_wis weights come directly from propensity scores; those propensity scores happened to be estimated within subgroups, but they function like usual propensity scores, and weights computed from them can be used directly.
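To make "used directly" concrete, here is a hedged base R sketch. It assumes logistic regression in place of CBPS, and the subgroup variable g, covariate x, and outcome y are simulated and hypothetical. Propensity scores are fit separately within each subgroup, combined into one vector, converted to ATE weights with the usual formula, and then used in a full-sample weighted difference in means without stratifying.

```r
set.seed(3)
n <- 600
g <- sample(c("a", "b", "c"), n, replace = TRUE)  # hypothetical subgroup variable
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.4 * x + 0.5 * (g == "a")))
y <- 1 + 2 * treat + x + rnorm(n)                 # true effect of 2 by construction
d <- data.frame(g, x, treat, y)

# Fit a separate propensity score model in each subgroup and place the fitted
# values back in their original rows.
d$ps <- NA_real_
split(d$ps, d$g) <- lapply(split(d, d$g), function(s) {
  fitted(glm(treat ~ x, data = s, family = binomial))
})

# ATE weights from the usual formula, with no further rescaling.
d$w <- ifelse(d$treat == 1, 1 / d$ps, 1 / (1 - d$ps))

# Full-sample weighted difference in means, with no stratification on g.
est <- with(d, weighted.mean(y[treat == 1], w[treat == 1]) -
               weighted.mean(y[treat == 0], w[treat == 0]))
est
```

This is only a sketch of the idea that subgroup-estimated propensity scores behave like ordinary ones; it does not reproduce what weightit() does internally.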
Hi @ngreifer,
Thank you for your hard work on developing this package. I'm trying to understand how weights are combined when using the by = argument. I'm running the lalonde dataset with the argument by = "race". As a validity check, I'm comparing with a fully stratified model from CBPS (weighting the stratified weights by the number of observations in each stratum).
When comparing the weights from the WeightIt and CBPS models, they're highly correlated (r^2 = 0.9994001); however, there are slight differences. In particular, if we plot the balance for the weights from these two models, we find different results, notably for the covariate we stratify on. How does WeightIt transform weights from CBPS and combine weights from each stratum? Even though we stratify the weighting by race, why isn't this variable perfectly balanced?
Code used to generate this example is below.
Thanks in advance!