Closed mcaceresb closed 7 months ago
You're right, for `fweight`, the variance formula is different, because we are pretending that each observation $i$ actually corresponds to $w_i$ identical observations. This makes $\sum_i w_i$ the sample size, and the variance for OLS is as you say, with $\sum_i w_i e_i^2 X_i X_i'$ in the middle of the sandwich, because it's as if we observed the error $e_i$ $w_i$ times.
So a simple consistency check for `fweight` is to create two datasets, one with several duplicated observations and one with only unique observations. Applying `fweight` to the dataset with unique observations should yield numerically the same standard error as applying no weighting to the duplicated dataset (including in cases like efficient common weights, etc.).
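For intuition, here is a small NumPy sketch of that check (illustrative made-up data and an HC0-style sandwich with no finite-sample correction; this is not the package's code). Frequency-weighting the unique rows reproduces both the coefficients and the robust SEs from the unweighted regression on the expanded data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unique rows; w_i says how many times each row "occurs" in the full data.
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = rng.normal(size=20)
w = rng.integers(1, 5, size=20)

# Expanded dataset: each row repeated w_i times, no weights.
Xe = np.repeat(X, w, axis=0)
ye = np.repeat(y, w)

def ols_robust(X, y, w=None):
    """OLS with frequency weights: bread (X'WX)^-1, meat sum_i w_i e_i^2 x_i x_i'."""
    if w is None:
        w = np.ones(len(y))
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * y))
    e = y - X @ beta
    meat = X.T @ ((w * e**2)[:, None] * X)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

b_fw, se_fw = ols_robust(X, y, w)   # fweight on the unique rows
b_un, se_un = ols_robust(Xe, ye)    # no weights on the expanded data

print(np.allclose(b_fw, b_un), np.allclose(se_fw, se_un))  # True True
```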
I don't actually know of any cases where this is super useful, since datasets rarely come in this form, but we can have the option for completeness.
For `aweight`, apart from renormalizing the weights to sum to one (which I don't think should actually matter, except perhaps for some finite-sample corrections), the weight $w_i$ gets squared in the variance formula. In linear regression, it should be equivalent to regressing $\sqrt{w_i}Y_i$ on $\sqrt{w_i}X_i$. This is the more sensible default in most cases. For common weights, there is a question of what the estimation target should be. One guiding principle could be that we're trying to target a treatment effect that weights states equally, even though unweighted regression would give more weight to states with bigger populations --- but I think that this is what `pweight` is supposed to do?
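That equivalence is easy to check numerically; a minimal NumPy sketch with made-up data (not the package's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = rng.normal(size=30)
w = rng.uniform(0.5, 2.0, size=30)  # analytic-style weights

# Weighted least squares: b = (X'WX)^-1 X'Wy
b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Same coefficients from unweighted OLS of sqrt(w)*y on sqrt(w)*X.
s = np.sqrt(w)
b_sqrt, *_ = np.linalg.lstsq(s[:, None] * X, s * y, rcond=None)

print(np.allclose(b_wls, b_sqrt))  # True
```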
> For `aweight`, apart from renormalizing the weights to sum to one (which I don't think should actually matter, except perhaps for some finite-sample corrections),
@kolesarm This is probably true in theory, but it's not obvious to me how to implement weights throughout the code so that re-scaling never matters. For instance, the weighted average of $x_i$ with $w_i$ re-scaled is the same as the average of $x_i \cdot w_i$, but not so if the weights are not re-scaled. Hence while there are many places where multiplying $w_i$ by a constant would cancel out, like in the sandwich formula, this is not always the case, and I'm not sure what the correct weighted formulas are in those other cases.
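To illustrate the re-scaling point with a toy NumPy example (not the package's code): the ratio form of the weighted average is invariant to multiplying the weights by a constant, but the plain average of $x_i \cdot w_i$ is not:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10)
w = rng.uniform(0.5, 2.0, size=10)
c = 3.7  # arbitrary rescaling constant

# The ratio form sum(w x) / sum(w) is unchanged when w is rescaled by c ...
ratio = np.sum(w * x) / np.sum(w)
ratio_rescaled = np.sum((c * w) * x) / np.sum(c * w)

# ... but the plain average of w_i * x_i picks up the factor c.
avg = np.mean(w * x)
avg_rescaled = np.mean((c * w) * x)

print(np.isclose(ratio, ratio_rescaled))  # True
print(np.isclose(avg_rescaled, c * avg))  # True: off by exactly the factor c
```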
(By the by, this is actually why I haven't added `pweight`; the SEs for ATE without re-scaling are wrong.)
Of course, you may need to properly define what you mean by the estimand if you don't re-scale the weights: for example, when weighting by inverse variance, it helps with interpretation if we don't do any rescaling, but we need to take that into account when defining the object of interest. Usually people would define the weighted average as $\sum_i w_i x_i / \sum_i w_i$, since that's the minimizer of the objective $\sum_i w_i(x_i-b)^2$, and similarly for regression. This is how it's done in R, I think.
For ATE, you need to define the object of interest as a weighted ATE, etc.
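The minimizer claim above is quick to verify numerically (toy data; since the objective is strictly convex in $b$, the weighted mean beats any perturbation):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
w = rng.uniform(0.1, 3.0, size=50)

wmean = np.sum(w * x) / np.sum(w)  # candidate minimizer

def obj(b):
    # Weighted least-squares objective sum_i w_i (x_i - b)^2
    return np.sum(w * (x - b) ** 2)

# Perturbing in either direction strictly increases the objective.
print(obj(wmean) < obj(wmean + 1e-3), obj(wmean) < obj(wmean - 1e-3))  # True True
```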
Continued in #14
I've implemented `aweight` (the default) and `fweight` for multe. `aweight`s are rescaled so that the weights sum to the number of observations; `fweight`s treat the weights as indicating that many copies of each observation.

### Help with Formulas
@peterdhull @kolesarm I've implemented some consistency checks. I'm confident in the `fweight` implementation as well as `aweight` for ATE and OAAT, since I was able to check them against `teffects` and `regress`. However, I am less sure about the common weights and the decomposition. If you have any time, could you either review the snippets I link below or point me to the right weighted formulas?

`w` is a vector of weights or of 1s (unweighted). `nobsw` has either the sum of the weights (`fweight`) or the number of observations in memory. You can see the only difference is that in the first case I do the weighted variance and in the second the variance of `psi * weight`.
Below I show the consistency checks I've coded, and I also have a note on why `fweight`s and `aweight`s compute different variances, but feel free to skip it.

### Consistency checks
You can see in test/test_weights.do I've implemented some consistency checks for weights using the Star data. Up to a df adjustment, the ATE and OAAT SEs match `regress` and `teffects` for `fweight` and `aweight` (`teffects` does not allow `aweight`; I did the rescaling manually and got the same answer). Here's a sample snippet:

### A note on the variance formulas
Because of the rescaling, things like averages are the same under either weighting scheme. However, the interpretation leads to different formulas for the vcov beyond just a sample-size adjustment. For analytic weights (the default), for example, the OLS variance is
$$ V((X' W X)^{-1} X' W e) = (X' W X)^{-1} X' W V(e) W X (X' W X)^{-1} $$
For frequency weights, it's
$$ (X' W X)^{-1} X' W V(e) X (X' W X)^{-1} $$
The reason is that, unweighted,
$$ X' V(e) X = \sum_i e_i^2 x_i' x_i $$
With frequency weights, the exact same sum is given by
$$ \sum_j w_j e_j^2 x_j' x_j $$
with $w_j$ the size of each group. However, with analytic weights we have $w_j^2$ instead. Here's a snippet where you can see this explicitly:
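The original Stata snippet isn't reproduced here, but the $w_j$ versus $w_j^2$ distinction can be sketched in NumPy with made-up grouped data (illustrative only, not the package's code):

```python
import numpy as np

rng = np.random.default_rng(4)
Xg = np.column_stack([np.ones(15), rng.normal(size=15)])  # one row per group
e = rng.normal(size=15)                                   # group residuals
w = np.arange(15) % 4 + 1                                 # group sizes 1..4

# Expanded (unweighted) meat: sum_i e_i^2 x_i' x_i over all duplicated rows.
Xe = np.repeat(Xg, w, axis=0)
ee = np.repeat(e, w)
meat_expanded = Xe.T @ ((ee ** 2)[:, None] * Xe)

# fweight puts w_j in the meat; the aweight sandwich has W on both sides
# of V(e), so the weight enters as w_j^2.
meat_fw = Xg.T @ ((w * e ** 2)[:, None] * Xg)
meat_aw = Xg.T @ ((w ** 2 * e ** 2)[:, None] * Xg)

print(np.allclose(meat_fw, meat_expanded))  # True: matches the expanded sum
print(np.allclose(meat_aw, meat_expanded))  # False: weights get squared
```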