Correct Standard Error Estimation

maximilian-sprengholz commented 1 year ago

Implement Influence Functions to estimate correct standard errors for all decomposition components.

maximilian-sprengholz commented 8 months ago

Atm, the suest in the background does not produce SEs and issues a "Warning: Variance matrix is nonsymmetric or highly singular" if there is no variation in $Y$ in one of the compared groups per component (usually the unmatched for $D_A$ and $D_B$). Probably a rare issue, but more likely with binary outcomes.
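
A minimal sketch of a pre-check for this case, in Python rather than the package's Stata code. The function name and the group labels are hypothetical; the idea is only to flag groups with a constant outcome before any SEs are requested.

```python
def groups_without_variation(outcomes_by_group):
    """Return the names of groups whose outcome is constant or has < 2 obs."""
    flagged = []
    for name, values in outcomes_by_group.items():
        # With fewer than two observations, or all values equal,
        # no variance can be estimated for this group.
        if len(values) < 2 or min(values) == max(values):
            flagged.append(name)
    return flagged


# Hypothetical binary outcome, constant among the unmatched of group A
groups = {
    "matched_A": [0, 1, 1, 0],
    "unmatched_A": [1, 1, 1],
    "matched_B": [0, 0, 1],
}
flagged = groups_without_variation(groups)
print(flagged)
```

A check like this could be used to skip the SE for the affected component and show a note instead of the variance-matrix warning.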

mhamjediers commented 4 months ago

Okay, what I achieved so far:

[x] SE for DA/DB are implemented in nopo and checked; it is still kind of weird that the mean-difference and share are based on the same sample and thereby could be interdependent; but so far my simulations do not indicate that this is a problem
[x] SE for D0 is not yet implemented; I found a faster way to compile the SE for D0 (and also understood the formula better - guess what, it's based on the SE for a product ;) )
[x] We have to check again on how to do it for ps and md
[ ] SE for DX is not analogous to SE for D0 because the means for the difference are based on the same sample (A and A weighted); we have to account for the covariance between them, and if I understood the formula of SE_D0 correctly, it approximates the covariance of Y and w for the counterfactuals by assessing the covariance across strata; maybe this is also a solution for DX
[x] In the end we still lack the potential covariance of all components across each other... Right now we only approximate the diagonal of V ...
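
To illustrate the "SE for a product" idea, here is a small sketch of the first-order delta-method variance of a product of two estimates, e.g. a share times a mean difference. It is written in Python, not taken from nopo, and the function name and numbers are hypothetical. Setting `cov_ab=0` corresponds to treating the two factors as uncorrelated.

```python
import math


def se_product(a, se_a, b, se_b, cov_ab=0.0):
    """First-order delta-method SE of the product a*b.

    Var(a*b) is approximated by b^2 Var(a) + a^2 Var(b) + 2ab Cov(a, b).
    """
    var = (b ** 2) * se_a ** 2 + (a ** 2) * se_b ** 2 + 2 * a * b * cov_ab
    if var < 0:
        # Can happen with a strongly negative covariance term
        raise ValueError("approximated variance is negative")
    return math.sqrt(var)


# Hypothetical inputs: share of unmatched units and their mean gap
share, se_share = 0.2, 0.02
gap, se_gap = 1.5, 0.3
se = se_product(share, se_share, gap, se_gap)
print(round(se, 4))
```

The covariance argument is where interdependence between the share and the mean difference would enter if it turned out to matter.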

mhamjediers commented 4 months ago

What I learned today...

Nopo's formula, despite looking different, is the same as the matrix notation (but 2*trace).
The alpha standardizes the variance of the w_f from the sample size of women to that of men
Jann (2008) argues that the trace vanishes because X and b are of order n-1 or n-2 (footnote 5); this is not the case for Nopo, as the vanishing depends on the number of individuals in each stratum and the number of strata may rise with higher n
Nopo (2008) highlights that by using the strata, we actually have an empirical distribution of the estimates which allows deriving the Standard Errors from it (so our intuition was completely right, to approximate the covariance across strata; but this is also funny, because you want a lot of strata to have the best approximation, but at the same time, you want fewer strata to have sufficient obs in each to have the correct estimation within them; and this trade-off lets the trace-part not vanish)
The actual problem in the estimation (and the negative variance) is rather caused by underestimating the variance of the male wages, as this is not estimable in strata with n = 1 and thereby neglected; so while the covariance works more or less, it is rather the estimation of the SE of male wages within each stratum that drives the result
My idea would be to plug in the overall variance of matched men to approximate the variance for strata with n=1 (already implemented) and additionally show a notification for the user
This might even open up the possibility of doing it for md and ps again
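
The plug-in idea can be sketched as follows. This is an illustrative Python version, not the nopo implementation: the function name, the `min_obs` parameter and the data are hypothetical, and every stratum is assumed to contain at least one matched observation.

```python
from statistics import variance


def stratum_mean_variances(strata, min_obs=2):
    """Variance of the stratum mean, with a pooled plug-in for small strata.

    strata: dict mapping stratum id -> list of outcomes of matched units.
    Returns (variances, plugged), where plugged lists the strata for which
    the overall variance of all matched units was used.
    """
    pooled = [y for ys in strata.values() for y in ys]
    overall_var = variance(pooled)  # sample variance over all matched units
    variances, plugged = {}, []
    for sid, ys in strata.items():
        if len(ys) >= min_obs:
            s2 = variance(ys)       # within-stratum sample variance
        else:
            s2 = overall_var        # not estimable here, so plug in
            plugged.append(sid)
        variances[sid] = s2 / len(ys)
    return variances, plugged


strata = {"s1": [10.0, 12.0, 14.0], "s2": [20.0], "s3": [8.0, 9.0]}
variances, plugged = stratum_mean_variances(strata)
if plugged:
    print(f"Note: overall variance plugged in for strata {plugged}")
```

Returning the list of affected strata makes it easy to show the user notification mentioned above.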

Implementing this for the example data from the help-file provides a first indication that the SEs for all four components work well :)

mhamjediers commented 4 months ago

remaining thoughts:

The approach of plugging in the overall var does not acknowledge cases of heteroskedasticity (e.g., greater variance for higher values); mention this in the Note and maybe look again at the Huber-Sandwich-Estimator
Acknowledge somewhere that DA/DB assume no covariance between the share of matched and the extent of the gap
I'm still unsure why DX would work (but let's see), because the assumption of independent samples of eq. (9) of Nopo (2008) is theoretically violated

mhamjediers commented 4 months ago

Okay, I have now also implemented it for md and ps. The SEs seem plausible, but they do not match the bootstrapped ones and sometimes we still get negative values (especially with too many variables); let's see what the simulations bring...

[x] Which is the best plug-in? Var of matched, mean of var for large enough strata, (y-meanofmatched)^2?
[ ] Check again, whether everything is correctly specified when doing atc/att etc; which values for _yB and _wA?
[x] If components are omitted, the variance is zero, which then causes the Bootstrap to think that it is omitted due to collinearity and to drop the estimates from respective bootstrap-samples. Maybe add a check that SEs are only estimated when no bootstrapping-prefix is applied

maximilian-sprengholz commented 4 months ago

Notes on kmatch and nlcom implementation:

We should discuss if/how we proceed given Ben Jann's comment
If yes, we'd need a proper implementation of having all the treatment effects needed for direct computation via nlcom. At the moment, we deliberately fixed to either att or atc. If I remember correctly, passing through all estimates would require some direct referencing (e.g. of the estimates and potential outcomes) instead of using just the single estimate, the single weight, and so on.

mhamjediers commented 4 months ago

I would say that we should try to incorporate his influence functions in the document directly, as currently the kmatch-command also does not acknowledge the uncertainty with respect to the weights.

Then the last difference to nopo is the incorporation of the covariance, which we even might also achieve with the influence functions... and moreover, we should also get influence functions for DA and DB based on his formula and what we learned now.

So overall, we could do everything with influence functions by ourselves and only provide these SE, no?

maximilian-sprengholz commented 4 months ago

Currently agreed on setup:

For em, we offer two SEs:
- Nopo SEs (with Variance plug-in correction for understaffed strata)
- Influence Function SEs
For md and ps we calculate the SEs from the influence functions returned by kmatch
Bootstrapping is still available and appropriate for all approaches

What we should not forget:

Defaults
Streamlined vce implementation
- If I understand Ben Jann correctly, IFs are robust by default and clustering should be possible by using corresponding estimators for mean/total
- Bootstrapping with clusters is inbuilt
- For Nopo we'd account for that in every stratum?

mhamjediers commented 4 months ago

Some remaining minor things:

[ ] We could also add an option of a minimum number of obs per stratum and group to be considered in the Nopo and Influence Function SEs.
[ ] housekeeping for Nopo SE to only apply to em
[ ] adjusting the Nopo SE for DA and DB to exclude the uncertainty about the matching status (this would be coherent with treating the limitation to common support as exogenous)
[ ] implement weights for Influence Function SE
[x] for md and ps we should generate the influence-function-variables via kmatch and add also the SEs for DA and DB via influence functions (to obtain a coherent variance-covariance-matrix of the components)

maximilian-sprengholz commented 4 months ago

Note on total varlist, vce(): Strange default behavior, which is not vce(analytic), but appears to be vce(robust) despite not being documented as possible option. For now, kmatchse defaults to vce() and not vce(analytic), even though the kmatch default would be vce(analytic) (there is no way around this, as we do not have any influence on the returned macro from kmatch). When streamlining, we should check if these defaults can be consistent across SEs.

maximilian-sprengholz commented 4 months ago

Standard errors are implemented for md and ps using the influence functions returned by kmatch and self-generated ones for DA and DB. All are scaled so that total is the appropriate way to obtain standard errors. A nice side-effect is that weights and clustering are already inbuilt.
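
A minimal Python sketch of why this scaling works, under the assumption of simple random sampling without weights or clusters. The function name is hypothetical and the example uses the influence function of a plain sample mean, not the ones returned by kmatch. With influence functions divided by n, the usual variance of a total, n/(n-1) times the sum of squared deviations, reproduces s^2/n, and the cross products give the covariances between components.

```python
import math


def if_vcov(if_columns):
    """Variance-covariance matrix from influence functions scaled by 1/n.

    if_columns: list of equal-length lists, one per component.
    Uses the simple-random-sampling variance of a total:
    n / (n - 1) * sum of cross products of deviations.
    """
    n = len(if_columns[0])
    k = len(if_columns)
    means = [sum(col) / n for col in if_columns]
    vcov = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(k):
            cross = sum(
                (if_columns[r][i] - means[r]) * (if_columns[c][i] - means[c])
                for i in range(n)
            )
            vcov[r][c] = n / (n - 1) * cross
    return vcov


y = [2.0, 4.0, 6.0, 8.0]
n = len(y)
ybar = sum(y) / n
if_mean = [(yi - ybar) / n for yi in y]  # IF of the sample mean, scaled by 1/n
V = if_vcov([if_mean])
se = math.sqrt(V[0][0])                  # equals sqrt(s^2 / n)
print(se)
```

Passing several influence-function columns at once returns the off-diagonal elements as well, which is what would be needed for a full variance-covariance matrix of the components.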

Important thing I noticed because we use both matching directions for kmatchse: If units are matched in only one of the directions, we count them as matched for all directions, although they might actually be unmatched in the direction of interest (ATT/ATC). This was no problem when ATT/ATC was fixed, but now leads to wrong subsamples, affecting DA/DB (and therefore DX as residual). This problem would also affect any postestimation usage after calling kmatch, att atc (without setting bwidth) in the currently published version, since we did not ensure that sharedbwidth was set as kmatch option. This is now part of the checks and will be added automatically.
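
The subsample problem can be illustrated with a toy Python example. The flags `matched_att` and `matched_atc` and both helper functions are hypothetical and only stand in for whatever per-direction matching indicators are available; this is not how kmatch stores its results.

```python
def matched_any_direction(units):
    """Problematic rule: a unit counts as matched if matched in either direction."""
    return [u["id"] for u in units if u["matched_att"] or u["matched_atc"]]


def matched_in_direction(units, direction):
    """Direction-specific rule: use only the flag of the estimand of interest."""
    key = {"att": "matched_att", "atc": "matched_atc"}[direction]
    return [u["id"] for u in units if u[key]]


# Hypothetical units: unit 2 is matched only in the ATC direction
units = [
    {"id": 1, "matched_att": True, "matched_atc": True},
    {"id": 2, "matched_att": False, "matched_atc": True},
    {"id": 3, "matched_att": False, "matched_atc": False},
]
print(matched_any_direction(units))        # unit 2 is counted as matched
print(matched_in_direction(units, "att"))  # unit 2 is unmatched for the ATT
```

Under the first rule, unit 2 would wrongly enter the matched subsample when the ATT is of interest, which shifts DA/DB and, as a residual, DX.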

maximilian-sprengholz commented 3 months ago

After consulting help kmatch again, Jann (2019) and some testing, the IFs do not seem to incorporate the uncertainty from being estimated. So, the current SE implementation for md and ps has to be adjusted, possibly following Abadie and Imbens (2006) or analogous to Jann (2021) on ebalfit.

GiuliaS2024 commented 6 days ago

Dear Maximilian and Maik, My output table contains only one column, for coefficients, and does not display the other columns: Std. err., z, P>|z|, [95% conf. interval]. Is this normal? I only manage to get the complete table, including Std. err. etc., if I use the bootstrap, but with this command weights are not allowed. Do you know where the issue may come from? Many thanks for your feedback! and for the package! :)

mhamjediers / nopo_decomposition

Correct Standard Error Estimation #14