mhamjediers / nopo_decomposition

Stata ado files to run matching-based decompositions
MIT License

Correct Standard Error Estimation #14

Open maximilian-sprengholz opened 1 year ago

maximilian-sprengholz commented 1 year ago

Implement Influence Functions to estimate correct standard errors for all decomposition components.

maximilian-sprengholz commented 8 months ago

At the moment, the suest call in the background does not produce SEs and issues a "Warning: Variance matrix is nonsymmetric or highly singular" if there is no variation in $Y$ in one of the compared groups per component (usually the unmatched for $D_A$ and $D_B$). This is probably a rare issue, but it is more likely with binary outcomes.

mhamjediers commented 4 months ago

Okay, what I achieved so far:

mhamjediers commented 4 months ago

What I learned today...

Implementing this for the example data from the help-file provides a first indication that the SEs for all four components work well :)

mhamjediers commented 4 months ago

remaining thoughts:

mhamjediers commented 4 months ago

Okay, I have now also implemented it for md and ps. The SEs seem plausible, but they do not match the bootstrapped ones, and sometimes we still get negative values (especially with too many variables). Let's see what the simulations bring...

maximilian-sprengholz commented 4 months ago

Notes on kmatch and nlcom implementation:

mhamjediers commented 4 months ago

I would say that we should try to incorporate his influence functions directly in the document, as the kmatch command currently also does not account for the uncertainty in the weights.

Then the last difference to nopo is the incorporation of the covariance, which we might even achieve with the influence functions as well. Moreover, we should also get influence functions for DA and DB based on his formula and what we have learned now.

So overall, we could do everything with influence functions ourselves and only provide these SEs, no?

maximilian-sprengholz commented 4 months ago

Currently agreed on setup:

What we should not forget:

mhamjediers commented 4 months ago

Some remaining minor things:

maximilian-sprengholz commented 4 months ago

Note on total varlist, vce(): The default behavior is strange. It is not vce(analytic) but appears to be vce(robust), even though that is not documented as a possible option. For now, kmatchse defaults to vce() and not vce(analytic), even though the kmatch default would be vce(analytic) (there is no way around this, as we have no influence on the macro returned by kmatch). When streamlining, we should check whether these defaults can be made consistent across SEs.

maximilian-sprengholz commented 4 months ago

Standard errors are implemented for md and ps using the influence functions returned by kmatch and self-generated ones for DA and DB. All are scaled such that total is the appropriate way to obtain standard errors. A nice side effect is that weights and clustering are already built in.
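To illustrate the scaling idea, here is a minimal sketch in Python (not Stata, and not the package's code). It uses the simplest estimator, a sample mean, whose influence function is $y_i - \bar{y}$. It assumes unweighted, unclustered data and that the variance of a total is $n$ times the sample variance of the summed values; the function name `se_from_scaled_if` is hypothetical. With the IF values scaled by $1/n$, the SE of their total equals the usual SE of the mean.

```python
import math
import statistics

def se_from_scaled_if(if_scaled):
    """SE of the total of scaled influence-function values.

    Assumption: simple random sample, no weights, no clustering, so
    Var(total) = n * sample variance of the summed values.
    """
    n = len(if_scaled)
    return math.sqrt(n * statistics.variance(if_scaled))

# Toy outcome values (made up for illustration only).
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(y)
ybar = statistics.fmean(y)

# Influence function of the mean, scaled by 1/n so that the SE of the
# total of these values is the SE of the estimator itself.
if_scaled = [(yi - ybar) / n for yi in y]

se_if = se_from_scaled_if(if_scaled)
se_classic = statistics.stdev(y) / math.sqrt(n)

print(se_if, se_classic)
```

Both numbers agree, which is the property the scaling is meant to deliver. With clustering, one would sum the scaled IF values within each cluster before taking the variance.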

An important thing I noticed because we use both matching directions for kmatchse: if units are matched in only one of the directions, we count them as matched for all directions, although they might actually be unmatched in the direction of interest (ATT/ATC). This was no problem when ATT/ATC was fixed, but it now leads to wrong subsamples, affecting DA/DB (and therefore DX as the residual). This problem would also affect any postestimation usage after calling kmatch, att atc (without setting bwidth) in the currently published version, since we did not ensure that sharedbwidth was set as a kmatch option. This is now part of the checks, and the option will be added automatically.
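The subsample problem can be made concrete with a toy sketch in Python (not Stata, and not the package's implementation). The field names `matched_att` and `matched_atc` and both helper functions are hypothetical; the data are made up. The point is only that a pooled "matched in any direction" flag undercounts the unmatched units for a specific direction.

```python
# Toy units with one matched flag per matching direction (hypothetical names).
units = [
    {"id": 1, "group": "A", "matched_att": True,  "matched_atc": True},
    {"id": 2, "group": "A", "matched_att": False, "matched_atc": True},
    {"id": 3, "group": "B", "matched_att": True,  "matched_atc": False},
    {"id": 4, "group": "B", "matched_att": False, "matched_atc": False},
]

def matched_any(unit):
    """Problematic flag: matched in at least one direction."""
    return unit["matched_att"] or unit["matched_atc"]

def matched_in(unit, direction):
    """Direction-specific flag; direction is 'att' or 'atc'."""
    return unit[f"matched_{direction}"]

# Unmatched units according to the pooled flag.
pooled_unmatched = [u["id"] for u in units if not matched_any(u)]

# Unmatched units in the direction of interest (here: ATT).
att_unmatched = [u["id"] for u in units if not matched_in(u, "att")]

print(pooled_unmatched)
print(att_unmatched)
```

Unit 2 is unmatched in the ATT direction but is missed by the pooled flag, so the unmatched subsample used for DA/DB would be too small. Using one shared bandwidth for both directions is what the thread relies on to avoid such discrepancies.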

maximilian-sprengholz commented 3 months ago

After consulting help kmatch again, Jann (2019) and some testing, the IFs do not seem to incorporate the uncertainty from being estimated. So, the current SE implementation for md and ps has to be adjusted, possibly following Abadie and Imbens (2006) or analogous to Jann (2021) on ebalfit.

GiuliaS2024 commented 6 days ago

Dear Maximilian and Maik, My output table contains only one column, for the coefficients, and does not display the other columns: Std. err., z, P>|z|, [95% conf. interval]. Is this normal? I only manage to get the complete table, including Std. err. etc., if I use the bootstrap, but with this command weights are not allowed. Do you know where the issue may come from? Many thanks for your feedback, and for the package! :)