sinanpl / OaxacaBlinder

R implementation of Oaxaca-Blinder gap decomposition
MIT License

Fix calculations when terms are dropped #20

Closed davidskalinder closed 4 months ago

davidskalinder commented 5 months ago

Okay, I think this should fix the bulk of #17. I tried to build this TDD-style, first writing tests that failed and then adapting the calculations until they passed. A couple of the tests just ensure that the baseline results (with no dropped coefficients) are correct, so they may be redundant with some of what's in #7. Note, though, that I don't really understand how to control the way oaxaca::oaxaca() handles the pooled model in twofold decompositions, so there's no test comparing our twofold results to that function's results (though there is a comparison to some naive manual calculations).
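For anyone unfamiliar with what a "naive manual calculation" of a twofold decomposition looks like, here is a rough sketch in base R. It is not the test code from this PR: the data (mtcars split by am), the formula, and the choice of a pooled regression without a group indicator as the reference coefficients are all assumptions made purely for illustration.

```r
# Hypothetical example data: two groups from the built-in mtcars data set.
group_a <- subset(mtcars, am == 1)
group_b <- subset(mtcars, am == 0)
f <- mpg ~ wt + hp

# Group-specific and pooled (reference) coefficients.
# Using the plain pooled model as the reference is an assumption here.
beta_a <- coef(lm(f, data = group_a))
beta_b <- coef(lm(f, data = group_b))
beta_r <- coef(lm(f, data = mtcars))

# Mean of each model-matrix column (intercept included) per group.
xbar_a <- colMeans(model.matrix(f, data = group_a))
xbar_b <- colMeans(model.matrix(f, data = group_b))

# Per-term twofold decomposition.
explained   <- (xbar_a - xbar_b) * beta_r
unexplained <- xbar_a * (beta_a - beta_r) + xbar_b * (beta_r - beta_b)

# The two parts should add up to the raw gap in mean outcomes.
gap <- mean(group_a$mpg) - mean(group_b$mpg)
stopifnot(isTRUE(all.equal(sum(explained) + sum(unexplained), gap)))
```

Because each group's OLS fit includes an intercept, the explained and unexplained parts sum to the raw gap whatever reference coefficients are used, which makes that sum a convenient sanity check.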

As mentioned in #17, the outstanding issues with this fix concern the overall results and the bootstraps: when any category of a predictor is missing from one group's data, its term estimate is NA, so the overall results (which try to sum over the NA) are also NA, and if bootstraps are requested, quantile() throws an error when it encounters the NA.
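The two failure modes can be reproduced with base R alone. This is a minimal sketch with made-up numbers and hypothetical term names, not output from the package; the na.rm = TRUE calls are only there to show the difference in behavior and are not necessarily the right fix.

```r
# Hypothetical per-term estimates; "educ_high" is NA because that
# category is absent from one group's data.
terms <- c(age = 0.12, educ_mid = -0.03, educ_high = NA)

overall_default <- sum(terms)                # the NA propagates through the sum
overall_na_rm   <- sum(terms, na.rm = TRUE)  # sums only the non-missing terms

# Hypothetical bootstrap estimates for one term, one of them missing.
boot_draws <- c(0.10, 0.14, NA, 0.09)

# quantile() refuses missing values unless na.rm = TRUE,
# so capture the error message instead of stopping.
ci_error <- tryCatch(
  quantile(boot_draws, probs = c(0.025, 0.975)),
  error = function(e) conditionMessage(e)
)
ci_na_rm <- quantile(boot_draws, probs = c(0.025, 0.975), na.rm = TRUE)
```

Whether silently dropping the NA is statistically appropriate for the overall results or the bootstrap intervals is a separate question from making the code run.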

However, these bugs are not introduced by this PR. (Prior to this PR, the same overall and bootstrap behavior could be produced by running a model with dummy variables that are dropped, even though such models' calculations were correct.) So @sinanpl, I think we should accept this PR (assuming you don't see anything wrong with it, heh), and I should open new issue threads to deal with each of these problems separately. But of course, if you'd prefer to wait until everything's working properly for dropped terms, then you should hold off on accepting this one.

Hope all that's clear; of course, let me know if not!