sinanpl / OaxacaBlinder

R implementation of Oaxaca-Blinder gap decomposition
MIT License

Make checksum function ignore `unexplained_a` and `unexplained_b` columns #28

Closed davidskalinder closed 4 months ago

davidskalinder commented 4 months ago

I caught this with the checksum in #25. It's similar to #24/#27, but this seems to be a different problem, since it occurs even when no terms are dropped from any models.

Here's a reprex run with be17ef9b503 installed (the current tip of this repo's master):

library(OaxacaBlinder)

obd2 <- OaxacaBlinderDecomp(
  ln_real_wage ~ education | foreign_born,
  chicago_long,
  type = "twofold",
  pooled = "neumark",
  baseline_invariant = FALSE
)

# These two should both be equal (and 0.143366, per Stata)
obd2$varlevel |> sum()
#> [1] 0.1610159
obd2$gaps$gap
#> [1] 0.1433657

obd2$varlevel
#>                         explained unexplained unexplained_a unexplained_b
#> (Intercept)            0.00000000 -0.18984208 -7.119078e-02   -0.11865130
#> educationcollege      -0.01842093  0.05208069  2.806974e-02    0.02401094
#> educationhigh.school   0.05381991  0.07923055  2.958816e-02    0.04964239
#> educationLTHS          0.25704502  0.03972345  5.094745e-05    0.03967250
#> educationsome.college -0.16672843  0.03645755  2.352609e-02    0.01293147

Created on 2024-04-11 with reprex v2.1.0

For reference, here's the Stata output for what I believe is the same model. Note that the gap is the same, but the estimates differ slightly. (The fact that they're close makes me wonder if the problem is due to how the small number of missing cases are handled?)

```
. oaxaca ln_real_wage LTHS some_college college high_school, by(foreign_born) relax pooled

Blinder-Oaxaca decomposition                    Number of obs     =        666
                                                Model             =     linear
Group 1: foreign_born = 0                       N of obs 1        =        287
Group 2: foreign_born = 1                       N of obs 2        =        379

    explained: (X1 - X2) * b
  unexplained: X1 * (b1 - b) + X2 * (b - b2)
               with b from pooled model (including group dummy)

------------------------------------------------------------------------------
             |               Robust
ln_real_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.696725   .0332965    80.99   0.000     2.631465    2.761985
     group_2 |    2.55336   .0239456   106.63   0.000     2.506427    2.600292
  difference |   .1433657   .0410128     3.50   0.000     .0629822    .2237493
   explained |   .1227053   .0226882     5.41   0.000     .0782372    .1671734
 unexplained |   .0206604   .0397235     0.52   0.603    -.0571961     .098517
-------------+----------------------------------------------------------------
explained    |
        LTHS |   .2546119   .0418716     6.08   0.000      .172545    .3366788
some_college |   -.167042   .0356373    -4.69   0.000    -.2368899   -.0971942
     college |  -.0183422     .01107    -1.66   0.098    -.0400391    .0033547
 high_school |   .0534776   .0297665     1.80   0.072    -.0048637    .1118189
-------------+----------------------------------------------------------------
unexplained  |
        LTHS |   .0421566   .0638557     0.66   0.509    -.0829982    .1673114
some_college |   .0367711   .0538599     0.68   0.495    -.0687924    .1423347
     college |   .0520019   .0262055     1.98   0.047     .0006401    .1033638
 high_school |   .0795729   .0784461     1.01   0.310    -.0741787    .2333244
       _cons |  -.1898421   .2223261    -0.85   0.393    -.6255932    .2459091
------------------------------------------------------------------------------
```
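To put numbers on "differ slightly", here is a small base-R sketch that lines up the detailed estimates. Every value is typed in by hand from the `obd2$varlevel` printout and the Stata output in this comment. Nothing here calls either package, and the object names (`cmp`, `d_expl`, and so on) are made up for illustration.

```r
# Detailed estimates typed in from the two printouts in this issue.
# Object names are illustrative only; nothing here is part of OaxacaBlinder.
cmp <- data.frame(
  term         = c("LTHS", "some_college", "college", "high_school"),
  r_expl       = c(0.25704502, -0.16672843, -0.01842093, 0.05381991),
  stata_expl   = c(0.2546119, -0.167042, -0.0183422, 0.0534776),
  r_unexpl     = c(0.03972345, 0.03645755, 0.05208069, 0.07923055),
  stata_unexpl = c(0.0421566, 0.0367711, 0.0520019, 0.0795729)
)

# Per-term differences (R minus Stata) within each component
cmp$d_expl   <- cmp$r_expl - cmp$stata_expl
cmp$d_unexpl <- cmp$r_unexpl - cmp$stata_unexpl

# Net difference per term once both components are added together
cmp$d_total <- cmp$d_expl + cmp$d_unexpl

print(cmp[, c("term", "d_expl", "d_unexpl", "d_total")], digits = 3)

# Largest absolute difference within a component vs. after netting
max_component_diff <- max(abs(c(cmp$d_expl, cmp$d_unexpl)))
max_net_diff       <- max(abs(cmp$d_total))
```

On these printed values the largest within-component difference is about 0.0024 (on LTHS), while the net per-term differences sit at the rounding level of the printouts. In other words, each term's explained and unexplained differences roughly cancel, and the intercept's unexplained estimate matches Stata's `_cons` to the printed precision. This is only arithmetic on the printed numbers, not a diagnosis of where the difference comes from.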

@sinanpl I'm not working with twofold decompositions in my current project, so fixing this isn't an especially high priority for me, but I did want to raise the issue so it doesn't get lost.

Note that if #25 is accepted, it should stop users from using the buggy results (since the main function will error when the checksum isn't met).

davidskalinder commented 4 months ago

Whoops, I'm a dope. Of course the reprex above includes the columns `unexplained_a` and `unexplained_b` in the sum, which it shouldn't. So really this problem is a bug in my checksum function in #25. I'll change the title and get a fix into #25.
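For anyone landing here later, a minimal sketch of the arithmetic, with the `varlevel` values from the reprex typed in by hand. `checksum_total()` and its `ignore` argument are hypothetical names for illustration; this is not the code that went into #25.

```r
# Values copied from the obd2$varlevel printout in the reprex above
varlevel <- data.frame(
  explained     = c(0.00000000, -0.01842093, 0.05381991, 0.25704502, -0.16672843),
  unexplained   = c(-0.18984208, 0.05208069, 0.07923055, 0.03972345, 0.03645755),
  unexplained_a = c(-7.119078e-02, 2.806974e-02, 2.958816e-02, 5.094745e-05, 2.352609e-02),
  unexplained_b = c(-0.11865130, 0.02401094, 0.04964239, 0.03967250, 0.01293147),
  row.names = c("(Intercept)", "educationcollege", "educationhigh.school",
                "educationLTHS", "educationsome.college")
)
gap <- 0.1433657  # obd2$gaps$gap from the reprex

# Hypothetical helper: sum every column except the ones named in `ignore`.
# as.matrix() lets it accept either a data frame or a numeric matrix.
checksum_total <- function(varlevel, ignore = c("unexplained_a", "unexplained_b")) {
  m <- as.matrix(varlevel)
  keep <- setdiff(colnames(m), ignore)
  sum(m[, keep, drop = FALSE])
}

naive_total <- sum(as.matrix(varlevel))  # what the reprex did
fixed_total <- checksum_total(varlevel)  # explained + unexplained only

naive_total
#> [1] 0.1610159
fixed_total
#> [1] 0.1433657
abs(fixed_total - gap) < 1e-6
#> [1] TRUE
```

In these numbers `unexplained_a` and `unexplained_b` add up to `unexplained`, so summing all four columns counts the unexplained part twice. That accounts for the 0.0176502 difference between 0.1610159 and the gap.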

davidskalinder commented 4 months ago

Okay #25 and #29 should now have been updated with some (slightly awkward) code to fix this. And now all my tests pass! So, closing this one.