sinanpl / OaxacaBlinder

R implementation of Oaxaca-Blinder gap decomposition
MIT License

Handle dropped terms when calculating overall estimates #21

Closed davidskalinder closed 4 months ago

davidskalinder commented 5 months ago

When one group is missing levels of a set of dummy variables (or, after #20, of a categorical variable), the estimates for the predictors contain NAs; so when these estimates are summed to produce the overall estimates, the sum also becomes NA. Reprex:

library(OaxacaBlinder)

chicago_mod <- chicago
chicago_mod$too_young <- chicago_mod$age < 19

fmla_tooyoung_dum <-
  ln.real.wage ~
  LTHS + some.college + college + high.school |
  too_young

overall_NA <-
  OaxacaBlinderDecomp(
    fmla_tooyoung_dum,
    chicago_mod,
    type = "threefold"
  )
overall_NA$varlevel
#>              endowments coefficients interaction
#> (Intercept)   0.0000000     1.422265   0.0000000
#> LTHS         -0.1193847    -0.904495   0.5965789
#> some.college         NA           NA          NA
#> college              NA           NA          NA
#> high.school          NA           NA          NA
overall_NA$overall
#> $endowments
#> [1] NA
#> 
#> $coefficients
#> [1] NA
#> 
#> $interaction
#> [1] NA

Created on 2024-03-29 with reprex v2.1.0

As I mentioned at https://github.com/sinanpl/OaxacaBlinder/issues/17#issuecomment-2024278482, I think the solution is simply to put na.rm = TRUE in the overall sum calculations, but I (still) need to fire up Stata to make sure that that's how Jann's package does it. (oaxaca::oaxaca() also returns NA overall estimates when any predictor estimates are missing.)
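A minimal sketch of what that `na.rm = TRUE` change would do, using base R only. The `varlevel` data frame here is a hypothetical stand-in built by hand from the numbers printed in the reprex above; it is not the package's internal object, and the package's actual summing code may be structured differently.

```r
# Hypothetical stand-in for `overall_NA$varlevel`, copied by hand from the
# reprex output above (not produced by OaxacaBlinderDecomp() here).
varlevel <- data.frame(
  endowments   = c(0, -0.1193847, NA, NA, NA),
  coefficients = c(1.422265, -0.904495, NA, NA, NA),
  interaction  = c(0, 0.5965789, NA, NA, NA),
  row.names    = c("(Intercept)", "LTHS", "some.college",
                   "college", "high.school")
)

# Plain column sums: a single NA term makes the whole total NA.
overall_default <- lapply(varlevel, sum)

# Column sums with na.rm = TRUE: dropped (NA) terms are left out of the total.
overall_narm <- lapply(varlevel, sum, na.rm = TRUE)

print(overall_default)
print(overall_narm)
```

This only shows the mechanics of skipping NA terms; whether the remaining terms are themselves the right ones to add up is a separate question.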

davidskalinder commented 5 months ago

So based on the Stata output in https://github.com/sinanpl/OaxacaBlinder/issues/24#issue-2216096332, it looks like leaving the dropped terms out of the totals is indeed the right thing to do. But as #24 shows, we don't even seem to have the correct terms to add up yet, so I think this issue depends on #24.

davidskalinder commented 4 months ago

Should be fixed in #27, closing.