sinanpl / OaxacaBlinder

R implementation of Oaxaca-Blinder gap decomposition
MIT License

calculate_gaps does not pick up group_a / b correctly #22

Open sinanpl opened 5 months ago

sinanpl commented 5 months ago

@davidskalinder a recent merge introduced the following bug:

The coefs / overall results are ok, but in generics.R the summary function calls calculate_gaps. The data_a and data_b arguments lead to reversed averages, as in the output below (the reported male mean wage is actually the female mean wage).

library(OaxacaBlinder)
formula = real_wage ~ age | gender
data = OaxacaBlinder::chicago_long

# oaxaca works with binary / logical
datasetv2 = data
datasetv2$gender = ifelse(datasetv2$gender == levels(data$gender)[1], 1, 0)

modv1_twofold_neumark = OaxacaBlinder::OaxacaBlinderDecomp(
    formula=formula,
    data = data,
    type = 'twofold',
    pooled = 'neumark',
    baseline_invariant = FALSE,
    n_bootstraps = NULL
)
modv2 = oaxaca::oaxaca(formula = formula, data = datasetv2, R=NULL)
#> oaxaca: oaxaca() performing analysis. Please wait.

summary(modv1_twofold_neumark)
#> Oaxaca Blinder Decomposition model
#> ----------------------------------
#> Type: twofold
#> Formula: real_wage ~ age | gender
#> Data: data
#> 
#> Descriptives
#>                  n    %n mean(real_wage)
#> gender==male   412 57.9%           13.69
#> gender==female 300 42.1%           17.52
#> 
#> Gap: -3.83
#> % Diff: -28.01%
#>               coefficient   % of gap
#> explained           -0.20       5.3%
#> unexplained         -3.63      94.7%
#> unexplained_a       -2.06      53.7%
#> unexplained_b       -1.57      40.9%
data |> 
    dplyr::group_by(gender) |> 
    dplyr::summarise(mean(real_wage, na.rm=TRUE))
#> # A tibble: 2 × 2
#>   gender `mean(real_wage, na.rm = TRUE)`
#>   <fct>                            <dbl>
#> 1 male                              17.5
#> 2 female                            13.7
t(t(modv2$twofold$overall[c(5), ]))
#>                          [,1]
#> group.weight        -1.000000
#> coef(explained)     -0.204005
#> se(explained)              NA
#> coef(unexplained)   -3.630331
#> se(unexplained)            NA
#> coef(unexplained A) -2.060458
#> se(unexplained A)          NA
#> coef(unexplained B) -1.569873
#> se(unexplained B)          NA

Created on 2024-03-29 with reprex v2.0.2

davidskalinder commented 5 months ago

> in the generics.R the summary function calls calculate_gaps

Hmm, that's odd. Do you know which line is going wrong? It looks to me like calculate_gaps() is only called by OaxacaBlinderDecomp(), and I can't see anything in those two functions that would be flipping the groups?
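
One way to narrow this down might be to compare group means keyed by label against means whose labels are attached by position. The sketch below is only an illustration under assumptions: the toy data frame and the names `means_by_label`, `means_by_position`, and `flipped` are made up here, and the `data_a` / `data_b` subsets merely mimic the argument names from the issue. It is not the package's `calculate_gaps()` and makes no claim about how that function is written.

```r
# Toy data, invented for illustration (not chicago_long).
toy <- data.frame(
  gender = factor(c("male", "male", "female", "female"),
                  levels = c("male", "female")),
  real_wage = c(18, 17, 14, 13)
)

# Label-keyed means: each mean is tied to its own factor level,
# so the result cannot depend on argument order.
means_by_label <- tapply(toy$real_wage, toy$gender, mean, na.rm = TRUE)

# Position-keyed means: labels are attached afterwards, so passing the
# subsets in the wrong order silently swaps the reported means.
data_a <- toy[toy$gender == "female", ]  # deliberately the "wrong" group
data_b <- toy[toy$gender == "male", ]
means_by_position <- c(
  male   = mean(data_a$real_wage, na.rm = TRUE),
  female = mean(data_b$real_wage, na.rm = TRUE)
)

print(means_by_label)
print(means_by_position)

# TRUE when the positional labels disagree with the label-keyed means.
flipped <- !isTRUE(all.equal(
  as.numeric(means_by_label[names(means_by_position)]),
  unname(means_by_position)
))
print(flipped)
```

Running the same comparison on the real data (the `dplyr::summarise()` result versus the Descriptives table from `summary()`) would show whether the labels or the subsets are the part being swapped.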