`Rchoice` and `marginaleffects`

vincentarelbundock commented 1 year ago

Hi,

Thanks for this great package.

I am the developer of the marginaleffects package. I have added preliminary support for your package in mine. To try, install the development versions from Github:

remotes::install_github("vincentarelbundock/marignaleffects")

In implementing this, I had a couple questions. First, are there were plans to add a predict.Rchoice() method?

Second, I was wondering if you could clarify something about the quantities generated by effect(). Consider a hetprob model:

library(Rchoice)
library(marginaleffects)
options(width = 10000)

dat <- transform(iris, y = Sepal.Length > median(Sepal.Length))
mod <- hetprob(y ~ Petal.Width * Petal.Length | factor(Species), data = dat, link = "logit")

We can obtain similar results with effect(), avg_slopes(), and by numerical computation using finite difference:

# Rchoice
effect(mod) |> summary()
# ------------------------------------------------------
# Marginal effects for the heteroskedastic binary model:
# ------------------------------------------------------
#                                dydx Std. error z value  Pr(> z)    
# Petal.Width               -0.176911   0.193408  -0.915    0.360    
# Petal.Length               0.372718   0.089944   4.144 3.41e-05 ***
# factor(Species)versicolor  0.001487   0.301239   0.005    0.996    
# factor(Species)virginica  -0.022015   0.301580  -0.073    0.942    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Note: Marginal effects computed as the average for each individual

# marginaleffects
avg_slopes(mod)
# 
#          Term            Contrast  Estimate Std. Error         z   Pr(>|z|)   2.5 % 97.5 %
#  Petal.Length               dY/dX  0.372623    0.08986  4.146618 3.3742e-05  0.1965 0.5487
#   Petal.Width               dY/dX -0.176924    0.19338 -0.914882    0.36025 -0.5560 0.2021
#       Species versicolor - setosa  0.001487    0.30128  0.004937    0.99606 -0.5890 0.5920
#       Species  virginica - setosa -0.022015    0.30133 -0.073061    0.94176 -0.6126 0.5686
# 
# Prediction type:  pr 
# Columns: type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high

# finite difference: Petal.Length
h <- 1e-4
dat_lo <- transform(dat, Petal.Length = Petal.Length - h / 2)
dat_hi <- transform(dat, Petal.Length = Petal.Length + h / 2)
p_lo <- predict(mod, newdata = dat_lo, type = "pr")
p_hi <- predict(mod, newdata = dat_hi, type = "pr")
mean((p_hi - p_lo) / h)
# [1] 0.3727184

When I try to do the same with ivpml(), I get different results. My question:

How does the finite difference estimate relate to your computation using asf=TRUE or asf=FALSE?

dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/PSID1976.csv")
dat$nwincome <- with(dat, (fincome - hours * wage)/1000)
dat$lfp <- as.numeric(dat$participation == "yes")
mod <- ivpml(
    lfp ~ education + experience + I(experience^2) + age + youngkids + oldkids + nwincome |
          education + experience + I(experience^2) + age + youngkids + oldkids + heducation,
    data = dat,
    message = FALSE)

# Rchoice
effect(mod, asf = FALSE) |> summary()
# ------------------------------------------------------
# Marginal effects for the IV Probit model:
# ------------------------------------------------------
#                 dydx Std. error z value  Pr(> z)    
# education   0.048777   0.008733   5.585 2.33e-08 ***
# experience  0.021997   0.003723   5.908 3.46e-09 ***
# age        -0.012882   0.003322  -3.878 0.000105 ***
# youngkids  -0.241982   0.036594  -6.613 3.78e-11 ***
# oldkids     0.013695   0.012792   1.071 0.284373    
# nwincome   -0.010564   0.004736  -2.230 0.025724 *  
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Note: Marginal effects computed as the average for each individual
effect(mod, asf = TRUE) |> summary()
# ------------------------------------------------------
# Marginal effects for the IV Probit model:
# ------------------------------------------------------
#                 dydx Std. error z value  Pr(> z)    
# education   0.051057   0.011101   4.599 4.24e-06 ***
# experience  0.023071   0.002952   7.816 5.44e-15 ***
# age        -0.013484   0.002986  -4.516 6.31e-06 ***
# youngkids  -0.253294   0.033077  -7.658 1.89e-14 ***
# oldkids     0.014335   0.013520   1.060   0.2890    
# nwincome   -0.011058   0.005550  -1.992   0.0463 *  
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Note: Marginal effects computed as the average for each individual

# marginaleffects
avg_slopes(mod)
# 
#        Term  Estimate Std. Error       z   Pr(>|z|)    2.5 %     97.5 %
#         age -0.016209   0.002363 -6.8583 6.9673e-12 -0.02084 -0.0115771
#   education  0.045651   0.008381  5.4470 5.1226e-08  0.02922  0.0620776
#  experience  0.025654   0.002272 11.2922 < 2.22e-16  0.02120  0.0301065
#  heducation -0.009439   0.006665 -1.4161   0.156748 -0.02250  0.0036252
#    nwincome -0.003046   0.001492 -2.0415   0.041198 -0.00597 -0.0001217
#     oldkids  0.010846   0.013033  0.8322   0.405300 -0.01470  0.0363895
#   youngkids -0.259920   0.031886 -8.1516 3.5908e-16 -0.32241 -0.1974249
# 
# Prediction type:  pr 
# Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

# finite difference: nwincome
h <- 1e-4
dat_lo <- transform(dat, nwincome = nwincome - h / 2)
dat_hi <- transform(dat, nwincome = nwincome + h / 2)
p_lo <- predict(mod, newdata = dat_lo, type = "pr")
p_hi <- predict(mod, newdata = dat_hi, type = "pr")
mean((p_hi - p_lo) / h)
# [1] -0.003046009

Thanks for your time!

mauricio1986 commented 1 year ago

Hi Vincent. Many thanks for your questions. If "asf = TRUE", then the APEs are computed using Equation (19). On the other hand, if "asf = FALSE", then the APEs are computed using Equation (20). Please, see the image below:

Captura de pantalla 2023-02-13 a la(s) 17 42 42

vincentarelbundock commented 1 year ago

Thanks

mauricio1986 / Rchoice

`Rchoice` and `marginaleffects` #1