s3alfisc / wildboottestjlr

Fast Wild Cluster Bootstrap Inference for OLS/IV in R based on WildBootTests.jl & JuliaConnectoR

Lifecycle: experimental

Note: All features of wildboottestjlr are fully integrated into fwildclusterboot.

wildboottestjlr

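wildboottestjlr ports the functionality of the WildBootTests.jl package to R via the JuliaConnectoR package. At the moment, the supported features of WildBootTests.jl include the wild cluster bootstrap for OLS and the WRE bootstrap for IV, both of which are demonstrated in the examples in this README.

Model objects of class lm, fixest (from fixest::feols()), felm (from lfe::felm()) and ivreg are currently supported.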

In the future, IV methods for fixest and lfe will be added.

Installation / Getting Started

wildboottestjlr can be installed by running

library(devtools)
install_github("s3alfisc/wildboottestjlr")

You can install Julia by following the steps described here: https://julialang.org/downloads/. WildBootTests.jl can then be installed via Julia’s package management system.
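Before connecting R to Julia, it can help to confirm that a Julia executable is visible from the R session. The snippet below is a minimal sketch in base R, not part of wildboottestjlr. It assumes Julia is located via the system PATH; if your setup points to Julia through the JULIA_BINDIR environment variable instead, check that variable rather than the PATH.

```r
# Look up the Julia executable on the PATH; returns "" if it is not found.
julia_path <- Sys.which("julia")

if (nzchar(julia_path)) {
  message("Found Julia at: ", julia_path)
  # Ask Julia for its version string (only runs when Julia was found).
  print(system2(julia_path, "--version", stdout = TRUE))
} else {
  message("Julia was not found on the PATH; install it or adjust the PATH first.")
}
```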

To install Julia and WildBootTests.jl from within R, run wildboottestjlr::wildboottestjlr_setup() and follow the instructions printed in the console. The function installs Julia and WildBootTests.jl and connects R to Julia.

library(wildboottestjlr)
wildboottestjlr_setup()

Similarly, you can set the number of Julia threads by running

julia_set_ntreads()

and following the instructions.
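As background, Julia itself reads the number of threads from the JULIA_NUM_THREADS environment variable when it starts. The following is a sketch of setting that variable by hand from R. It is an assumption that this is an adequate substitute for the helper above; the value 4 is purely illustrative, and the variable must be set before the Julia session is started.

```r
# Assumption: Julia reads JULIA_NUM_THREADS at startup, so this must run
# before the first Julia call in the current R session.
Sys.setenv(JULIA_NUM_THREADS = "4")

# Confirm the value that a newly started Julia process would inherit.
Sys.getenv("JULIA_NUM_THREADS")
```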

Example

wildboottestjlr's central function is boottest(). Apart from a few minor differences, it mirrors the boottest() function from the fwildclusterboot package.

The Wild Bootstrap for OLS

library(wildboottestjlr)
# set a 'global' seed in the Julia session
set_julia_seed(rng = 12313452435)
#> <Julia object of type MersenneTwister>
#> MersenneTwister(12313452435)

data(voters)
library(fixest)
library(lfe)

# estimation via lm(), fixest::feols() or lfe::felm()
lm_fit <- lm(proposition_vote ~ treatment  + log_income, data = voters)
feols_fit <- feols(proposition_vote ~ treatment  + log_income, data = voters)
felm_fit <- felm(proposition_vote ~ treatment  + log_income, data = voters)

boot_lm <- boottest(lm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_feols <- boottest(feols_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)
boot_felm <- boottest(felm_fit, clustid = "group_id1", B = 999, param = "treatment", rng = 7651427)

# summarize results via summary() method
#summary(boot_lm)

# also possible: use msummary() from modelsummary package
library(modelsummary)
msummary(list(boot_lm, boot_feols, boot_felm), 
        estimate = "{estimate} ({p.value})", 
        statistic = "[{conf.low}, {conf.high}]"
        )  
                  Model 1          Model 2          Model 3
1*treatment = 0   0.089 (0.001)    0.089 (0.001)    0.089 (0.001)
                  [0.039, 0.142]   [0.039, 0.142]   [0.039, 0.142]
Num.Obs.          300              300              300
R2                0.045            0.045            0.045
R2 Adj.           0.039            0.039            0.039
R2 Within
R2 Pseudo
AIC               -1.9             -3.9
BIC               12.9             7.2
Log.Lik.          4.950            4.950
# plot(boot_lm)
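
To give a sense of what a wild cluster bootstrap test computes, here is a deliberately naive base-R sketch with the null hypothesis imposed and Rademacher weights. It is not the algorithm used by WildBootTests.jl, which is far faster, and it will not reproduce the numbers above. The simulated data, the helper cluster_t() and the omission of small-sample corrections are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data: 20 clusters of 15 observations each (hypothetical example).
G <- 20
n_g <- 15
cluster <- rep(seq_len(G), each = n_g)
x <- rnorm(G * n_g) + rnorm(G)[cluster]
y <- 0.3 * x + rnorm(G)[cluster] + rnorm(G * n_g)
X <- cbind(1, x)

# Cluster-robust t-statistic for H0: coefficient j = 0
# (hypothetical helper, no small-sample correction).
cluster_t <- function(y, X, cluster, j) {
  XtX_inv <- solve(crossprod(X))
  beta <- XtX_inv %*% crossprod(X, y)
  u <- as.vector(y - X %*% beta)
  scores <- rowsum(X * u, cluster)  # G x k matrix of per-cluster score sums
  V <- XtX_inv %*% crossprod(scores) %*% XtX_inv
  beta[j] / sqrt(V[j, j])
}

t_obs <- cluster_t(y, X, cluster, j = 2)

# Restricted fit that imposes the null (drop x, keep the intercept).
X_r <- X[, 1, drop = FALSE]
fit_r <- as.vector(X_r %*% solve(crossprod(X_r), crossprod(X_r, y)))
res_r <- y - fit_r

B <- 999
t_boot <- replicate(B, {
  v <- sample(c(-1, 1), G, replace = TRUE)  # one Rademacher weight per cluster
  y_star <- fit_r + res_r * v[cluster]      # bootstrap outcome under the null
  cluster_t(y_star, X, cluster, j = 2)
})

# Symmetric bootstrap p-value.
p_value <- mean(abs(t_boot) >= abs(t_obs))
p_value
```

Each iteration refits the regression, so the cost of this sketch grows linearly in B.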

The Wild Bootstrap for IV (WRE)

If boottest() is applied to an object of class ivreg, the WRE bootstrap of Davidson & MacKinnon (2010) is run.

library(ivreg)
data("SchoolingReturns", package = "ivreg")
data <- SchoolingReturns

ivreg_fit <- ivreg(log(wage) ~ education + age + ethnicity + smsa + south + parents14 |
                  nearcollege + age  + ethnicity + smsa + south + parents14,
                data = data)

boot_ivreg <- boottest(object = ivreg_fit, B = 999, param = "education", clustid = "fameducation", type = "webb")

summary(boot_ivreg)
#> boottest.ivreg(object = ivreg_fit, clustid = "fameducation", 
#>     param = "education", B = 999, type = "webb")
#>  
#>  Hypothesis: 1*education = 0
#>  Observations: 3010
#>  Bootstr. Iter: 999
#>  Bootstr. Type: webb
#>  Clustering: 1-way
#>  Confidence Sets: 95%
#>  Number of Clusters: 9
#> 
#>              term  estimate statistic    p.value   conf.low conf.high
#> 1 1*education = 0 0.0904587  2.207293 0.01901902 0.01356883 0.2421277
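
The example above uses type = "webb" with only 9 clusters. A plausible reason, stated here as an assumption rather than the author's rationale, is that Rademacher weights allow only 2^9 distinct weight vectors with so few clusters, while Webb's six-point distribution allows many more. The base-R sketch below constructs the Webb weights by hand; the names webb_points and n_clusters are illustrative and not part of the package.

```r
set.seed(42)

# Webb's six-point distribution: each value is drawn with probability 1/6.
webb_points <- c(-sqrt(1.5), -1, -sqrt(0.5), sqrt(0.5), 1, sqrt(1.5))

# The weights have mean 0 and variance 1 (up to floating-point error).
mean(webb_points)
mean(webb_points^2)

n_clusters <- 9
2^n_clusters  # number of distinct Rademacher weight vectors: 512
6^n_clusters  # number of distinct Webb weight vectors: 10077696

# One draw of cluster-level bootstrap weights.
v <- sample(webb_points, n_clusters, replace = TRUE)
```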

Benchmarks

After compilation, wildboottestjlr is orders of magnitude faster than fwildclusterboot, in particular when the number of clusters N_G and the number of bootstrap iterations B are large.

The benchmark plots show the median runtime of 3 runs for a linear regression with N = 10,000 and k = 21.
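
The "median of 3 runs" protocol can be sketched in base R as follows. This is not the benchmark script behind the plots: the simulated data are made up, k = 21 is read here as 20 regressors plus an intercept (an assumption), and lm() stands in for the bootstrap call you would actually time.

```r
set.seed(123)

N <- 10000
k <- 21

# Assumed reading of k: 20 simulated regressors plus an intercept.
X <- matrix(rnorm(N * (k - 1)), N, k - 1)
y <- as.vector(X %*% rep(0.1, k - 1) + rnorm(N))
dat <- data.frame(y = y, X)

# Time the same call three times and report the median elapsed time in seconds.
# Replace the lm() call with the expression you want to benchmark.
timings <- replicate(3, system.time(lm(y ~ ., data = dat))[["elapsed"]])
median(timings)
```

Because the Julia code is compiled on first use, a run made before compilation would include compilation time, so it makes sense to do one warm-up call before timing.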