s3alfisc / fwildclusterboot

Fast Wild Cluster Bootstrap Inference for Regression Models / OLS in R. Additionally, R port to WildBootTests.jl via the JuliaConnectoR.
https://s3alfisc.github.io/fwildclusterboot/
GNU General Public License v3.0

Fixed Effect Support #102

Closed: jcha1997 closed this issue 1 year ago

jcha1997 commented 1 year ago

Hello again. This isn't so much an issue as a quick clarification question. In the vignette you mention "Last, boottest() supports out-projection of fixed effects in the estimation stage via lfe::felm() and fixest::feols(). Within the bootstrap, the user can choose to project out only one fixed effect, which can be set via the fe function argument. All other fixed effects specified in either felm() or feols() are treated as sets of binary regressors."

I was wondering if I could get some clarification on what happens to the additional fixed effects. In a two-way fixed effects model, for instance, if I set fe to year but the feols object also has a unit fixed effect, what happens to the unit fixed effect?

s3alfisc commented 1 year ago

Hi Jeremiah =)

The short version is: the bootstrap algo can project out at most one fixed effect. Nevertheless, you can pass in any kind of fixest model, and boottest() will transform the model so that the bootstrap algorithm can process it.

Let's assume that you estimate the following model:

feols(Y ~ X | b + c)

What happens in fixest is that prior to running inv(X'X)X'Y, Y and X are demeaned by b and c.
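To make the demeaning step concrete, here is a minimal base-R sketch on simulated data. It uses plain alternating group demeaning, which is an assumption chosen for illustration and not necessarily how fixest implements it; the helper `demean_two_way` and all variable names are hypothetical.

```r
set.seed(42)
n <- 300
b <- sample(1:6, n, replace = TRUE)     # first fixed effect
c_id <- sample(1:5, n, replace = TRUE)  # second fixed effect (named c_id to avoid masking c())
X <- rnorm(n) + 0.2 * b
Y <- 1.5 * X + 0.4 * b - 0.3 * c_id + rnorm(n)

# Hypothetical helper: alternately subtract group means of g1 and g2.
# Repeating this converges to the residual from projecting on both sets of dummies.
demean_two_way <- function(v, g1, g2, iter = 100) {
  for (i in seq_len(iter)) {
    v <- v - ave(v, g1)
    v <- v - ave(v, g2)
  }
  v
}

Y_dm <- demean_two_way(Y, b, c_id)
X_dm <- demean_two_way(X, b, c_id)

# OLS on the demeaned data: inv(X'X)X'Y with a single regressor
beta_demeaned <- sum(X_dm * Y_dm) / sum(X_dm^2)

# Reference: the same model with both fixed effects as dummy variables
beta_dummies <- unname(coef(lm(Y ~ X + factor(b) + factor(c_id)))["X"])

print(c(demeaned = beta_demeaned, dummies = beta_dummies))
```

Both numbers should agree up to numerical tolerance, which is the Frisch-Waugh-Lovell result that the demeaning relies on.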

When running the wild cluster bootstrap with fixed effects, the bootstrap regressions need to be demeaned in each of the B bootstrap iterations. The algorithm in fwildclusterboot, as spelled out in the "fast and wild" paper, does not support demeaning by more than one variable. As a consequence, boottest (the parent Stata package) throws an error and requires users to re-specify and re-run their reghdfe models so that they have only one fixed effect.

fwildclusterboot instead overwrites the formula specification feols(Y ~ X | b + c) to

feols(Y ~ X + factor(b) + factor(c))

when you specify fe = NULL (the default), in which case it does not demean internally, or to

feols(Y ~ X + factor(b) | c)

when e.g. specifying fe = "c", in which case each internal bootstrap regression is demeaned.
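As a rough illustration of why this rewrite leaves the coefficient on X unchanged, the base-R sketch below compares the two cases on simulated data: all fixed effects as dummies (analogous to fe = NULL) versus b as dummies with c projected out by demeaning (analogous to fe = "c"). It does not call fixest or boottest(); the data and the helper `demean_by` are hypothetical.

```r
set.seed(1)
n <- 200
b <- sample(1:5, n, replace = TRUE)
c_id <- sample(1:4, n, replace = TRUE)  # named c_id to avoid masking c()
X <- rnorm(n)
Y <- 0.5 * X + 0.3 * b - 0.2 * c_id + rnorm(n)

# Case 1: both fixed effects enter as dummy variables, no demeaning
fit_dummies <- lm(Y ~ X + factor(b) + factor(c_id))

# Case 2: b enters as dummies, c is projected out by group demeaning
demean_by <- function(v, g) v - ave(v, g)            # hypothetical helper
B <- model.matrix(~ factor(b))[, -1, drop = FALSE]   # dummies for b, intercept dropped
Y_dm <- demean_by(Y, c_id)
X_dm <- demean_by(X, c_id)
B_dm <- apply(B, 2, demean_by, g = c_id)             # demean every dummy column by c
fit_within <- lm(Y_dm ~ X_dm + B_dm - 1)

coef_dummies <- unname(coef(fit_dummies)["X"])
coef_within <- unname(coef(fit_within)["X_dm"])
print(c(dummies = coef_dummies, within = coef_within))
```

The point of the sketch is only that the remaining fixed effect b has to be demeaned along with Y and X once it is treated as a set of regressors.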

In other words, this is supposed to be a convenience feature for users. But I have indeed been wondering whether the benefit to users outweighs the added complexity in boottest()'s internal pre-processing.

Here is the respective preprocessing code for fixed effects: link.

# object here = the fixest regression object; model_matrix is an internal
# function with minor differences to the base model.matrix
X <- model_matrix(object, type = "rhs", collin.rm = TRUE)
all_fe <- model_matrix(object, type = "fixef", collin.rm = TRUE)
if (is.null(fe)) {
  add_fe <- all_fe
  add_fe_names <- names(add_fe)
  # create a formula based on the fixed effects
  fml_fe <- reformulate(add_fe_names, response = NULL)
  # create matrix of fixed effect dummies
  add_fe_dummies <-
    model.matrix(fml_fe, model.frame(fml_fe, data = as.data.frame(add_fe)))
  # update the design matrix X (i.e. add fixed effects)
  X <-
    as.matrix(collapse::add_vars(as.data.frame(X), add_fe_dummies))
}
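To see what the reformulate() / model.matrix() step produces, here is a small standalone toy in base R. The data frame `add_fe` below is made up and stands in for the fixed effects extracted from the regression object; it assumes the fixed effects are stored as factors.

```r
# Toy stand-in for the extracted fixed effects (hypothetical data)
add_fe <- data.frame(
  b = factor(c("u", "v", "u", "w")),
  c = factor(c("p", "p", "q", "q"))
)

# one-sided formula ~ b + c built from the column names
fml_fe <- reformulate(names(add_fe), response = NULL)

# expand the factors into dummy columns (treatment contrasts by default)
add_fe_dummies <- model.matrix(fml_fe, model.frame(fml_fe, data = add_fe))

print(colnames(add_fe_dummies))
# columns: (Intercept), bv, bw, cq
```

Note that in this toy the result carries an intercept column and drops one reference level per factor.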

I hope this clarifies things (?), and thanks for the feedback! I'll try to think of ways to clarify the statements in the vignette.