lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

How to set 0 or not applicable values for Cluster, IV, FE parts of the call? #366

Open astraetech opened 1 year ago

astraetech commented 1 year ago

Hi,

First thank you for a fantastic package! Really useful!

This is really a question rather than an issue, and I apologize if this is the wrong place to ask.

I was wondering if feols had 0 or NA placeholder values for clus, IV or FE slots in the formula like in felm? As in the code below:

# Convenience function, fits model use lfe::felm following inputs
reg_felm <- function(lhs, rhs, fe = "0", iv = "0", clus = "0", data_str) {
    data <- eval(parse(text = data_str))
    fmla <- sprintf("%s ~ %s | %s | %s | %s", lhs, rhs, fe, iv, clus)
    fit <- lfe::felm(as.formula(fmla), data = data)
}

The function comes from https://www.patrickbaylis.com/blog/2019-06-11-making-regressions-purrr/ . Calling it without clus, IV, or FE gives valid results, and when bind_rows is called in the follow-up code on that page, each non-applicable slot of the formula is populated with 0, indicating that that part of the formula was not used. I'm trying to implement the same approach using fixest, but entering 0 in place of FE, clus, or IV seems to give an error. Thank you very much in advance!

grantmcdermott commented 1 year ago

Hi @astraetech.

Two quick points, both of which are documented in the introductory vignette and in the main help pages themselves.

  1. fixest doesn't use/require 0 as a placeholder for IV or cluster vars. It only requires that any IV first stage (if there is one) comes after the final | slot. vcov is a regular argument and follows specific defaults (e.g. automatic clustering in the presence of FEs).
  2. fixest has its own concise and efficient syntax for multiple estimations. Going through purrr (or lapply) like Patrick's post would still work. But you could just as easily do:
library(fixest)

data("Wages", package = "plm")

mods = feols(
    lwage ~ csw(exp, wks) | # csw => cumulative stepwise regressions
        married,            # married FEs 
    data = Wages,           # dataset
    vcov = "hc1",           # HC1 SEs (default would be clustered by married) 
    fsplit = ~bluecol       # fsplit => run on full sample and then split by variable
)

# etable(mods, file = "table.tex", style.tex = style.tex("aer"))
etable(mods)
#>                              mods.1             mods.2             mods.3
#> Sample (bluecol)        Full sample        Full sample                 no
#> Dependent Var.:               lwage              lwage              lwage
#>                                                                          
#> exp              0.0070*** (0.0007) 0.0071*** (0.0007) 0.0116*** (0.0009)
#> wks                                  0.0043** (0.0014)                   
#> Fixed-Effects:   ------------------ ------------------ ------------------
#> married                         Yes                Yes                Yes
#> ________________ __________________ __________________ __________________
#> S.E. type        Heteroskedas.-rob. Heteroskedas.-rob. Heteroskedas.-rob.
#> Observations                  4,165              4,165              2,036
#> R2                          0.11001            0.11232            0.15969
#> Within R2                   0.02979            0.03230            0.07769
#> 
#>                              mods.4             mods.5             mods.6
#> Sample (bluecol)                 no                yes                yes
#> Dependent Var.:               lwage              lwage              lwage
#>                                                                          
#> exp              0.0117*** (0.0009) 0.0054*** (0.0008) 0.0055*** (0.0008)
#> wks              0.0079*** (0.0019)                       0.0012 (0.0017)
#> Fixed-Effects:   ------------------ ------------------ ------------------
#> married                         Yes                Yes                Yes
#> ________________ __________________ __________________ __________________
#> S.E. type        Heteroskedas.-rob. Heteroskedas.-rob. Heteroskedas.-rob.
#> Observations                  2,036              2,129              2,129
#> R2                          0.16643            0.15151            0.15174
#> Within R2                   0.08508            0.02369            0.02395
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Created on 2022-11-21 with reprex v2.0.2
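
To make point 1 concrete, here is a minimal sketch of a formula builder that simply leaves out unused slots instead of filling them with 0. The helper name build_fml and its arguments are hypothetical (they are not part of fixest), and the variable names in the usage lines are placeholders.

```r
# Hypothetical helper (not part of fixest): assemble a fixest-style formula,
# leaving out the FE and IV slots when they are not supplied.
build_fml <- function(lhs, rhs, fe = NULL, iv = NULL) {
  # c() silently drops NULL entries, so unused slots vanish from the formula
  parts <- c(sprintf("%s ~ %s", lhs, rhs), fe, iv)
  as.formula(paste(parts, collapse = " | "), env = parent.frame())
}

# Placeholder variable names, for illustration only
f_plain <- build_fml("y", "x1")
f_fe    <- build_fml("y", "x1", fe = "id")
f_iv    <- build_fml("y", "x1", iv = "x_endo ~ x_inst")
f_both  <- build_fml("y", "x1", fe = "id", iv = "x_endo ~ x_inst")

print(f_both)
```

Each result could then be passed to feols(fml, data = ...), with clustering handled through the vcov (or cluster) argument rather than through a formula slot.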

astraetech commented 1 year ago

Hi Grant,

So I assume this specification won't work if FE, IV or CLUS are not needed?

fmla <- sprintf("%s ~ %s | %s | %s | %s", lhs, rhs, fe, iv, clus)

I was looking for flexibility in specifying parts of the formula. In Patrick's example, if I don't specify one of FE, IV, or CLUS, the default 0 is used, which means that no FE, IV, or CLUS is employed for that slot.

I had to create something like this, as I couldn't find default (not applicable) values like in felm:

reg_feols_1 <- function(lhs, rhs, fe = "0", iv_incl = FALSE, iv = "0", clus_incl = FALSE, clus = "0", data) {
  # Build the formula, appending the IV slot only when requested
  if (iv_incl) {
    fmla <- sprintf("%s ~ %s | %s | %s", lhs, rhs, fe, iv)
  } else {
    fmla <- sprintf("%s ~ %s | %s", lhs, rhs, fe)
  }
  # Pass the cluster variable only when requested; otherwise use the feols defaults
  if (clus_incl) {
    fit <- fixest::feols(as.formula(fmla), cluster = clus, data = data)
  } else {
    fit <- fixest::feols(as.formula(fmla), data = data)
  }
  fit
}

Is this the best way to approach? I'm focusing just on the placeholder issue, not on the issue of running different specifications with the purrr/apply families. Thank you for the code you wrote, by the way! It is an interesting and useful example of building estimations with fixest.

grantmcdermott commented 1 year ago

Is this the best way to approach?

Nothing wrong with your general approach. But the idiomatic way to do this in fixest is by creating formula macros with xpd. (See also setFixest_fml.)
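
As a rough illustration of the macro approach, the sketch below uses xpd to expand a ..ctrl placeholder, first inline and then through setFixest_fml. It assumes the fixest package is installed; the data set base (iris with renamed columns) and the macro name ..ctrl are choices made for this example only.

```r
library(fixest)

# Example data (an assumption for this sketch): iris with short column names
base <- setNames(iris, c("y", "x1", "x2", "x3", "species"))

# Inline macro: ..ctrl is replaced by the supplied variables, joined with "+"
fml_inline <- xpd(y ~ x1 + ..ctrl, ..ctrl = c("x2", "x3"))

# Global macro: set once, then reuse in xpd() or directly in feols()
setFixest_fml(..ctrl = ~ x2 + x3)
fml_global <- xpd(y ~ x1 + ..ctrl)
mod <- feols(y ~ x1 + ..ctrl | species, base)

print(fml_inline)
print(fml_global)
```

Because the macro holds the variable list, changing the set of controls means editing one place rather than rebuilding the formula string by hand.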

Also, specifying your vcov args (e.g. clustered SEs) at estimation time is less important for fixest models, since the SEs can be adjusted on-the-fly post estimation.

mod = feols(y ~ x, dat)
summary(mod, vcov = "iid")
summary(mod, vcov = "hc1")
summary(mod, vcov = ~cl)
# etc

astraetech commented 1 year ago

Thank you for the reply! xpd is a very powerful tool indeed.

mods = feols(
    lwage ~ csw(exp, wks) | # csw => cumulative stepwise regressions
        married,            # married FEs 
    data = Wages,           # dataset
    vcov = "hc1",           # HC1 SEs (default would be clustered by married) 
    fsplit = ~bluecol       # fsplit => run on full sample and then split by variable
)

I have a question about your example and the expand.grid example in xpd. I want to test all combinations of FEs, independent variables, and cluster variables, including the combinations where there are no FEs or cluster vars. That would lead expand.grid to assign empty strings to the FE or cluster-variable slots, as in:

# We first create a matrix with all possible combinations of variables
my_args = lapply(names(base)[-(1:2)], function(x) c("", x))
(all_combs = as.matrix(do.call("expand.grid", my_args)))
#>      Var1 Var2 Var3     
#> [1,] ""   ""   ""       
#> [2,] "x2" ""   ""       
#> [3,] ""   "x3" ""       
#> [4,] "x2" "x3" ""       
#> [5,] ""   ""   "species"
#> [6,] "x2" ""   "species"
#> [7,] ""   "x3" "species"
#> [8,] "x2" "x3" "species"

Those empty spaces would not be interpreted correctly by feols, right? A formula with "" after a | would not be treated as "no FE" but as an error? Apologies if I'm mistaken.

grantmcdermott commented 1 year ago

Hi @astraetech,

Those empty spaces would not be interpreted correctly by feols, right? A formula with "" after a | would not be treated as "no FE" but as an error? Apologies if I'm mistaken.

Yes, I think that's right. xpd can handle FE variables after the | slot, but only if the input is a formula (e.g., xpd(y ~ x1 + ..v, ..v = ~ 1 | x2)). In the expand.grid example above, the input is a character vector. BUT...

Remember that this example was just demonstrating how you could mimic the inbuilt (cumulative) stepwise functionality. So you can achieve the same result in a simple one-liner.

mods = feols(y ~ x1 + csw0(x2, x3, species), base)

# equivalent, but now passing species through the FE slot
mods = feols(y ~ x1 + csw0(x2, x3) | sw0(species), base)

If I'm not mistaken, this gives you the full set of combinations you are looking for.
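
One way to check which specifications a stepwise call covers is to enumerate them by hand. The base-R sketch below assumes that csw0 adds terms cumulatively (empty, then x2, then x2 + x3, and so on) and compares that with every subset produced by expand.grid. The object names are made up for this illustration.

```r
vars <- c("x2", "x3", "species")

# Cumulative (csw0-style) right-hand sides: empty first, then one more term per step
cumulative <- c("", vapply(seq_along(vars), function(i) {
  paste(vars[seq_len(i)], collapse = " + ")
}, character(1)))

# Every subset of vars, built the same way as the expand.grid example
grid <- expand.grid(lapply(vars, function(v) c("", v)), stringsAsFactors = FALSE)
all_subsets <- unname(apply(grid, 1, function(r) paste(r[nzchar(r)], collapse = " + ")))

# Subsets that the cumulative sequence does not reach
missing_from_cumulative <- setdiff(all_subsets, cumulative)

print(cumulative)
print(missing_from_cumulative)
```

If the cumulative sequence turns out to skip combinations that matter, the package's stepwise help page also lists an mvsw() helper that appears intended for all-subsets estimation. Treat that as a pointer to verify rather than a tested recipe.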

PS. I've already touched on this, but specifying your cluster variable at run time is not necessary for fixest. (TBC there are some advantages of doing so, but mostly related to keeping your model object as lean as possible.) You could just as easily supply a bespoke adjustment across models with one of the package's post-estimation functions. For example:

etable(mods, vcov = list('each', 'iid', ~species))  # report both IID and clustered (by species) SEs for each model