lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/
378 stars 61 forks source link

Calculating F-Statistic With Clustered SEs #448

Open yixinsun1216 opened 1 year ago

yixinsun1216 commented 1 year ago

Hi there, I wanted to first say that I absolutely love this package and I use it all the time!

In running some IV regressions, I noticed that my first stage f-statistic seems way too big to be plausible. After playing around with the functions, I think it is because of how f-stats are generally calculated in feols when errors are clustered. I'll start with an example that is just looking at a first-stage regression:

library(fixest)
library(lfe)
library(dplyr)

set.seed(123)

N <- 1000

data <- tibble(z = sample(c(0, 1), N, replace = TRUE),
               x = 3*z + rnorm(N, 0, .5), 
               group = rep(1:4, N / 4),
               e = rnorm(N, 0, 1), 
               y = 5*x + group + e)  %>%
  mutate(group = as.factor(group))

###################################
# first stage - no clustering of SE
###################################
felm(x ~ z | group | 0 | group, data) %>% summary(robust = FALSE)
# Call:
#   felm(formula = x ~ z | group | 0 | group, data = data) 
# 
# Residuals:
#   Min       1Q   Median       3Q      Max 
# -1.87775 -0.34345  0.00024  0.33371  1.83426 
# 
# Coefficients:
#   Estimate Std. Error t value   Pr(>|t|)    
# z  2.98577    0.03139   95.12 0.00000256 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4957 on 995 degrees of freedom
# Multiple R-squared(full model): 0.9012   Adjusted R-squared: 0.9008 
# Multiple R-squared(proj model): 0.9009   Adjusted R-squared: 0.9005 
# F-statistic(full model): 2268 on 4 and 995 DF, p-value: < 0.00000000000000022 
# F-statistic(proj model):  9048 on 1 and 3 DF, p-value: 0.000002561 

feols(x ~ z | group, data, sd = "iid") %>% etable(fitstat = ~wf)
# Dependent Var.:                    x
# 
# z                  2.986*** (0.0430)
# Fixed-Effects:     -----------------
#   group                            Yes
# __________________ _________________
# S.E.: Clustered            by: group
# F-test (projected)           9,048.0

Comparing feols and felm, the within F-stats are both 9,048 without any clustering.

Then I run analogous regressions, but cluster by the group.

###################################
# first stage - with clustering 
###################################
felm(x ~ z | group | 0 | group, data) %>% summary(robust = TRUE)
# Call:
#   felm(formula = x ~ z | group | 0 | group, data = data) 
# 
# Residuals:
#   Min       1Q   Median       3Q      Max 
# -1.87775 -0.34345  0.00024  0.33371  1.83426 
# 
# Coefficients:
#   Estimate Cluster s.e. t value   Pr(>|t|)    
# z  2.98577      0.04297   69.49 0.00000657 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.4957 on 995 degrees of freedom
# Multiple R-squared(full model): 0.9012   Adjusted R-squared: 0.9008 
# Multiple R-squared(proj model): 0.9009   Adjusted R-squared: 0.9005 
# F-statistic(full model, *iid*): 2268 on 4 and 995 DF, p-value: < 0.00000000000000022 
# F-statistic(proj model):  4829 on 1 and 3 DF, p-value: 0.000006568 

feols(x ~ z | group, data, cluster = ~group) %>% etable(fitstat = ~wf, coefstat = "tstat")
# Dependent Var.:                   x
# 
# z                  2.986*** (69.49)
# Fixed-Effects:     ----------------
#   group                           Yes
# __________________ ________________
# VCOV: Clustered           by: group
# F-test (projected)          9,048.0

# tstat^2 should be equal to F stat in this one regressor case -> 69.49^2 = 4828.86

We see that the f-statistic from feols now stays the same, but the f-statistic from felm is 4829. This matches the square of the t-statistic from both the feols and felm regressions, means felm is calculating the f-statistic correctly.

The same issue occurs when I run IV regressions, although just with the cragg-donaldson calculation. ivwald and kpr seems to be correct. Code below:


###################################
# second stage - no clustering
###################################
feols(y ~ 1 | group | x ~ z, data, se = "hetero") %>% etable(fitstat = ~ivf + ivwald + kpr)
felm(y ~ 1 | group | (x ~ z), data)$stage1 %>% summary()

###################################
# second stage - with clustering
###################################
feols(y ~ 1 | group | x ~ z, data, cluster = ~group) %>% etable(fitstat = ~ivf + ivwald + kpr)
felm(y ~ 1 | group | (x ~ z) | group, data)$stage1 %>% summary()

###################################
# kpr and wald returns the "right" f-stat
###################################
fitstat(feols(y ~ 1 | group | x ~ z, data) , "cd", cluster = ~group)
fitstat(feols(y ~ 1 | group | x ~ z, data) , "kpr", cluster = ~group)
fitstat(feols(y ~ 1 | group | x ~ z, data) , "ivwald", cluster = ~group)

Thanks!

WinMonroe commented 1 month ago

Just flagging that I've had similar issue with F-stats in IV regressions using this package (absolutely brilliant package otherwise)