Hi there, I wanted to first say that I absolutely love this package and I use it all the time!
In running some IV regressions, I noticed that my first stage f-statistic seems way too big to be plausible. After playing around with the functions, I think it is because of how f-stats are generally calculated in feols when errors are clustered. I'll start with an example that is just looking at a first-stage regression:
library(fixest)
library(lfe)
library(dplyr)
set.seed(123)
N <- 1000
data <- tibble(z = sample(c(0, 1), N, replace = TRUE),
x = 3*z + rnorm(N, 0, .5),
group = rep(1:4, N / 4),
e = rnorm(N, 0, 1),
y = 5*x + group + e) %>%
mutate(group = as.factor(group))
###################################
# first stage - no clustering of SE
###################################
felm(x ~ z | group | 0 | group, data) %>% summary(robust = FALSE)
# Call:
# felm(formula = x ~ z | group | 0 | group, data = data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.87775 -0.34345 0.00024 0.33371 1.83426
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# z 2.98577 0.03139 95.12 0.00000256 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.4957 on 995 degrees of freedom
# Multiple R-squared(full model): 0.9012 Adjusted R-squared: 0.9008
# Multiple R-squared(proj model): 0.9009 Adjusted R-squared: 0.9005
# F-statistic(full model): 2268 on 4 and 995 DF, p-value: < 0.00000000000000022
# F-statistic(proj model): 9048 on 1 and 3 DF, p-value: 0.000002561
feols(x ~ z | group, data, sd = "iid") %>% etable(fitstat = ~wf)
# Dependent Var.: x
#
# z 2.986*** (0.0430)
# Fixed-Effects: -----------------
# group Yes
# __________________ _________________
# S.E.: Clustered by: group
# F-test (projected) 9,048.0
Comparing feols and felm, the within F-stats are both 9,048 without any clustering.
Then I run analogous regressions, but cluster by the group.
###################################
# first stage - with clustering
###################################
felm(x ~ z | group | 0 | group, data) %>% summary(robust = TRUE)
# Call:
# felm(formula = x ~ z | group | 0 | group, data = data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.87775 -0.34345 0.00024 0.33371 1.83426
#
# Coefficients:
# Estimate Cluster s.e. t value Pr(>|t|)
# z 2.98577 0.04297 69.49 0.00000657 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.4957 on 995 degrees of freedom
# Multiple R-squared(full model): 0.9012 Adjusted R-squared: 0.9008
# Multiple R-squared(proj model): 0.9009 Adjusted R-squared: 0.9005
# F-statistic(full model, *iid*): 2268 on 4 and 995 DF, p-value: < 0.00000000000000022
# F-statistic(proj model): 4829 on 1 and 3 DF, p-value: 0.000006568
feols(x ~ z | group, data, cluster = ~group) %>% etable(fitstat = ~wf, coefstat = "tstat")
# Dependent Var.: x
#
# z 2.986*** (69.49)
# Fixed-Effects: ----------------
# group Yes
# __________________ ________________
# VCOV: Clustered by: group
# F-test (projected) 9,048.0
# tstat^2 should be equal to F stat in this one regressor case -> 69.49^2 = 4828.86
We see that the f-statistic from feols now stays the same, but the f-statistic from felm is 4829. This matches the square of the t-statistic from both the feols and felm regressions, means felm is calculating the f-statistic correctly.
The same issue occurs when I run IV regressions, although just with the cragg-donaldson calculation. ivwald and kpr seems to be correct. Code below:
###################################
# second stage - no clustering
###################################
feols(y ~ 1 | group | x ~ z, data, se = "hetero") %>% etable(fitstat = ~ivf + ivwald + kpr)
felm(y ~ 1 | group | (x ~ z), data)$stage1 %>% summary()
###################################
# second stage - with clustering
###################################
feols(y ~ 1 | group | x ~ z, data, cluster = ~group) %>% etable(fitstat = ~ivf + ivwald + kpr)
felm(y ~ 1 | group | (x ~ z) | group, data)$stage1 %>% summary()
###################################
# kpr and wald returns the "right" f-stat
###################################
fitstat(feols(y ~ 1 | group | x ~ z, data) , "cd", cluster = ~group)
fitstat(feols(y ~ 1 | group | x ~ z, data) , "kpr", cluster = ~group)
fitstat(feols(y ~ 1 | group | x ~ z, data) , "ivwald", cluster = ~group)
Hi there, I wanted to first say that I absolutely love this package and I use it all the time!
In running some IV regressions, I noticed that my first stage f-statistic seems way too big to be plausible. After playing around with the functions, I think it is because of how f-stats are generally calculated in feols when errors are clustered. I'll start with an example that is just looking at a first-stage regression:
Comparing
feols
andfelm
, the within F-stats are both 9,048 without any clustering.Then I run analogous regressions, but cluster by the group.
We see that the f-statistic from
feols
now stays the same, but the f-statistic fromfelm
is 4829. This matches the square of the t-statistic from both thefeols
andfelm
regressions, meansfelm
is calculating the f-statistic correctly.The same issue occurs when I run IV regressions, although just with the cragg-donaldson calculation.
ivwald
andkpr
seems to be correct. Code below:Thanks!