p-values differ from Stata results, with clustered SEs. #19

sleeubc commented 5 years ago

Thank you for writing such a fantastic package. I like felm so much, and I use it instead of lm.

My coauthor found that the p-values reported by felm differ from those reported by Stata when standard errors are clustered. This led to a different number of stars in our regression tables. In the example below, the coefficient estimates, SEs, and t-values are all the same, but the p-values for x are very different (0.0016 vs. 0.003).

I looked into this issue, and it seems that the degrees of freedom used in the p-value calculation are different. felm uses n-p, where n is the number of observations and p is the number of regressors including the constant and dummy variables, while Stata uses G-1 for clustered samples, where G is the number of clusters.
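
To make the difference concrete, here is a minimal sketch in base R that recomputes the two-sided p-value for x under both degrees-of-freedom choices. The numbers are copied from the output in this issue; the variable names (`t_x`, `n_par`, `p_felm`, `p_stata`) are just for illustration and are not part of lfe or Stata. The two results should come out close to the 0.0016 and 0.003 reported in the two outputs.

```r
# Values copied from the regression output in this issue
t_x   <- 1.223281 / 0.3854183   # t-value for x: coefficient / clustered SE
n_obs <- 500                    # number of observations (n)
n_par <- 2                      # number of regressors incl. intercept (p)
n_cl  <- 50                     # number of clusters (G)

# Two-sided p-values under the two degrees-of-freedom conventions
p_felm  <- 2 * pt(-abs(t_x), df = n_obs - n_par)  # residual df, n - p
p_stata <- 2 * pt(-abs(t_x), df = n_cl - 1)       # cluster df, G - 1

print(round(c(felm = p_felm, stata = p_stata), 4))
```

Because the t-statistic is identical in both cases, the gap comes entirely from the reference distribution: a t distribution with 49 degrees of freedom has heavier tails than one with 498.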

You may find a discussion on page 23 of a paper by Cameron and Miller helpful:

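n_cl <- 50                        # number of clusters (G)
n_obs <- n_cl * 10                # 10 observations per cluster
cl <- rnorm(n_cl)                 # cluster effects; the 50 distinct values double as cluster IDs
x <- rnorm(n_obs)
eps <- cl + rnorm(n_obs, 0, 10)   # cl is recycled to length n_obs
y <- x + eps
DF <- data.frame(cl, y, x)        # cl is recycled again, giving 50 clusters of 10 rows
model <- lfe::felm(y ~ x | 0 | 0 | cl, data = DF)
summary(model)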

#> Coefficients:
#>             Estimate Cluster s.e. t value Pr(>|t|)   
#> (Intercept)  -0.4771       0.3865  -1.234   0.2177   
#> x             1.2233       0.3854   3.174   0.0016 **
#> ---


. reg y x, vce(cluster cl)

y            Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
x         1.223281    .3854183    3.17    0.003    .4487536    1.997808
_cons    -.4771046    .3865388   -1.23    0.223   -1.253883    .2996742
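
As a cross-check on the G-1 reading, the sketch below rebuilds Stata's 95% confidence interval for x from the coefficient and clustered SE shown above, using a t critical value with 49 degrees of freedom. The variable names are illustrative only, and this reproduces the arithmetic rather than anything Stata-specific.

```r
# Values copied from the Stata output in this issue
coef_x <- 1.223281
se_x   <- 0.3854183
n_cl   <- 50                          # number of clusters (G)

# 95% interval using a t critical value with G - 1 degrees of freedom
t_crit <- qt(0.975, df = n_cl - 1)
ci_x   <- coef_x + c(-1, 1) * t_crit * se_x

print(round(ci_x, 4))
```

The interval agrees with the bounds in the Stata table, which is consistent with Stata using G-1 for both the p-value and the confidence interval.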

karldw commented 5 years ago

Cross-ref with another degrees-of-freedom issue: #1