Hi, thanks for the question: it's always good to make implicit choices explicit!
In all honesty, no, I don't happen to know a reference. I defined the adj-R2 in the way I thought made the most sense, but it doesn't have strong theoretical roots.
However, in the reference you mention, I don't see a strong justification for not adjusting for the number of FEs.
With the adjustment, we're closer to the adjusted-R2 of the projected model than without it. Following up on your example, here's the adj-R2 of the projected model:
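library(data.table)
library(fixest)
# Grunfeld data; assuming the plm version (200 obs, columns firm, inv, value, capital)
data("Grunfeld", package = "plm")
base = as.data.table(Grunfeld)
# Firm-level means, then the demeaned variables
base[, c("inv_m", "value_m", "capital_m") := .(mean(inv), mean(value), mean(capital)), by = firm]
base[, c("inv_dm", "value_dm", "capital_dm") := .(inv - inv_m, value - value_m, capital - capital_m)]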
# Estimation on the demeaned variables
res_dm = feols(inv_dm ~ -1 + value_dm + capital_dm, base)
r2(res_dm, "ar2")
# 0.7655796
And the previous adj-R2 seems natural, no?
Apart from similarity across software (which is already an important point), do you have strong objections to using it? Or any suggestions?
I am not aware of any literature either. Stata does not make it explicit, and neither does lfe. I just thought there was some kind of convention around this, but maybe it is just a coincidence that both implementations simply apply the normal formula for adj. R^2 to within models. The manual for gretl mentions that there is no clear definition of an adj. R^2 for within models, which is why the authors abstain from any attempt to calculate it.
Currently, I do not have a feeling for what is "more correct" when calculating the adj. R^2 for FE. Maybe a criterion such as requiring the ratio R^2 / adj. R^2 to be similar in the OLS and FE cases over a range of parameters could serve as a reference (i.e. making the adj. R^2 for FE impose a similar penalty for additional model parameters as in the OLS case). Or the comparison with the projected model that you suggested. Maybe both approaches coincide or lead to similar suggestions.
An observation about the projected model's adj. R^2: summary.lm gives a different result than feols + r2 for your example. Without investigating, I would assume this is due to summary.lm taking special care of the no-intercept case:
print(summary(lm(inv_dm ~ 0 + value_dm + capital_dm, data = base)), 16)
Multiple R-squared: 0.7667575837481406, Adjusted R-squared: 0.7644015997455966
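The gap between the two figures quoted in this thread (0.7655796 from feols + r2 and 0.7644016 from summary.lm) can be reproduced by hand. This is only a sketch: it assumes that the sole difference is the numerator of the degrees-of-freedom correction, n - 1 when an intercept is presumed versus n when there is none. That is inferred from the numbers above, not checked against either package's source code. The variable names are illustrative.

```r
# Values taken from the outputs quoted in this thread
r2 <- 0.7667575837481406  # Multiple R-squared reported by summary.lm
n  <- 200                 # observations in the Grunfeld data
k  <- 2                   # regressors: value_dm and capital_dm

# Numerator n - 1: the usual formula, which presumes an intercept (assumption)
ar2_intercept <- 1 - (1 - r2) * (n - 1) / (n - k)

# Numerator n: no degree of freedom is spent on an intercept
ar2_no_intercept <- 1 - (1 - r2) * n / (n - k)

print(round(c(ar2_intercept, ar2_no_intercept), 7))
# [1] 0.7655796 0.7644016
```

Under that assumption, the first value matches the feols + r2 figure and the second matches the summary.lm figure.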
This makes me realize that I don't detail this in the help pages; I'll update them to remove the confusion.
By the way, you were right about the cause of the difference with the lm ar2! It's indeed the adjustment for the absence of an intercept.
It's quite a corner case, but I may fix it so that the two are aligned.
Anyway, thanks for raising the topic!
Hi, I finally corrected the small differences in adj. R2 when there is no intercept. I also described in the details section how the adjustment is done. The new release should come soon. Thanks for the comments, I'm closing the issue.
Likely not an issue, rather a question: Do you happen to know a reference for the adjusted R2 of the within model? I have this line in mind:
res[i] = 1 - cpp_ssq(resid(x)) / ssr_fe_only * (n - nb_fe) / (n - nb_fe - df_k)
Stata seems to calculate the adj. R2 in the within case as per the "standard" formula, see this post where someone reworked the Stata formula used: Stata forum
For the one-way fixed effect on Grunfeld data (200 obs) I get with fixest:
With package lfe I get a different adjusted R2:
felm seems to use the usual formula for the adjusted R2, without any adjustment in the numerator for the (missing) intercept or absorbed effects.
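For reference, the fixest line quoted above can be written out as a formula. The symbol mapping is an assumption read off the variable names: SSR is cpp_ssq(resid(x)), SSR_FE is ssr_fe_only (presumably the residual sum of squares of the model with fixed effects only), n_FE is nb_fe and k is df_k.

```latex
\bar{R}^2_{\text{within}}
  = 1 - \frac{SSR}{SSR_{FE}} \cdot \frac{n - n_{FE}}{n - n_{FE} - k}
```

The "usual" formula, in its textbook form, where p is the number of estimated parameters including the intercept, reads:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p}
```

Since SSR / SSR_FE plays the role of 1 - R^2 for the within model, the two expressions differ only in the degrees-of-freedom ratio: the first removes the number of fixed effects from the numerator as well as the denominator, while the second keeps n - 1 in the numerator. Exactly which parameters Stata and lfe count in p for a within model is not stated in this thread, so p is left generic here.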