Q: reference for adjusted within R2

ghost commented 4 years ago

Likely not an issue, rather a question: do you happen to know a reference for the adjusted R2 of the within model? I have this line in mind:

res[i] = 1 - cpp_ssq(resid(x)) / ssr_fe_only * (n - nb_fe) / (n - nb_fe - df_k)

Stata seems to calculate the adj. R2 in the within case with the "standard" formula; see this post, where someone reconstructed the formula Stata uses: Stata forum

For the one-way fixed effect on Grunfeld data (200 obs) I get with fixest:

library(fixest)
data("Grunfeld", package = "plm")
fe <- feols(inv ~ value + capital | firm, data = Grunfeld)
r2(fe)
# war2 = 0.7642763

With package lfe I get a different adjusted R2:

library(lfe)
summary(felm(inv ~ value + capital | firm, data = Grunfeld))
# Adjusted R-squared: 0.7531104211

felm seems to use the usual formula for the adjusted R2, without any adjustment in the numerator for the (missing) intercept or absorbed effects.
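The two values can be reproduced by hand. This is only a sketch with assumed notation: n = 200 observations, G = 10 firm fixed effects, K = 2 slope coefficients, and R^2_w the within R2 (about 0.7667576 for this model, so 1 - R^2_w is about 0.2332424). The first line restates the fixest code quoted above, where cpp_ssq(resid(x)) / ssr_fe_only equals 1 - R^2_w. The second line is inferred from the number felm reports, not taken from the lfe source.

```latex
\begin{aligned}
\text{fixest (war2):}\quad & 1 - (1 - R^2_w)\,\frac{n - G}{n - G - K}
  = 1 - 0.2332424 \times \frac{190}{188} \approx 0.76428 \\
\text{lfe:}\quad & 1 - (1 - R^2_w)\,\frac{n - 1}{n - G - K}
  = 1 - 0.2332424 \times \frac{199}{188} \approx 0.75311
\end{aligned}
```

Under these assumptions the two differ only in the numerator degrees of freedom: n - G versus n - 1.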

lrberge commented 4 years ago

Hi, thanks for the question: it's always good to make implicit choices explicit!

In all honesty, no, I don't know of a reference. I defined the adj-R2 in the way I thought made the most sense, but it doesn't have strong theoretical roots.

However, in the reference you mention, I don't see a strong justification for not adjusting for the number of FEs.

With the adjustment, we're closer to the adjusted-R2 of the projected model than without adjustment. Following on your example, here's the adj-R2 of the projected model:

library(data.table)
base = as.data.table(Grunfeld)
base[, c("inv_m", "value_m", "capital_m") := .(mean(inv), mean(value), mean(capital)), by = firm]
base[, c("inv_dm", "value_dm", "capital_dm") := .(inv - inv_m, value - value_m, capital - capital_m)]
# Estimation on the demeaned variables
res_dm = feols(inv_dm ~ -1 + value_dm + capital_dm, base)
r2(res_dm, "ar2")
# 0.7655796

And that adj-R2, the one of the projected model, seems natural, no?

Apart from similarity across software (which is already an important point), do you have strong objections to using it? Or any suggestions?

ghost commented 4 years ago

I am not aware of any literature either. Stata does not make it explicit, and neither does lfe. I just thought there was some kind of convention around this, but maybe it is just a coincidence that both implementations apply the normal formula for adj. R^2 to within models. The manual for gretl mentions that there is no clear definition of an adj. R^2 for within models, which is why the authors abstain from attempting to calculate it.

Currently, I do not have a feeling for what is "more correct" when calculating adj. R^2 for FE. One possible reference: the ratio R^2 / adj. R^2 should be similar in the OLS and FE cases over a range of parameters (roughly, the adj. R^2 for FE should impose a similar penalty for additional model parameters as in the OLS case). Another is the reference you suggested, the projected model. Maybe both approaches coincide or lead to similar suggestions.

An observation about the projected model's adj. R^2: summary.lm gives a different result than feols + r2 for your example. Without investigating, I would assume this is due to summary.lm taking special care of the no-intercept case:

print(summary(lm(inv_dm ~ 0 + value_dm + capital_dm, data = base)), 16)
# Multiple R-squared:  0.7667575837481406,    Adjusted R-squared:  0.7644015997455966

lrberge commented 4 years ago

This reminds me that I don't detail this in the help pages; I'll update them to remove the confusion.

By the way, you were right about the cause of the difference with the lm ar2! It is indeed the adjustment for the absence of an intercept. It's quite a corner case, but I may fix it so that the two are aligned.
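For the projected (demeaned) model, the gap can be checked by hand. This is a sketch with assumed notation: n = 200, K = 2, and R^2 = 0.7667576 as printed by summary.lm. With no intercept, summary.lm uses n rather than n - 1 in the numerator. The second line is inferred from the value 0.7655796 reported earlier in this thread, not read from the fixest source.

```latex
\begin{aligned}
\text{lm, no intercept:}\quad & 1 - (1 - R^2)\,\frac{n}{n - K}
  = 1 - 0.2332424 \times \frac{200}{198} \approx 0.76440 \\
\text{feols + r2 ("ar2") at the time:}\quad & 1 - (1 - R^2)\,\frac{n - 1}{n - K}
  = 1 - 0.2332424 \times \frac{199}{198} \approx 0.76558
\end{aligned}
```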

Anyway, thanks for raising the topic!

lrberge commented 4 years ago

Hi, I have finally corrected the small differences in adj. R2 when there is no intercept. I also added a description of how the adjustment is done to the details section of the help page. The new release should come soon. Thanks for the comments; I'm closing the issue.

lrberge / fixest

Q: reference for adjusted within R2 #19