jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0

svyglm - Adjusted R^2 negative because of wrong number of degrees of freedom in summ #112

Closed lindnemi closed 2 years ago

lindnemi commented 3 years ago

For survey designs with clustering, summ returns incorrect (and negative) values of adjusted R^2. This is because the degrees of freedom of the null model are assumed to be N-1, which does not hold in a design with stratification and clustering. For details, see: https://notstatschat.rbind.io/2019/06/26/denominator-degrees-of-freedom-in-svyglm/
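To make the gap concrete, here is a minimal sketch that compares the naive N-1 with the design-based degrees of freedom reported by `degf()` from the survey package. It uses the same `apiclus2` design as the example in this issue; the variable names `naive_df` and `design_df` are only illustrative.

```r
library(survey)

data(api)
# Two-stage cluster design: districts (dnum) are the PSUs, schools (snum) the SSUs
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

naive_df  <- nrow(apiclus2) - 1  # N - 1, ignores the clustering
design_df <- degf(dclus2)        # design-based df (number of PSUs minus number of strata)

# The design-based value is much smaller than N - 1 for this design
print(c(naive = naive_df, design = design_df))
```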

The line of code with the incorrect formula is here: https://github.com/jacob-long/jtools/blob/8829c067b17cecc26082965703d833cfa750a5c9/R/summ.R#L1151. Below is a MWE and a demonstration of how to calculate adjusted R^2 in that case.

library("jtools")
library("survey")

data(api)
dclus2 <- svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)

L <- svyglm(api00~ell+meals+mobility, design=dclus2)
summ(L) # adjusted R squared is negative!

# How I computed adjusted R^2  (hopefully correct, better double check)

S <- summary(L)                                # full model
N <- summary(svyglm(api00~1, design=dclus2))   # intercept-only (null) model
# Both df.residual values are design-based, not N - p
adjRsq <- 1 - (S$dispersion / S$df.residual) / (N$dispersion / N$df.residual)
show(adjRsq[1])

jacob-long commented 2 years ago

The more I think about this, the more I wonder whether I should be offering an adjusted R-squared for survey models at all. When I try to search for more information, it seems that experts are skeptical of the idea. Stata is now saying their calculation of R-squared is itself comparable to an adjusted R-squared (how their calculation relates to mine, I do not know). For now I'm pushing a version like your calculation but the whole thing has me scratching my head.
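For reference, the calculation from the original post can be wrapped in a small function, sketched below. `adj_r2_design` is a hypothetical name, not a function in jtools or survey, and this is not necessarily what jtools ships. It assumes a gaussian `svyglm` fit and takes the null model's degrees of freedom from an intercept-only refit on the same design.

```r
library(survey)

# Hypothetical helper, not part of jtools or survey.
# Returns R^2 and adjusted R^2 using design-based degrees of freedom.
adj_r2_design <- function(fit) {
  design <- fit$survey.design                       # design stored on the svyglm fit
  null_formula <- update(formula(fit), . ~ 1)       # same response, intercept only
  null_fit <- svyglm(null_formula, design = design)

  s_full <- summary(fit)
  s_null <- summary(null_fit)

  # Ratio of residual variance to null-model variance
  ratio <- as.numeric(s_full$dispersion)[1] / as.numeric(s_null$dispersion)[1]

  c(r2     = 1 - ratio,
    adj_r2 = 1 - ratio * s_null$df.residual / s_full$df.residual)
}

# Usage with the example from this issue
data(api)
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)
fit <- svyglm(api00 ~ ell + meals + mobility, design = dclus2)
out <- adj_r2_design(fit)
print(out)
```

Written this way, the adjustment is the factor `df.residual(null) / df.residual(full)`, both taken from the design, so the adjusted value can never exceed the unadjusted one.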