haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io
Other
5.99k stars 1.12k forks source link

Potential Bug in Adjusted R Squared #710

Closed stanfordstrickland closed 2 years ago

stanfordstrickland commented 2 years ago

Describe the bug Adjusted R Squared formula seems incorrect.

Expected behavior he adjusted r squared is defined as adj_r_2 = 1 - (1-r2)*(n-1)/(n-k-1) where k are the number of predictors (excluding the intercept).

Actual behavior he adjusted r squared is defined as adj_r_2 = 1 - (1-r2)*(n-1)/(n-k) where k are the number of predictors (excluding the intercept).

Code snippet In the LinearModel class

this.adjustedRSquared = 1.0D - (1.0D - this.RSquared) * (double)(n - 1) / (double)(n - this.p);

where

_this.p = X.ncols();

Additional context Perhaps its also worth noting that if an intercept is not included the formula for adjusted R squared is

adj_r_2 = 1−n*(1−r2)/(n−k)

where k are the number of predictors.

haifengl commented 2 years ago

In your reference formula, k is the number of predictors excluding bias term. In our implementation, it includes the bias term. So they are the same indeed.

stanfordstrickland commented 2 years ago

Perhaps I'm missing something. In LinearModel

this.adjustedRSquared = 1.0D - (1.0D - this.RSquared) * (double)(n - 1) / (double)(n - this.p); this.p = X.ncols();

I was under the impression that the X matrix is not padded with 1s to include the intercept. Rather the intercept is passed as the parameter b. Here is the constructor public LinearModel(Formula formula, StructType schema, Matrix X, double[] y, double[] w, double b). Therefore p in this instance will not include the intercept as it will only be the count of the X columns.

Might be worth pointing out that this is in the context of using RidgeRegression.