haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

[Feature proposal]: Check for and handle negative RSquared in LinearModel #705

Closed: gaimetti closed this issue 2 years ago

gaimetti commented 2 years ago

Is your feature request related to a problem? Please describe.

When performing OLS regression, I cannot handle the specific case of a negative RSquared, because the constructor of the LinearModel class does not check for it. Currently, if the model performs worse than the mean of the target variable, an IllegalArgumentException is thrown later, when the p-value is calculated in Beta.regularizedIncompleteBetaFunction():

if (x < 0.0 || x > 1.0) { throw new IllegalArgumentException("Invalid x: " + x); }
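To illustrate why a negative RSquared trips this check, here is a minimal, self-contained sketch. It does not use Smile's code; it assumes the p-value comes from the standard F-test identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f). Under that assumption x simplifies to 1 - RSquared, so any negative RSquared pushes x above 1. The class and method names are hypothetical, and the toy data (a fit through the origin on data with a large offset) is made up for the demonstration.

```java
class NegativeRSquaredDemo {
    /** R^2 = 1 - RSS/TSS, with TSS measured around the mean of y. */
    static double rSquared(double[] y, double[] fitted) {
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;

        double rss = 0.0;
        double tss = 0.0;
        for (int i = 0; i < y.length; i++) {
            rss += (y[i] - fitted[i]) * (y[i] - fitted[i]);
            tss += (y[i] - mean) * (y[i] - mean);
        }
        return 1.0 - rss / tss;
    }

    /** F = (R^2 / p) / ((1 - R^2) / df); negative whenever R^2 is negative. */
    static double fStatistic(double r2, int p, int df) {
        return (r2 / p) / ((1.0 - r2) / df);
    }

    /** Argument x of the regularized incomplete beta function in the F-test. */
    static double betaArgument(double f, int df1, int df2) {
        return df2 / (df2 + df1 * f);
    }

    public static void main(String[] args) {
        // Hypothetical toy data: y decreases while a no-intercept line must rise.
        double[] x = {1, 2, 3, 4};
        double[] y = {10, 9, 8, 7};

        // Least-squares slope for a line through the origin: b = sum(xy) / sum(xx).
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        double slope = sxy / sxx;

        double[] fitted = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            fitted[i] = slope * x[i];
        }

        int p = 1;             // one coefficient, no intercept
        int df = x.length - p; // residual degrees of freedom

        double r2 = rSquared(y, fitted);
        double f = fStatistic(r2, p, df);
        double arg = betaArgument(f, p, df);

        System.out.println("RSquared = " + r2);
        System.out.println("F        = " + f);
        System.out.println("x        = " + arg + " (outside [0, 1])");
    }
}
```

For this toy data RSquared is roughly -15.13 and x roughly 16.13, which is exactly the kind of value the range check above rejects.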

Describe the solution you'd like

When fitting a linear model, I would like to handle the case of an invalid model by checking for negative RSquared. One solution would be to throw a custom exception such as InvalidModelException if RSquared < 0 in the constructor of LinearModel.

Describe alternatives you've considered

Currently, I'm simply catching the IllegalArgumentException that is thrown when calculating the p-value in Beta.regularizedIncompleteBetaFunction(). However, I cannot be sure whether it is caused by a negative RSquared or by some other error that I would need to handle differently.

Additional context

I'm using Smile 2.6.0 in Java.

haifengl commented 2 years ago

Negative RSquared doesn't mean that the model is invalid. You don't have to wait for p-value calculation to throw exception. After training the model, you can check RSquared() immediately and act accordingly.

gaimetti commented 2 years ago

@haifengl, thanks for the quick reply!

> Negative RSquared doesn't mean that the model is invalid.

Yes, you're right that a negative RSquared doesn't mean an invalid model, just a very poor one :-)

> You don't have to wait for p-value calculation to throw exception. After training the model, you can check RSquared() immediately and act accordingly.

OK, but the p-value is already calculated inside the smile.regression.LinearModel constructor, right after RSquared is computed, so I can't check RSquared in my own code after training the model. Or am I missing something?

For reference, I'm using the following fit method to train my model: model = smile.regression.OLS.fit(formula, dataFrame);

haifengl commented 2 years ago

I can skip p-value calculation and set it to NaN if RSquared is negative.
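A minimal sketch of what such a guard could look like follows. It is not the code that went into Smile; the class, method, and interface names are hypothetical. The incomplete beta function is passed in as a parameter standing in for Beta.regularizedIncompleteBetaFunction(), and the demo uses the closed form I_x(a, 1) = x^a, which only holds when the second shape parameter is 1 (that is, p = 2 here).

```java
class GuardedPValue {
    /** Stand-in for a regularized incomplete beta function I_x(a, b). */
    @FunctionalInterface
    interface RegularizedBeta {
        double apply(double a, double b, double x);
    }

    /**
     * F-test p-value from R^2, or NaN when R^2 is negative (or NaN),
     * so the beta function is never called with x outside [0, 1].
     */
    static double pValue(double rSquared, int p, int df, RegularizedBeta beta) {
        if (Double.isNaN(rSquared) || rSquared < 0.0) {
            return Double.NaN; // skip the calculation instead of throwing
        }
        double f = (rSquared / p) / ((1.0 - rSquared) / df);
        double x = df / (df + p * f);
        return beta.apply(0.5 * df, 0.5 * p, x);
    }

    public static void main(String[] args) {
        // Demo-only beta function: exact for b == 1, with the same range
        // check on x as the snippet quoted in the issue.
        RegularizedBeta beta = (a, b, x) -> {
            if (x < 0.0 || x > 1.0) {
                throw new IllegalArgumentException("Invalid x: " + x);
            }
            if (b != 1.0) {
                throw new IllegalArgumentException("demo supports b == 1 only");
            }
            return Math.pow(x, a);
        };

        double ok = pValue(0.75, 2, 10, beta);   // ordinary case
        double bad = pValue(-0.5, 2, 10, beta);  // guarded case

        System.out.println("p-value (R^2 = 0.75): " + ok);
        System.out.println("p-value (R^2 = -0.5): " + bad);
        System.out.println("guard triggered: " + Double.isNaN(bad));
    }
}
```

With a guard like this, calling code can tell the two situations apart by testing the returned p-value with Double.isNaN() rather than by catching an IllegalArgumentException.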

gaimetti commented 2 years ago

> I can skip p-value calculation and set it to NaN if RSquared is negative.

I can certainly work with that :-)

haifengl commented 2 years ago

It is in the master branch now.