phetsims / curve-fitting

"Curve Fitting" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0

edge case for rSquare #86

Closed veillette closed 5 years ago

veillette commented 8 years ago

When all the points share the same y value, the calculation of r^2 runs into a problem. In this case the best fit line is a horizontal line, but the value of r^2 is undefined: r^2 = 1 - (numerator/denominator), and the denominator vanishes.

FYI, the denominator is essentially the y variance of the points and is equal to zero if all the points fall onto a horizontal line. The numerator for the best fit line is equal to zero as well, so the ratio is 0/0. In the case of adjustable fit, the numerator can range from 0 to a positive number, so the ratio is either 0/0 or a positive number divided by zero.
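To make the two terms concrete, here is a minimal sketch in Python (not the sim's actual code). It assumes the usual definitions: the numerator is the sum of squared residuals about the fit, and the denominator is the sum of squared deviations of y about its mean. The names `r_squared_terms`, `flat_points` and `predict` are hypothetical.

```python
def r_squared_terms(points, predict):
    """Return (numerator, denominator) of r^2 = 1 - numerator/denominator.

    points  -- list of (x, y) pairs
    predict -- function mapping x to the fitted y value
    """
    ys = [y for _, y in points]
    y_mean = sum(ys) / len(ys)
    # Sum of squared residuals about the fitted curve.
    numerator = sum((y - predict(x)) ** 2 for x, y in points)
    # Sum of squared deviations about the mean (proportional to the y variance).
    denominator = sum((y - y_mean) ** 2 for y in ys)
    return numerator, denominator


# All points share y = 2, so the denominator vanishes for any fit.
flat_points = [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]

# Best fit is the horizontal line y = 2: both terms are zero.
best_fit = r_squared_terms(flat_points, lambda x: 2.0)

# An arbitrary adjustable fit, y = 0.5 x: positive numerator, zero denominator.
adjustable_fit = r_squared_terms(flat_points, lambda x: 0.5 * x)
```

Either way the division cannot be carried out, which is why the result has to be special-cased.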

How should we handle this case? Here are three possible options:

For reference, here is how it was handled in linear regression: https://github.com/phetsims/least-squares-regression/issues/63

ariel-phet commented 8 years ago

@veillette I think handling it the same way we did in linear regression would be the most consistent.

veillette commented 8 years ago

Thanks @ariel-phet.

For the record, the Flash version of curve fitting uses the second approach: r^2 is zero for both best fit and adjustable fit, and the readout indicates zero.

In any case, we will need to check when the variance, (yyAverage - yAverage * yAverage), is equal to zero.
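A rough Python sketch of that check, assuming `yyAverage` is the mean of y^2 and `yAverage` is the mean of y. The function name `is_variance_zero` and the tolerance are my own assumptions and are not taken from the sim; the tolerance is there because the subtraction can leave a tiny nonzero remainder in floating point even when every y is identical.

```python
def is_variance_zero(ys, tolerance=1e-12):
    """Return True when the y variance (yyAverage - yAverage * yAverage) is zero.

    The tolerance is a hypothetical choice to absorb floating-point rounding;
    an exact comparison with 0 may fail for values such as 0.1.
    """
    n = len(ys)
    y_average = sum(ys) / n
    yy_average = sum(y * y for y in ys) / n
    variance = yy_average - y_average * y_average
    return abs(variance) <= tolerance


flat = is_variance_zero([2.0, 2.0, 2.0])        # all points on a horizontal line
flat_decimal = is_variance_zero([0.1, 0.1, 0.1])  # same, but not exactly representable
sloped = is_variance_zero([1.0, 2.0, 3.0])      # points with a real spread in y
```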

SaurabhTotey commented 5 years ago

As of right now, it seems like r^2 is 1 for a perfectly horizontal line for both best fit and adjustable fit. For adjustable fit, however, r^2 seems to always be 1 regardless of how well the fit actually models the points: I can get situations where r^2 is 1 and the X^2 value is 4325. I will spend some time trying to get the behavior where r^2 is 0 for both best and adjustable fits.

SaurabhTotey commented 5 years ago

This issue is now seemingly fixed in 008a420. It now sets the r^2 to 0 whenever all the points are perfectly horizontal. r^2 is set to 0 for both adjustable fit and best fit.

SaurabhTotey commented 5 years ago

I previously hadn't realized that the Flash simulation and the least squares regression sim handled this case differently. As such, I made the behaviour of this sim match that of least squares regression. Now, for perfectly horizontal points, there is no readout for r^2. r^2 only has a value (that can still be 0) when the points are not exactly horizontal. This solution most closely resembles the 3rd solution originally posed by @veillette; the only difference is that I use NaN internally rather than null, because that is what is described in https://github.com/phetsims/least-squares-regression/issues/63.
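For anyone reading later, the behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the code from the sim: `r_squared` and `format_readout` are hypothetical names, and the readout format is an assumption.

```python
import math


def r_squared(points, predict):
    """Return r^2 = 1 - numerator/denominator, or NaN when the points are horizontal."""
    ys = [y for _, y in points]
    y_mean = sum(ys) / len(ys)
    numerator = sum((y - predict(x)) ** 2 for x, y in points)
    denominator = sum((y - y_mean) ** 2 for y in ys)
    if denominator == 0:
        # All points share the same y value: r^2 is undefined.
        return math.nan
    return 1 - numerator / denominator


def format_readout(value):
    """Return the text to display; an empty string means no readout (hypothetical format)."""
    if math.isnan(value):
        return ""
    return f"r^2 = {value:.2f}"


horizontal = r_squared([(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)], lambda x: 2.0)
perfect = r_squared([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], lambda x: x)

horizontal_text = format_readout(horizontal)
perfect_text = format_readout(perfect)
```

Since NaN is not equal to itself, the display code has to test for it with `math.isnan` rather than with `==`.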