rsquaredacademy / olsrr

Tools for developing OLS regression models
https://olsrr.rsquaredacademy.com/

ols_step_forward_p(...) problem using many potential predictors #199

Open hirscht opened 2 years ago

hirscht commented 2 years ago

I'm using ols_step_forward_p(...) with a large number of potential predictors (more than the number of predictand values) to select the predictors that satisfy the p-value threshold. This works in most cases when p is set to a relatively low value, but sometimes I get an error:

Error in Anova.lm(m) : residual sum of squares is 0 (within rounding error)

This happens when too many potential predictors satisfy the p-value condition and are selected, which eventually drives the residual sum of squares to zero (R-squared = 1.000). In this case I only get the error message above, and the results of the selection up to that point are lost (they are not stored in an ols_step_forward_p class object). Would it be possible to store the calculated values even when this error occurs? What I need most is the list of selected predictors, in order of selection.
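As a stopgap, the selection order can be kept by running the forward search by hand. The sketch below is a minimal base R illustration, not olsrr's implementation: `forward_p_safe` and its arguments `p_enter` and `tol` are hypothetical names, the entry rule uses the t-test p-value of the newly added coefficient, and it assumes numeric predictors with syntactic column names and no missing values. It stops as soon as the residual sum of squares is numerically zero or no residual degrees of freedom would remain, and it returns the predictors in the order they entered.

```r
# Hypothetical helper (not part of olsrr): forward selection by p-value that
# records the order of entry and stops before the fit becomes exact.
forward_p_safe <- function(data, response, p_enter = 0.1, tol = 1e-10) {
  y <- data[[response]]
  tss <- sum((y - mean(y))^2)                 # total sum of squares
  candidates <- setdiff(names(data), response)
  selected <- character(0)
  stop_reason <- "no remaining candidate met p_enter"

  while (length(candidates) > 0) {
    # Intercept + selected terms + one new term must leave >= 1 residual df.
    if (nrow(data) - (length(selected) + 2) < 1) {
      stop_reason <- "no residual degrees of freedom left"
      break
    }
    # p-value of each candidate when added to the current model
    p_values <- vapply(candidates, function(v) {
      fit <- lm(reformulate(c(selected, v), response), data = data)
      tab <- coef(summary(fit))
      # An aliased (collinear) term has no row in the coefficient table.
      if (v %in% rownames(tab)) tab[v, "Pr(>|t|)"] else NA_real_
    }, numeric(1))
    if (all(is.na(p_values)) || min(p_values, na.rm = TRUE) > p_enter) break

    best <- names(which.min(p_values))
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)

    # Stop once the residual sum of squares is zero relative to the total.
    fit <- lm(reformulate(selected, response), data = data)
    if (deviance(fit) <= tol * tss) {
      stop_reason <- "residual sum of squares is numerically zero"
      break
    }
  }
  list(selected = selected, stop_reason = stop_reason)
}

# Simulated data (an assumption, not the data from this issue):
# 30 observations, 40 candidate predictors named V1..V40.
set.seed(42)
n <- 30
dat <- as.data.frame(matrix(rnorm(n * 40), nrow = n))
dat$y <- 3 * dat$V1 - 2 * dat$V2 + rnorm(n, sd = 0.1)

res <- forward_p_safe(dat, "y", p_enter = 0.01)
res$selected      # predictors in order of entry
res$stop_reason   # why the search ended
```

Because the list of selected predictors is built up inside the loop, it survives whichever stopping condition is hit first.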

FwMod <- ols_step_forward_p(LinMod, pent = 0.1, progress = TRUE, details = TRUE)

aravindhebbali commented 2 years ago

Hi @hirscht, is it possible to share the data for debugging?

hirscht commented 2 years ago

Unfortunately not. What I can do is paste the last part of the output that is printed directly before the error when details=TRUE is set. As you can see, predictor 176 (V176) is the last one selected, in step 67 (at most 69 predictors could be selected, since I have 70 observations). The "Parameter Estimates" section of the output lists all predictors selected so far (I show only the first and the last one here). Then the error occurs, and none of this information is stored in the object where it should be.
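The limit of 69 predictors for 70 observations can be checked on simulated data. The snippet below is only an illustration with random numbers (the data frame `dat` and its columns are assumptions, not the data from this issue): with an intercept and 67 predictors two residual degrees of freedom remain, and with 69 predictors none remain and the residual sum of squares collapses to zero. In the run reported here the residual sum of squares was already zero within rounding at step 67, so the error can appear before that hard limit is reached.

```r
# Simulated example: 70 observations, 200 random candidate predictors V1..V200.
set.seed(1)
n <- 70
X <- as.data.frame(matrix(rnorm(n * 200), nrow = n))
dat <- cbind(y = rnorm(n), X)

# Intercept + 67 predictors: 70 - 68 = 2 residual degrees of freedom.
fit67 <- lm(y ~ ., data = dat[, 1:68])
df.residual(fit67)

# Intercept + 69 predictors: the model interpolates the data exactly.
fit69 <- lm(y ~ ., data = dat[, 1:70])
df.residual(fit69)   # no residual degrees of freedom
deviance(fit69)      # residual sum of squares, zero up to rounding
```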

Forward Selection: Step 67 

+ V176 

                        Model Summary                         
-------------------------------------------------------------
R                       1.000       RMSE               0.000 
R-Squared               1.000       Coef. Var          0.000 
Adj. R-Squared          1.000       MSE                0.000 
Pred R-Squared          1.000       MAE                0.000 
-------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                    ANOVA                                      
------------------------------------------------------------------------------
                    Sum of                                                    
                   Squares        DF    Mean Square         F            Sig. 
------------------------------------------------------------------------------
Regression    27438349.309        67     409527.602    4040865085942.93    0.0000 
Residual             0.000         2          0.000                           
Total         27438349.309        69                                          
------------------------------------------------------------------------------

                                           Parameter Estimates                                            
---------------------------------------------------------------------------------------------------------
      model          Beta    Std. Error    Std. Beta         t          Sig          lower         upper 
---------------------------------------------------------------------------------------------------------
(Intercept)    -19274.197         0.584                  -32995.677    0.000    -19276.710    -19271.683 
       V156        40.156         0.000        0.074     284504.288    0.000        40.155        40.157 
       .... 
       ....
       ....
       V176         0.000         0.000        0.000         11.908    0.007         0.000         0.000 
---------------------------------------------------------------------------------------------------------

Error in Anova.lm(m) : 
  residual sum of squares is 0 (within rounding error)

aravindhebbali commented 2 years ago

Let me look into this and get back to you.
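
In the meantime, since details=TRUE prints each added predictor on a line such as "+ V176" before the error is raised, one possible workaround is to divert the console output to a file and read the selection order back from it. The sketch below assumes that these details are written to standard output and that the "+ name" line format matches the output pasted above; `failing_selection` is a hypothetical stand-in that only imitates the printing and the error, and would be replaced by the real ols_step_forward_p(...) call.

```r
# Hypothetical stand-in for the failing call: prints step lines, then errors.
failing_selection <- function() {
  vars <- c("V156", "V12", "V176")
  for (i in seq_along(vars)) {
    cat("Forward Selection: Step", i, "\n\n")
    cat("+", vars[i], "\n\n")
  }
  stop("residual sum of squares is 0 (within rounding error)")
}

log_file <- tempfile(fileext = ".txt")
sink(log_file)                      # divert console output to a file
result <- tryCatch(
  failing_selection(),              # replace with the real selection call
  error = function(e) e,            # keep the error object instead of aborting
  finally = sink()                  # always restore the console
)

# Lines starting with "+ " name the predictor added at each step.
log_lines <- readLines(log_file)
added <- trimws(sub("^\\+ ", "", grep("^\\+ ", log_lines, value = TRUE)))
added                               # predictors in order of selection
conditionMessage(result)            # the error message, kept for reference
```

The file written by sink() keeps everything printed up to the point of failure, so the full per-step output remains available in `log_lines` as well.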