rsquaredacademy / olsrr

Tools for developing OLS regression models
https://olsrr.rsquaredacademy.com/

Problems with subset regression #195

Closed: CarlosJara72 closed this issue 2 years ago.

CarlosJara72 commented 2 years ago

I created an "lm" object and used the ols_regress() function to get more details, but the results differ between summary() and ols_regress():

library(olsrr)
regr <- lm(iris$Sepal.Length ~ iris$Sepal.Width, subset = iris$Species == "versicolor")
summary(regr)
ols_regress(regr)
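A minimal base-R sketch of what this mismatch is consistent with, assuming (this thread does not confirm olsrr's internals) that the model is re-evaluated without the `subset` argument. The subset fit stores only the 50 versicolor rows in its model frame, while refitting the same formula without `subset` uses all 150 rows of iris, because the formula refers to `iris$...` directly. `refit_no_subset` is a hypothetical name used only for illustration.

```r
# Fit on the versicolor rows only, as in the report above
regr <- lm(iris$Sepal.Length ~ iris$Sepal.Width,
           subset = iris$Species == "versicolor")

# Hypothetical refit that drops the subset argument: the formula points
# at iris$... directly, so every row of iris is used
refit_no_subset <- lm(formula(regr))

# Rows actually used by each fit
nrow(model.frame(regr))             # 50 (versicolor only)
nrow(model.frame(refit_no_subset))  # 150 (all three species)

# The slopes even differ in sign: positive within versicolor,
# negative across all species
coef(regr)[2]
coef(refit_no_subset)[2]
```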
aravindhebbali commented 2 years ago

Let me look into this and get back to you.

aravindhebbali commented 2 years ago

Hi. The difference in the results is due to the way ols_regress() extracts the data from the specified model. Below is a workaround that subsets the data before fitting the model:

# load package
library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers

# data
data <- iris[iris$Species == "versicolor", ]

# regress
regr <- lm(Sepal.Length ~ Sepal.Width, data)

# compare output
# base R
summary(regr)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width, data = data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.73497 -0.28556 -0.07544  0.43666  0.83805 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   3.5397     0.5629   6.289 9.07e-08 ***
#> Sepal.Width   0.8651     0.2019   4.284 8.77e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.4436 on 48 degrees of freedom
#> Multiple R-squared:  0.2766, Adjusted R-squared:  0.2615 
#> F-statistic: 18.35 on 1 and 48 DF,  p-value: 8.772e-05

# olsrr
ols_regress(regr)
#>                         Model Summary                          
#> --------------------------------------------------------------
#> R                       0.526       RMSE                0.435 
#> R-Squared               0.277       MSE                 0.197 
#> Adj. R-Squared          0.262       Coef. Var           7.473 
#> Pred R-Squared          0.207       AIC                64.564 
#> MAE                     0.361       SBC                70.300 
#> --------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                ANOVA                                
#> -------------------------------------------------------------------
#>                Sum of                                              
#>               Squares        DF    Mean Square      F         Sig. 
#> -------------------------------------------------------------------
#> Regression      3.611         1          3.611    18.352     1e-04 
#> Residual        9.444        48          0.197                     
#> Total          13.055        49                                    
#> -------------------------------------------------------------------
#> 
#>                                 Parameter Estimates                                  
#> ------------------------------------------------------------------------------------
#>       model     Beta    Std. Error    Std. Beta      t       Sig     lower    upper 
#> ------------------------------------------------------------------------------------
#> (Intercept)    3.540         0.563                 6.289    0.000    2.408    4.671 
#> Sepal.Width    0.865         0.202        0.526    4.284    0.000    0.459    1.271 
#> ------------------------------------------------------------------------------------

Created on 2022-04-18 by the reprex package (v0.3.0)
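To check that the workaround fits the same model the original `subset` call intended, the two `lm()` fits can be compared in base R. This sketch only exercises `lm()`; it does not test ols_regress() itself. The data frame is named `versicolor` here instead of `data` to avoid masking the base `data()` function.

```r
# Original call from the report: subset passed to lm()
fit_subset <- lm(iris$Sepal.Length ~ iris$Sepal.Width,
                 subset = iris$Species == "versicolor")

# Workaround: subset the data frame first, then fit
versicolor <- iris[iris$Species == "versicolor", ]
fit_presubset <- lm(Sepal.Length ~ Sepal.Width, versicolor)

# Coefficient names differ ("iris$Sepal.Width" vs "Sepal.Width"),
# so compare the values only
all.equal(unname(coef(fit_subset)), unname(coef(fit_presubset)))  # TRUE

# Matches the Multiple R-squared reported by summary() above (0.2766)
summary(fit_presubset)$r.squared
```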

CarlosJara72 commented 2 years ago

Thank you for answering so fast.

I suggest that when an "lm" object is passed to ols_regress(), the function should use the same data as the "lm" object. That way the two outputs stay consistent: a short version from summary(object) and a complete version from ols_regress().

I understand what you did: you subset the data first and then fit the model, because subsetting inside the lm() call gives different results.

Thank you again!
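A minimal base-R sketch of the suggestion, assuming the model was fitted with plain column names and a `data` argument. An "lm" object already stores the rows it actually used (after `subset` and NA handling) in its model frame, so refitting from `model.frame()` reproduces the original model. `refit_from_frame()` is a hypothetical helper, not part of olsrr, and this is not how olsrr is implemented. As written it would not handle formulas with transformed terms such as `log(x)`, or fits with `weights`, because the model frame stores those under different column names.

```r
# Hypothetical helper: refit an lm using exactly the rows stored in the
# model frame of the original fit, so any `subset` is carried over
refit_from_frame <- function(model) {
  lm(formula(model), data = model.frame(model))
}

# Subset passed through lm(), using column names rather than iris$...
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris,
          subset = Species == "versicolor")
refit <- refit_from_frame(fit)

nrow(model.frame(refit))           # 50
all.equal(coef(fit), coef(refit))  # TRUE
```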

aravindhebbali commented 2 years ago

Thanks. We will definitely consider your suggestion.