rsquaredacademy / olsrr

Tools for developing OLS regression models
https://olsrr.rsquaredacademy.com/
Other
102 stars 22 forks source link

Extracting model data #159

Closed aravindhebbali closed 4 years ago

aravindhebbali commented 4 years ago

Use model$model to extract underlying data instead of eval(model$call$data)

jens-daniel-mueller commented 4 years ago

Running ols_step_all_possible() returns a data.frame with following columns:

"mindex" "n" "predictors" "rsquare" "adjr" "predrsq" "cp" "aic" "sbic" "sbc" "msep" "fpe" "apc" "hsp"

This does not match the output described in the manual, which lists:

n model number
predictors predictors in the model
rsquare rsquare of the model
adjr adjusted rsquare of the model
predrsq predicted rsquare of the model
cp mallow's Cp
aic akaike information criteria
sbic sawa bayesian information criteria
sbc schwarz bayes information criteria
gmsep estimated MSE of prediction, assuming multivariate normality
jp final prediction error
pc amemiya prediction criteria
sp hocking's Sp

Can this be corrected? Is it currently possible to extract the root mean square error (RMSE) for each model?

Thanks! Jens

aravindhebbali commented 4 years ago

Yes.. there is a mismatch between the column names and what is mentioned in the documentation. We will fix this. RMSE is not returned by ols_step_all_possible() in the current version but we can add it in the output.

jens-daniel-mueller commented 4 years ago

Cool, thanks! Just to give me an idea: Can you estimate how long it might take to add the RMSE output? It would be very helpful for me...

aravindhebbali commented 4 years ago

I have pushed the changes. Install the develop branch and you are good to go :smiley:

library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
model <- lm(mpg ~ disp + hp, data = mtcars);
k <- ols_step_all_possible(model);
k$result$rmse
#> [1] 3.251454 3.862962 3.126601

Created on 2020-07-27 by the reprex package (v0.3.0)

jens-daniel-mueller commented 4 years ago

Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly! The only thing I noticed is that now

names(lm_all) [1] "result"

whereas before, the same command allowed to access the output names directly. I prefered the previous output structure, but I guess there is a reason for the change.

jens-daniel-mueller commented 4 years ago

Sorry, have to come back to this. After testing the rmse output of the ols_step_all_possible function against three ways to calculate it manually (see reprex below), it seems rather the residual standard deviation is calculated, and not the root mean squared error. Can you fix this again? Thanks a lot!


library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers

# reproducing example from
# https://olsrr.rsquaredacademy.com/articles/variable_selection.html

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)

# Apply three approaches to calculate rmse as found on:
# https://stackoverflow.com/questions/43123462/how-to-obtain-rmse-out-of-lm-result

sqrt(sum((model$residuals)^2)/length(model$residuals))
#> [1] 2.408548
sqrt(mean(model$residuals^2))
#> [1] 2.408548

RSS <- c(crossprod(model$residuals))
MSE <- RSS / length(model$residuals)
RMSE <- sqrt(MSE)
RMSE
#> [1] 2.408548

# extract rmse from olsrr output

k$result[k$result$predictors == "disp hp wt qsec",]$rmse
#> [1] 2.622095

# calculate the residual standard deviation

sigma(model)
#> [1] 2.622095

Created on 2020-07-28 by the reprex package (v0.3.0)

aravindhebbali commented 4 years ago

Let me look into this and get back to you.

aravindhebbali commented 4 years ago
library(olsrr)
#> 
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#> 
#>     rivers
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model);
k$result$rmse
#>  [1] 2.949163 3.148207 3.740297 5.387066 2.468854 2.471485 2.776478 2.976436
#>  [9] 3.130180 3.574623 2.411297 2.468493 2.471479 2.941023 2.408548

Created on 2020-07-28 by the reprex package (v0.3.0)

aravindhebbali commented 4 years ago

Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly! The only thing I noticed is that now

names(lm_all) [1] "result"

whereas before, the same command allowed to access the output names directly. I prefered the previous output structure, but I guess there is a reason for the change.

We are doing a complete review and redesign of olsrr API to make it as user friendly as possible. We will take into account your feedback regarding the output names in this case 😃