Closed: aravindhebbali closed this issue 4 years ago
Running ols_step_all_possible()
returns a data.frame with the following columns:
"mindex" "n" "predictors" "rsquare" "adjr" "predrsq" "cp" "aic" "sbic" "sbc" "msep" "fpe" "apc" "hsp"
This does not match the output described in the manual, which lists:
| Column | Description |
|---|---|
| n | model number |
| predictors | predictors in the model |
| rsquare | R-squared of the model |
| adjr | adjusted R-squared of the model |
| predrsq | predicted R-squared of the model |
| cp | Mallows' Cp |
| aic | Akaike information criterion |
| sbic | Sawa Bayesian information criterion |
| sbc | Schwarz Bayesian criterion |
| gmsep | estimated MSE of prediction, assuming multivariate normality |
| jp | final prediction error |
| pc | Amemiya prediction criterion |
| sp | Hocking's Sp |
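Several of these criteria can be reproduced by hand. As an illustration (a sketch in base R only; the variable names are my own, and olsrr may use a slightly different formula internally), Mallows' Cp for a submodel is RSS_sub / s2 - n + 2p, with the error variance s2 estimated from the full model:

```r
# Sketch: Mallows' Cp for a submodel, computed by hand in base R.
# Cp = RSS_sub / s2_full - n + 2 * p, where s2_full is the error
# variance estimate from the full model and p counts the submodel's
# parameters (including the intercept).
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
sub  <- lm(mpg ~ disp + hp, data = mtcars)

n  <- nrow(mtcars)
p  <- length(coef(sub))
s2 <- sigma(full)^2

cp <- sum(resid(sub)^2) / s2 - n + 2 * p
cp
```

A Cp much larger than p, as here, signals that the submodel leaves out relevant predictors.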
Can this be corrected? Is it currently possible to extract the root mean square error (RMSE) for each model?
Thanks! Jens
Yes, there is a mismatch between the column names and what is mentioned in the documentation. We will fix this. RMSE is not returned by ols_step_all_possible()
in the current version, but we can add it to the output.
Cool, thanks! Just to give me an idea: Can you estimate how long it might take to add the RMSE output? It would be very helpful for me...
I have pushed the changes. Install the develop branch and you are good to go :smiley:
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
model <- lm(mpg ~ disp + hp, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#> [1] 3.251454 3.862962 3.126601
Created on 2020-07-27 by the reprex package (v0.3.0)
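With rmse in the result, picking the lowest-RMSE model is a one-liner on the result data frame. A sketch using a mock stand-in for k$result so it runs without olsrr (the rmse values are the ones from the reprex above; the predictor labels I have paired them with are hypothetical):

```r
# Mock stand-in for k$result from ols_step_all_possible(); the rmse
# values come from the reprex above, the predictor labels are
# hypothetical placeholders.
result <- data.frame(
  predictors = c("disp", "hp", "disp hp"),
  rmse       = c(3.251454, 3.862962, 3.126601)
)

# Row with the smallest RMSE
best <- result[which.min(result$rmse), ]
best$rmse
#> [1] 3.126601
```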
Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly! The only thing I noticed is that now
names(lm_all)
#> [1] "result"
whereas before, the same command allowed accessing the output names directly. I preferred the previous output structure, but I guess there is a reason for the change.
Sorry, have to come back to this.
After testing the rmse output of the ols_step_all_possible
function against three ways of calculating it manually (see reprex below), it seems that what is actually computed is the residual standard deviation, not the root mean squared error.
Can you fix this again? Thanks a lot!
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
# reproducing example from
# https://olsrr.rsquaredacademy.com/articles/variable_selection.html
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)
# Apply three approaches to calculate rmse as found on:
# https://stackoverflow.com/questions/43123462/how-to-obtain-rmse-out-of-lm-result
sqrt(sum((model$residuals)^2)/length(model$residuals))
#> [1] 2.408548
sqrt(mean(model$residuals^2))
#> [1] 2.408548
RSS <- c(crossprod(model$residuals))
MSE <- RSS / length(model$residuals)
RMSE <- sqrt(MSE)
RMSE
#> [1] 2.408548
# extract rmse from olsrr output
k$result[k$result$predictors == "disp hp wt qsec",]$rmse
#> [1] 2.622095
# calculate the residual standard deviation
sigma(model)
#> [1] 2.622095
Created on 2020-07-28 by the reprex package (v0.3.0)
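The discrepancy in the reprex comes down to the denominator: the residual standard error reported by sigma() divides the residual sum of squares by the residual degrees of freedom (n - p), while RMSE divides by n. A base-R sketch of the two quantities side by side:

```r
# Residual standard error vs. RMSE: both are sqrt(RSS / denominator),
# but sigma() uses the residual degrees of freedom (n - p) while
# RMSE uses the sample size n.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
rss <- sum(resid(model)^2)

rmse <- sqrt(rss / length(resid(model)))   # divide by n
rse  <- sqrt(rss / df.residual(model))     # divide by n - p

rmse
#> [1] 2.408548
rse
#> [1] 2.622095
all.equal(rse, sigma(model))
#> [1] TRUE
```

The two values converge as n grows relative to p, which is why the mismatch is easy to miss on larger data sets.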
Let me look into this and get back to you.
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#> [1] 2.949163 3.148207 3.740297 5.387066 2.468854 2.471485 2.776478 2.976436
#> [9] 3.130180 3.574623 2.411297 2.468493 2.471479 2.941023 2.408548
Created on 2020-07-28 by the reprex package (v0.3.0)
We are doing a complete review and redesign of olsrr API to make it as user friendly as possible. We will take into account your feedback regarding the output names in this case 😃
Use model$model to extract the underlying data instead of eval(model$call$data).
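A minimal sketch of that suggestion: model$model is the model frame stored with the fit, so it contains exactly the variables used in the model and works even when the original data object is no longer in scope.

```r
# The model frame travels with the fitted object, so no need to
# re-evaluate the data argument from the call.
model <- lm(mpg ~ disp + hp, data = mtcars)

dat <- model$model
head(dat)
names(dat)
#> [1] "mpg"  "disp" "hp"
```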