Closed: aravindhebbali closed this issue 4 years ago
Running ols_step_all_possible()
returns a data.frame with the following columns:
"mindex" "n" "predictors" "rsquare" "adjr" "predrsq" "cp" "aic" "sbic" "sbc" "msep" "fpe" "apc" "hsp"
This does not match the output described in the manual, which lists:
| Column | Description |
|---|---|
| n | model number |
| predictors | predictors in the model |
| rsquare | R-squared of the model |
| adjr | adjusted R-squared of the model |
| predrsq | predicted R-squared of the model |
| cp | Mallows' Cp |
| aic | Akaike information criterion |
| sbic | Sawa Bayesian information criterion |
| sbc | Schwarz Bayesian criterion |
| gmsep | estimated MSE of prediction, assuming multivariate normality |
| jp | final prediction error |
| pc | Amemiya prediction criterion |
| sp | Hocking's Sp |
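Several of these criteria can be reproduced by hand. As an illustration (a sketch in base R only; the variable names are my own, and olsrr may use a slightly different formula internally), Mallows' Cp for a submodel is RSS_sub / s2 - n + 2p, with the error variance s2 estimated from the full model:

```r
# Sketch: Mallows' Cp for a submodel, computed by hand in base R.
# Cp = RSS_sub / s2_full - n + 2 * p, where s2_full is the error
# variance estimate from the full model and p counts the submodel's
# parameters (including the intercept).
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
sub  <- lm(mpg ~ disp + hp, data = mtcars)

n  <- nrow(mtcars)
p  <- length(coef(sub))
s2 <- sigma(full)^2

cp <- sum(resid(sub)^2) / s2 - n + 2 * p
cp
```

A Cp much larger than p, as here, signals that the submodel leaves out relevant predictors.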
Can this be corrected? Is it currently possible to extract the root mean square error (RMSE) for each model?
Thanks! Jens
Yes, there is a mismatch between the column names and what is mentioned in the documentation. We will fix this. RMSE is not returned by ols_step_all_possible()
in the current version, but we can add it to the output.
Cool, thanks! Just to give me an idea: Can you estimate how long it might take to add the RMSE output? It would be very helpful for me...
I have pushed the changes. Install the develop branch and you are good to go :smiley:
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
model <- lm(mpg ~ disp + hp, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#> [1] 3.251454 3.862962 3.126601
Created on 2020-07-27 by the reprex package (v0.3.0)
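With rmse in the result, picking the lowest-RMSE model is a one-liner on the result data frame. A sketch using a mock stand-in for k$result so it runs without olsrr (the rmse values are the ones from the reprex above; the predictor labels I have paired them with are hypothetical):

```r
# Mock stand-in for k$result from ols_step_all_possible(); the rmse
# values come from the reprex above, the predictor labels are
# hypothetical placeholders.
result <- data.frame(
  predictors = c("disp", "hp", "disp hp"),
  rmse       = c(3.251454, 3.862962, 3.126601)
)

# Row with the smallest RMSE
best <- result[which.min(result$rmse), ]
best$rmse
#> [1] 3.126601
```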
Perfect, the solution seems to work and it's a big help for me. Thank you for implementing it so quickly! The only thing I noticed is that now
names(lm_all)
#> [1] "result"
whereas before, the same command allowed accessing the output names directly. I preferred the previous output structure, but I guess there is a reason for the change.
Sorry, have to come back to this.
After testing the rmse output of the ols_step_all_possible
function against three ways of calculating it manually (see reprex below), it seems that what is actually computed is the residual standard deviation, not the root mean squared error.
Can you fix this again? Thanks a lot!
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
# reproducing example from
# https://olsrr.rsquaredacademy.com/articles/variable_selection.html
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)
# Apply three approaches to calculate rmse as found on:
# https://stackoverflow.com/questions/43123462/how-to-obtain-rmse-out-of-lm-result
sqrt(sum((model$residuals)^2)/length(model$residuals))
#> [1] 2.408548
sqrt(mean(model$residuals^2))
#> [1] 2.408548
RSS <- c(crossprod(model$residuals))
MSE <- RSS / length(model$residuals)
RMSE <- sqrt(MSE)
RMSE
#> [1] 2.408548
# extract rmse from olsrr output
k$result[k$result$predictors == "disp hp wt qsec",]$rmse
#> [1] 2.622095
# calculate the residual standard deviation
sigma(model)
#> [1] 2.622095
Created on 2020-07-28 by the reprex package (v0.3.0)
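The discrepancy in the reprex comes down to the denominator: the residual standard error reported by sigma() divides the residual sum of squares by the residual degrees of freedom (n - p), while RMSE divides by n. A base-R sketch of the two quantities side by side:

```r
# Residual standard error vs. RMSE: both are sqrt(RSS / denominator),
# but sigma() uses the residual degrees of freedom (n - p) while
# RMSE uses the sample size n.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
rss <- sum(resid(model)^2)

rmse <- sqrt(rss / length(resid(model)))   # divide by n
rse  <- sqrt(rss / df.residual(model))     # divide by n - p

rmse
#> [1] 2.408548
rse
#> [1] 2.622095
all.equal(rse, sigma(model))
#> [1] TRUE
```

The two values converge as n grows relative to p, which is why the mismatch is easy to miss on larger data sets.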
Let me look into this and get back to you.
library(olsrr)
#>
#> Attaching package: 'olsrr'
#> The following object is masked from 'package:datasets':
#>
#> rivers
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
k <- ols_step_all_possible(model)
k$result$rmse
#> [1] 2.949163 3.148207 3.740297 5.387066 2.468854 2.471485 2.776478 2.976436
#> [9] 3.130180 3.574623 2.411297 2.468493 2.471479 2.941023 2.408548
Created on 2020-07-28 by the reprex package (v0.3.0)
We are doing a complete review and redesign of olsrr API to make it as user friendly as possible. We will take into account your feedback regarding the output names in this case 😃
Use model$model to extract the underlying data instead of eval(model$call$data).
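A minimal sketch of that suggestion: model$model is the model frame stored with the fit, so it contains exactly the variables used in the model and works even when the original data object is no longer in scope.

```r
# The model frame travels with the fitted object, so no need to
# re-evaluate the data argument from the call.
model <- lm(mpg ~ disp + hp, data = mtcars)

dat <- model$model
head(dat)
names(dat)
#> [1] "mpg"  "disp" "hp"
```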