Hi,
I am trying to do feature selection with caret's glmStepAIC method, using the following code:
library(caret)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
step_train <- train(grouping ~ ., data = train_data,
                    method = "glmStepAIC", direction = "forward", family = "binomial",
                    trControl = control)
My question is how feature selection and cross-validation interact here. Is stepwise selection run separately within each fold, with step_train$finalModel containing the features that received the most votes across folds? If so, are the metrics in step_train$resample each fold's performance with a possibly different set of features? This matters because I need to know whether I should run a second cross-validation to evaluate the performance of the selected features (like below).
# Selected predictors, dropping the intercept
features <- names(step_train$finalModel$coefficients)[-1]
step_eval <- train(grouping ~ ., data = train_data[, c(features, "grouping")],
                   method = "glm", family = "binomial",
                   trControl = control)
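To make the first question concrete, here is a rough sketch of what I imagine per-fold selection would look like if done by hand. This is not caret's actual internals (that is exactly what I am unsure about). The toy data frame, its columns x1 to x4, and the choice to run MASS::stepAIC forward from an intercept-only model are all my own assumptions for illustration.

```r
library(MASS)   # stepAIC
library(caret)  # createFolds

# Toy data standing in for train_data (hypothetical, not my real data)
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$grouping <- factor(rbinom(n, 1, plogis(1.5 * toy$x1 - toy$x2)),
                       labels = c("a", "b"))

# Training-row indices for 10 folds
folds <- createFolds(toy$grouping, k = 10, returnTrain = TRUE)

# Largest model the forward search may reach: ~ x1 + x2 + x3 + x4
upper <- reformulate(setdiff(names(toy), "grouping"))

# Run forward selection separately on each fold's training rows.
# A top-level loop is used so stepAIC can find fold_data when it refits.
selected <- vector("list", length(folds))
for (i in seq_along(folds)) {
  fold_data <- toy[folds[[i]], ]
  null_fit <- glm(grouping ~ 1, data = fold_data, family = binomial)
  fit <- stepAIC(null_fit, scope = list(lower = ~ 1, upper = upper),
                 direction = "forward", trace = FALSE)
  selected[[i]] <- names(coef(fit))[-1]  # drop the intercept
}

# How often each feature was picked across the folds
table(unlist(selected))
```

If the counts differ from fold to fold, then per-fold metrics computed this way would each belong to a different feature set, which is the situation I am asking about.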
Thanks!