topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

features selected using train function and glmStepAIC method #1287

Open yupingz opened 2 years ago

yupingz commented 2 years ago

Hi, I am trying to do feature selection using glmStepAIC with the following code.

library(caret)

# 10-fold cross-validation, repeated 3 times
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
step_train <- train(grouping ~ ., data = train_data,
                    method = "glmStepAIC", direction = "forward", family = "binomial",
                    trControl = control)

My question is how feature selection and cross-validation interact here. Is stepwise selection run separately within each fold, with the features in step_train$finalModel being those selected most often across folds? If so, are the metrics in step_train$resample computed per fold, each with a potentially different set of features? This matters because I need to know whether I should run a second cross-validation to evaluate the performance of the selected features (like below).

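# Names of the selected terms, dropping the intercept
# (assumes all predictors are numeric, so term names match column names)
features <- names(coef(step_train$finalModel))[-1]
step_eval <- train(grouping ~ ., data = train_data[, c(features, "grouping")],
                   method = "glm", family = "binomial",
                   trControl = control)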
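
For illustration, the "selection within each fold" reading of the question can be sketched as follows. This is only a minimal sketch under stated assumptions, not caret's internal code: the data are simulated, the names sim_data, fold_id, selected and selection_counts are hypothetical, and base R's step() stands in for MASS::stepAIC() so the example runs without extra packages.

```r
# Hypothetical illustration: run forward stepwise selection separately on the
# training part of each fold and record which predictors are chosen.
set.seed(1)
n <- 200
sim_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Simulated binary outcome that depends on x1 and x2 only
sim_data$grouping <- factor(rbinom(n, 1, plogis(1.5 * sim_data$x1 - sim_data$x2)),
                            labels = c("a", "b"))

k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))   # random fold assignment
predictors <- setdiff(names(sim_data), "grouping")
upper <- reformulate(predictors)                      # largest model allowed

selected <- vector("list", k)
for (i in seq_len(k)) {
  fit_data <- sim_data[fold_id != i, ]                # training part of fold i
  null_fit <- glm(grouping ~ 1, data = fit_data, family = binomial)
  step_fit <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
                   direction = "forward", trace = 0)
  selected[[i]] <- names(coef(step_fit))[-1]          # drop the intercept
}

# How often each predictor was selected across the k folds
selection_counts <- table(factor(unlist(selected), levels = predictors))
print(selection_counts)
```

If the selected sets differ between folds here, per-fold metrics computed this way would each belong to a different feature set, which is the situation the question asks about.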

Thanks!