stevenpawley / recipeselectors

Additional recipes for supervised feature selection to be used with the tidymodels recipes package
https://stevenpawley.github.io/recipeselectors/

Any way to pull which predictors are chosen for the final model? #7

Closed by echelleburns 2 years ago

echelleburns commented 2 years ago

Super excited to be able to try out recipeselectors in some of my work. I'd like to use the recipeselectors::step_select_forests() in a rand_forest() machine learning model. I'm under the impression that recipeselectors::step_select_forests() will alter the final model so that only predictors that exceed a particular scoring threshold (or have the n highest scores) will be used in the final model. Is this correct? If so, is there any way for us to see which predictors are actually selected?

I know that recipeselectors::pull_importances() can give us an idea of which features are most important (including factors within a predictor), but this seems to be different than what I'm actually after.

stevenpawley commented 2 years ago

Hello! The best way of getting this information would be to use the tidy method on the recipe. For example, in a recipe where we are using top_p to select the top two predictors:

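library(recipes)
library(tibble)
library(parsnip)
library(recipeselectors)

data("iris")

# Rank all predictors by random forest importance and keep the top two
rec <- iris %>%
  recipe(Species ~ .) %>%
  step_select_forests(
    all_predictors(),
    outcome = "Species",
    engine = "ranger",
    top_p = 2
  )

# Train the recipe, then summarize the first (and only) step
prepped <- prep(rec)
tidy(prepped, number = 1)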

Calling tidy(prepped, number = 1) summarizes the parameters of the first recipe step in a tibble. For step_select_* steps, the tibble contains the terms that were removed by the step. Unfortunately I haven't got around to writing any real documentation yet, although this approach of summarizing the basics from each recipe step is described in https://recipes.tidymodels.org/reference/tidy.recipe.html.
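Because the tidy output lists what was dropped rather than what was kept, one way to recover the retained predictors is to look at the columns that survive baking. The following is a minimal sketch that reuses the recipe from this thread. The names `kept` and `removed` are illustrative only, and `bake(new_data = NULL)` assumes a reasonably recent version of recipes (older versions used `juice()` for the same purpose).

```r
library(recipes)
library(parsnip)
library(recipeselectors)

# Same recipe as in the example above: keep the top two predictors.
rec <- recipe(Species ~ ., data = iris) %>%
  step_select_forests(
    all_predictors(),
    outcome = "Species",
    engine = "ranger",
    top_p = 2
  )
prepped <- prep(rec)

# Columns of the processed training set, minus the outcome,
# are the predictors that survived the selection step.
baked <- bake(prepped, new_data = NULL)
kept <- setdiff(names(baked), "Species")

# Everything else among the original predictors was removed.
removed <- setdiff(names(iris), c(kept, "Species"))

print(kept)
print(removed)
```

Which two predictors end up in `kept` depends on the fitted forest, so the result may vary between runs unless a seed is set.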

The feature importance scores are also stored in the prepped recipe step as prepped$steps[[1]]$scores, although this requires digging into the structure of the step. I should probably think more about making the tidy method more informative. I'm sure it would be useful to have more information summarized than just a table of the excluded terms.
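When a recipe has several steps, hard-coding `steps[[1]]` is fragile. The sketch below looks the step up by class instead. It assumes the step object carries the class "step_select_forests", following the usual recipes naming convention, and it makes no assumption about the layout of `scores` beyond its existence, so it simply inspects the object with `str()`.

```r
library(recipes)
library(parsnip)
library(recipeselectors)

# Same recipe as in the example above.
rec <- recipe(Species ~ ., data = iris) %>%
  step_select_forests(
    all_predictors(),
    outcome = "Species",
    engine = "ranger",
    top_p = 2
  )
prepped <- prep(rec)

# Find the selection step by class rather than by position
# (class name assumed from the recipes step naming convention).
is_selector <- vapply(
  prepped$steps,
  function(s) inherits(s, "step_select_forests"),
  logical(1)
)
selector <- prepped$steps[[which(is_selector)[1]]]

# Importance scores stored on the trained step; inspect the structure
# instead of relying on specific column names.
str(selector$scores)
```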

echelleburns commented 2 years ago

Amazing, this works perfectly! Thank you so much!