Closed echelleburns closed 2 years ago
Hello! The best way of getting this information would be to use the tidy
method on the recipe. For example, in a recipe where we are using top_p
to select the top two predictors:
library(recipes)
library(tibble)
library(parsnip)
library(recipeselectors)

data("iris")

rec <- iris %>%
  recipe(Species ~ .) %>%
  step_select_forests(
    all_predictors(),
    outcome = "Species",
    engine = "ranger",
    top_p = 2
  )

prepped <- prep(rec)
tidy(prepped, number = 1)
The tidy(prepped, number = 1) call will summarize the parameters for the first recipe step into a tibble. For step_select_* steps, the tibble will contain the terms that were removed by the step. Unfortunately I haven't got around to writing any real documentation yet, although this approach of summarizing the basics from each recipe step is described in https://recipes.tidymodels.org/reference/tidy.recipe.html.
The feature importance scores are also stored in the prepped recipe step as prepped$steps[[1]]$scores, although this requires digging into the structure of the step. However, I should probably think more about making the tidy method more informative. I'm sure it could be useful to have more information summarized than just a table of the excluded terms.
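If what you want is the list of predictors that were kept rather than removed, something along these lines might help. This is only a minimal sketch: it assumes the tidy tibble has a `terms` column (the usual convention for recipes tidy methods, not confirmed here for recipeselectors) and that the scores live at prepped$steps[[1]]$scores as described above. Baking the prepped recipe with `new_data = NULL` is a standard recipes feature and does not rely on either assumption.

```r
library(recipes)
library(parsnip)
library(recipeselectors)

data("iris")

rec <- recipe(Species ~ ., data = iris) %>%
  step_select_forests(
    all_predictors(),
    outcome = "Species",
    engine = "ranger",
    top_p = 2
  )

prepped <- prep(rec)

# Option 1: bake the retained training data and keep only predictor columns.
# The remaining column names are the predictors that survived the step.
selected <- names(bake(prepped, new_data = NULL, all_predictors()))

# Option 2 (assumption: the tidy tibble has a `terms` column listing the
# removed predictors): subtract them from the original predictors.
removed <- tidy(prepped, number = 1)$terms
selected_from_tidy <- setdiff(setdiff(names(iris), "Species"), removed)

# Importance scores stored on the trained step, as mentioned above.
scores <- prepped$steps[[1]]$scores

print(selected)
```

Option 1 is probably the safer of the two, since it only depends on what the prepped recipe actually outputs and not on the internal layout of the step or its tidy method.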
Amazing, this works perfectly! Thank you so much!
Super excited to be able to try out recipeselectors in some of my work. I'd like to use recipeselectors::step_select_forests() in a rand_forest() machine learning model. I'm under the impression that recipeselectors::step_select_forests() will alter the final model so that only predictors that exceed a particular scoring threshold (or have the n highest scores) will be used in the final model. Is this correct? If so, is there any way for us to see which predictors are actually selected?
I know that recipeselectors::pull_importances() can give us an idea of which features are most important (including factors within a predictor), but this seems to be different from what I'm actually after.