tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org
Other
550 stars 106 forks source link

Refactor of implementation code of bake functions #797

Closed EmilHvitfeldt closed 11 months ago

EmilHvitfeldt commented 2 years ago

I talked to @topepo about this, and I'm going to put it into writing. I'm proposing a code refactor related to how bake()ing code is done. Most bake steps follow exact same patterns and it would be nice to have it cleaned up.

I'll show by example, so let us look at bake.step_bin2factor

https://github.com/tidymodels/recipes/blob/788d409b96587a3377b5846df93774dfce511726/R/bin2factor.R#L106-L117

This could be rewritten to:

bake.step_bin2factor <- function(object, new_data, ...) {

  for (variable in object$columns) {
    new_data[, variable] <- bin2factor_impl(
      x = getElement(new_data, variable),
      levels = object$levels,
      ref_first = object$ref_first
    )
  }

  new_data
}

bin2factor_impl <- function(x, levels, ref_first) {
  levs <- if (ref_first) levels else rev(levels)
  factor(
    ifelse(x == 1, yes = levels[1], no = levels[2]),
    levels = levs
  )
}

At first glance, it is more code. However the code inside bake.step_bin2factor() will be identical to other 1-in-1-out steps. Just change the name of the *_impl() function and pass in different arguments as needed. This change also allows for easier unit testing of the implementation of each step, separate from the {recipes} framework. I contemplated whether the *_impl() function should take a list of arguments or just passing in object. But I see it as a cleaner refactor to pass in individual arguments, as it will make the tests easier to manage.

This pattern will not work for all steps, as some steps add or delete columns. But similar patterns should be created for those scenarios.

List of steps that would need to be converted to follow this refactor (first pass)

EmilHvitfeldt commented 11 months ago

This has effectively been done in https://github.com/tidymodels/recipes/pull/1156

github-actions[bot] commented 10 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.