spsanderson / healthyR.ai

healthyR.ai - AI package for the healthyverse
http://www.spsanderson.com/healthyR.ai/

hai_cubist_data_prepper() #250

Closed spsanderson closed 2 years ago

spsanderson commented 2 years ago

Function:

#' Prep Data for Cubist - Recipe
#'
#' @family Preprocessor
#' @family cubist
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @details This function will automatically prep your data.frame/tibble for
#' use in the cubist algorithm. The cubist algorithm is for regression only.
#'
#' This function will output a recipe specification.
#'
#' @description Automatically prep a data.frame/tibble for use in the cubist algorithm.
#' 
#' @seealso \url{https://rulequest.com/cubist-info.html}
#'
#' @param .data The data that you are passing to the function. Can be any type
#' of data that is accepted by the `data` parameter of the `recipes::recipe()`
#' function.
#' @param .recipe_formula The formula that is passed to `recipes::recipe()`.
#' For example, if you are using the `diamonds` data then the formula would
#' most likely be something like `price ~ .`.
#'
#' @examples
#' library(ggplot2) # provides the `diamonds` data set
#'
#' hai_cubist_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
#' rec_obj <- hai_cubist_data_prepper(diamonds, price ~ .)
#' get_juiced_data(rec_obj)
#'
#' @return
#' A recipe object
#'
#' @export
#'

hai_cubist_data_prepper <- function(.data, .recipe_formula){

  # Recipe ---
  rec_obj <- recipes::recipe(.recipe_formula, data = .data) %>%
    recipes::step_zv(recipes::all_predictors())

  # Return ----
  return(rec_obj)

}
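
The only preprocessing step in the function above is `recipes::step_zv()`, which drops predictors that hold a single value. Here is a minimal sketch of that behaviour, assuming the `recipes` package is installed. The `constant_col` column and the use of `mtcars` are illustrative assumptions and are not part of the issue; the recipe is built with the same two calls as the function body, written without the pipe.

```r
library(recipes)

# Toy data: mtcars plus a constant column (hypothetical, for illustration only)
df <- mtcars
df$constant_col <- 1

# Same two calls as the function body, written without the pipe
rec_obj <- recipe(mpg ~ ., data = df)
rec_obj <- step_zv(rec_obj, all_predictors())

# Estimate the step on the training data, then return the processed data
prepped <- prep(rec_obj, training = df)
baked <- bake(prepped, new_data = NULL)

# The constant column is present before the recipe is applied and gone after
"constant_col" %in% names(df)
"constant_col" %in% names(baked)
```

Because the recipe is returned unprepped, the caller decides when to call `prep()`, which keeps the object usable inside a resampling or workflow setup.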

Example:

> hai_cubist_data_prepper(.data = diamonds, .recipe_formula = price ~ .)
Recipe

Inputs:

      role #variables
   outcome          1
 predictor          9

Operations:

Zero variance filter on recipes::all_predictors()
> rec_obj <- hai_cubist_data_prepper(diamonds, price ~ .)
> get_juiced_data(rec_obj)
# A tibble: 53,940 x 10
   carat cut       color clarity depth table     x     y     z price
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <dbl> <dbl> <dbl> <int>
 1  0.23 Ideal     E     SI2      61.5    55  3.95  3.98  2.43   326
 2  0.21 Premium   E     SI1      59.8    61  3.89  3.84  2.31   326
 3  0.23 Good      E     VS1      56.9    65  4.05  4.07  2.31   327
 4  0.29 Premium   I     VS2      62.4    58  4.2   4.23  2.63   334
 5  0.31 Good      J     SI2      63.3    58  4.34  4.35  2.75   335
 6  0.24 Very Good J     VVS2     62.8    57  3.94  3.96  2.48   336
 7  0.24 Very Good I     VVS1     62.3    57  3.95  3.98  2.47   336
 8  0.26 Very Good H     SI1      61.9    55  4.07  4.11  2.53   337
 9  0.22 Fair      E     VS2      65.1    61  3.87  3.78  2.49   337
10  0.23 Very Good H     VS1      59.4    61  4     4.05  2.39   338
# ... with 53,930 more rows
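
As a rough sketch of what could come next, the processed data can be handed to a cubist model. This is not part of the issue: it assumes the `Cubist` package is installed, calls `Cubist::cubist()` directly rather than going through any healthyR.ai helper, and uses `mtcars` instead of `diamonds` to keep the run small.

```r
library(recipes)
library(Cubist)

# Build and apply the same zero-variance recipe (mtcars is an assumption here)
rec_obj <- recipe(mpg ~ ., data = mtcars)
rec_obj <- step_zv(rec_obj, all_predictors())
baked <- bake(prep(rec_obj), new_data = NULL)

# Split the processed data into predictors and the numeric outcome
x <- as.data.frame(baked[, setdiff(names(baked), "mpg")])
y <- baked$mpg

# Fit a cubist regression model and predict back on the training predictors
fit <- cubist(x = x, y = y)
preds <- predict(fit, newdata = x)
head(preds)
```

Predicting on the training rows is only a smoke test that the recipe output has a shape the model accepts; it says nothing about out-of-sample accuracy.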