spsanderson / healthyR.ai

healthyR.ai - AI package for the healthyverse
http://www.spsanderson.com/healthyR.ai/

hai_c50_data_prepper() #244

Closed spsanderson closed 2 years ago

spsanderson commented 2 years ago

Function:

#' Prep Data for C5.0 - Recipe
#'
#' @family Preprocessor
#' @family C5.0
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @details This function will automatically prep your data.frame/tibble for
#' use in the C5.0 algorithm. The C5.0 algorithm is a decision tree and rule-based
#' classification algorithm. It expects data to be presented in a certain fashion,
#' so this recipe converts character columns to factors.
#'
#' This function will output a recipe specification.
#'
#' @description Automatically prep a data.frame/tibble for use in the C5.0 algorithm.
#' 
#' @seealso \url{https://www.rulequest.com/see5-unix.html}
#'
#' @param .data The data that you are passing to the function. Can be any type
#' of data that is accepted by the `data` parameter of the `recipes::recipe()`
#' function.
#' @param .recipe_formula The formula that is going to be passed to the recipe.
#' For example, if you are using the `iris` data, then the formula would most
#' likely be something like `Species ~ .`.
#'
#' @examples
#' hai_c50_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)
#' rec_obj <- hai_c50_data_prepper(Titanic, Survived ~ .)
#' get_juiced_data(rec_obj)
#'
#' @return
#' A recipe object
#'
#' @export
#'

hai_c50_data_prepper <- function(.data, .recipe_formula){

  # Recipe ---
  rec_obj <- recipes::recipe(.recipe_formula, data = .data) %>%
    recipes::step_string2factor(tidyselect::vars_select_helpers$where(is.character))

  # Return ----
  return(rec_obj)

}

Examples:

> hai_c50_data_prepper(.data = Titanic, .recipe_formula = Survived ~ .)
Recipe

Inputs:

      role #variables
   outcome          1
 predictor          4

Operations:

Factor variables from tidyselect::vars_select_helpers$where(is.character)
> rec_obj <- hai_c50_data_prepper(Titanic, Survived ~ .)
> get_juiced_data(rec_obj)
# A tibble: 32 x 5
   Class Sex    Age       n Survived
   <fct> <fct>  <fct> <dbl> <fct>   
 1 1st   Male   Child     0 No      
 2 2nd   Male   Child     0 No      
 3 3rd   Male   Child    35 No      
 4 Crew  Male   Child     0 No      
 5 1st   Female Child     0 No      
 6 2nd   Female Child     0 No      
 7 3rd   Female Child    17 No      
 8 Crew  Female Child     0 No      
 9 1st   Male   Adult   118 No      
10 2nd   Male   Adult   154 No      
# ... with 22 more rows
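
For readers who want to inspect the result without `get_juiced_data()`, the same recipe step can be prepped and baked with `recipes` directly. This is a minimal sketch, not the package's implementation: the toy data frame `df` and its column names are hypothetical and exist only to show character columns becoming factors.

```r
library(recipes)

# Hypothetical toy data with character columns (not taken from the issue)
df <- data.frame(
  outcome = c("yes", "no", "yes", "no"),
  group   = c("a", "b", "a", "b"),
  value   = c(1.5, 2.0, 3.2, 4.1),
  stringsAsFactors = FALSE
)

# Same step as in hai_c50_data_prepper(): convert character columns to factors
rec_obj <- recipe(outcome ~ ., data = df) %>%
  step_string2factor(tidyselect::vars_select_helpers$where(is.character))

# Estimate the recipe on the training data, then extract the processed data
prepped <- prep(rec_obj)
baked   <- bake(prepped, new_data = NULL)

# Inspect the resulting column types
str(baked)
```

After baking, `outcome` and `group` should be factors while `value` stays numeric, which matches the `<fct>` columns in the Titanic output above.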