The stove package provides functions for machine learning (ML) modeling. It builds on packages from the tidymodels ecosystem but configures them so that they are easy for ML beginners to use. Although stove belongs to statgarten, whose packages are incorporated into a Shiny app, it can also be used on its own from the R console.
# install.packages("devtools")
devtools::install_github("statgarten/stove")
# remotes::install_github("statgarten/datatoys")
library(stove)
library(datatoys)
library(dplyr)
set.seed(1234)
cleaned_data <- datatoys::bloodTest
cleaned_data <- cleaned_data %>%
  mutate_at(vars(SEX, ANE, IHD, STK), factor) %>%
  mutate(TG = ifelse(TG < 150, 0, 1)) %>% # binarize TG: 0 if below 150, otherwise 1
  mutate_at(vars(TG), factor) %>%
  group_by(TG) %>%
  sample_n(500) # TG(0):TG(1) = 500:500
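The preparation step above binarizes TG at 150 and then draws 500 rows per class so that the two classes are balanced. The sketch below reproduces the same idea in base R on a small synthetic stand-in, so it runs without datatoys or dplyr. The `toy` data frame and its random TG values are made up for illustration and are not taken from `datatoys::bloodTest`.

```r
# Synthetic stand-in for the real data: a single TG column with random values
set.seed(1234)
toy <- data.frame(TG = round(runif(2000, min = 40, max = 400)))

# Binarize TG at 150 (0 if below 150, otherwise 1) and store it as a factor
toy$TG <- factor(ifelse(toy$TG < 150, 0, 1))

# Balanced down-sampling: draw 500 row indices from each TG class,
# the base-R counterpart of group_by(TG) %>% sample_n(500)
idx <- unlist(lapply(split(seq_len(nrow(toy)), toy$TG),
                     function(i) i[sample.int(length(i), size = 500)]))
balanced <- toy[idx, , drop = FALSE]

table(balanced$TG)  # 500 rows per class
```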
target_var <- "TG"
train_set_ratio <- 0.7
seed <- 1234
formula <- paste0(target_var, " ~ .")
# Split data
split_tmp <- stove::trainTestSplit(
  data = cleaned_data,
  target = target_var,
  prop = train_set_ratio,
  seed = seed
)
data_train <- split_tmp[[1]] # train data
data_test <- split_tmp[[2]] # test data
data_split <- split_tmp[[3]] # whole data with split information
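`trainTestSplit()` returns the training data, the test data, and the whole data with split information. To illustrate what `prop = 0.7` means, here is a hypothetical base-R helper, `stratified_split()`, that sends about 70% of the rows within each class of the target to the training set. It is not stove's implementation, and whether stove stratifies by the target is an assumption made only for this sketch.

```r
# Hypothetical helper, not part of stove: stratified train/test split.
# Within each class of `target`, about `prop` of the rows go to training.
stratified_split <- function(data, target, prop = 0.7, seed = 1234) {
  set.seed(seed)
  rows_by_class <- split(seq_len(nrow(data)), data[[target]])
  train_idx <- unlist(lapply(rows_by_class, function(i) {
    n_train <- round(length(i) * prop)         # training rows for this class
    i[sample.int(length(i), size = n_train)]   # draw them at random
  }), use.names = FALSE)
  list(train = data[train_idx, , drop = FALSE],
       test  = data[-train_idx, , drop = FALSE])
}

# Usage on a synthetic, balanced 1,000-row data frame
toy_df <- data.frame(x = rnorm(1000), TG = factor(rep(c(0, 1), each = 500)))
parts <- stratified_split(toy_df, target = "TG", prop = 0.7, seed = 1234)
nrow(parts$train)       # 700
nrow(parts$test)        # 300
table(parts$train$TG)   # 350 rows per class
```

Splitting within each class keeps the 500:500 balance of the example data in both the training and the test set.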
# Define preprocessing recipe for cross validation
rec <- stove::prepForCV(
  data = data_train,
  formula = formula,
  imputation = TRUE,
  normalization = TRUE,
  seed = seed
)
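With `imputation = TRUE` and `normalization = TRUE`, the recipe fills in missing values and rescales the predictors. A general principle behind such preprocessing is that the statistics are learned from the training data only and then reused on any new data. The sketch below shows that principle with the hypothetical helpers `fit_prep()` and `apply_prep()`, assuming mean imputation and z-score normalization; stove's actual recipe steps may differ.

```r
# Hypothetical helpers, not stove's API.
# fit_prep(): learn mean and standard deviation of each column from training data
fit_prep <- function(train, cols) {
  lapply(setNames(cols, cols), function(col) {
    x <- train[[col]]
    list(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  })
}

# apply_prep(): mean-impute missing values, then z-score with the stored stats
apply_prep <- function(data, prep) {
  for (col in names(prep)) {
    x <- data[[col]]
    x[is.na(x)] <- prep[[col]]$mean
    data[[col]] <- (x - prep[[col]]$mean) / prep[[col]]$sd
  }
  data
}

train <- data.frame(x = c(12, 14, NA, 16))
test  <- data.frame(x = c(NA, 18))
prep  <- fit_prep(train, "x")   # mean 14, sd 2 (computed on training rows only)
apply_prep(train, prep)$x       # -1 0 0 1
apply_prep(test, prep)$x        # 0 2
```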
# User input
mode <- "classification"
algo <- "logisticRegression" # Custom name
engine <- "glmnet" # glmnet (default)
v <- 2 # number of cross-validation folds
metric <- "roc_auc" # roc_auc (default), accuracy
gridNum <- 5
iter <- 10
seed <- 1234
# Modeling using logistic regression algorithm
finalized <- stove::logisticRegression(
  algo = algo,
  engine = engine,
  mode = mode,
  trainingData = data_train,
  splitedData = data_split,
  formula = formula,
  rec = rec,
  v = v,
  gridNum = gridNum,
  iter = iter,
  metric = metric,
  seed = seed
)
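The parameter `v` presumably corresponds to the number of folds in v-fold cross-validation (2 in this example). The base-R sketch below, built around the hypothetical helper `vfold_ids()`, shows how about 700 training rows (roughly 70% of the 1,000-row example data) could be divided into `v` folds, each of which is held out once while the model is fit on the remaining rows. It illustrates v-fold cross-validation in general, not stove's internal code.

```r
# Hypothetical helper, not stove internals: randomly assign n rows to v folds
# of (nearly) equal size.
vfold_ids <- function(n, v, seed = 1234) {
  set.seed(seed)
  sample(rep_len(seq_len(v), n))
}

n_train <- 700   # about 70% of the 1,000-row example data
folds <- vfold_ids(n_train, v = 2)

# Each fold is held out once: fit on the other folds, assess on this one
for (k in sort(unique(folds))) {
  analysis   <- which(folds != k)
  assessment <- which(folds == k)
  cat(sprintf("fold %d: fit on %d rows, assess on %d rows\n",
              k, length(analysis), length(assessment)))
}
# fold 1: fit on 350 rows, assess on 350 rows
# fold 2: fit on 350 rows, assess on 350 rows
```

This also shows how quickly the data shrinks: with `v = 2`, each model fit sees only half of the training rows.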
You can compare the performance of several models and visualize the results. These documents contain example code for the modeling workflow using stove.
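One common way to compare classifiers is ROC AUC, the default `metric` in the example. As a sketch of what that number measures, the hypothetical helper `roc_auc_rank()` computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half (the Mann-Whitney rank formulation). The predicted probabilities for `model_a` and `model_b` are made up for illustration.

```r
# Hypothetical helper: ROC AUC via the Mann-Whitney rank formulation
roc_auc_rank <- function(truth, score, positive = "1") {
  pos   <- truth == positive
  r     <- rank(score)   # average ranks, so ties count one half
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

truth   <- factor(c(0, 0, 1, 1))
model_a <- c(0.10, 0.40, 0.35, 0.80)   # made-up predicted probabilities
model_b <- c(0.20, 0.30, 0.60, 0.90)

c(model_a = roc_auc_rank(truth, model_a),
  model_b = roc_auc_rank(truth, model_b))
# model_a = 0.75, model_b = 1
```

Here `model_b` ranks every positive case above every negative case, so it would be preferred under this metric.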
When training an ML model, the amount of data required depends on the complexity of the task you want to solve and on the complexity of the learning algorithm. stove always trains models with cross-validation; training without it is not supported. We recommend training on data with at least 1,000 rows.
Copyright © 2022 Yeonchan Seong. This project is MIT licensed.