Open rsangole opened 6 years ago
Good point. How cool that he liked it! (He's finally on Twitter now!) I guess he'll do a better job. Let me think about it this weekend; I guess we need to decide whether we kill this project or make a minimal version.
Recipes do some of that but not everything. The syntax that I envision will be something like this:
vars_and_preproc <- recipe(response ~ ., data = dat) %>%
  step_knnimpute(all_predictors()) %>%
  step_scale(all_predictors(), -all_nominal()) %>%
  step_pca(all_numeric(), -all_outcomes(),
           num = varying())

model_spec <- rand_forest(
  trees = 1000,
  min_n = varying(),
  mtry = varying()
)
# or
model_spec <- surv_reg(distribution = varying())

filter <- pval_filter(all_predictors(), alpha = 0.01)

model_spec <- pipeline() %>%
  add(vars_and_preproc) %>%
  add(model_spec) %>%
  add(filter)
# `pipeline` detects what is varying (if anything)
There are "pipelines" everywhere now so maybe I'll change that name.
parsnip is supposed to be the unified model interface (the rand_forest part above), and the methods that collect the operations into a pipeline might be in a separate package.
Hey @RMHogervorst, it's been a while since we've worked on this.
I was at the 2018 RStudio conference last week and took a 2-day workshop by @topepo in which he showcased the caret package along with his newer package, recipes, which is an excellent paradigm for creating pipelined flows for analyses. If you haven't seen it yet, do check it out. It works in the tidyverse framework. I showed him my blog post and he was intrigued by it.
Given the tidyverse + recipes (and, in future, his new package called parsnip), I think we need to rethink how we structure this package, what its goals are, and whether it will still add value.