rsangole / tidyrabbit

MIT License

Does this still add value? #11

Open rsangole opened 6 years ago

rsangole commented 6 years ago

Hey @RMHogervorst, it's been a while since we've worked on this.

I was at the 2018 RStudio Conference last week, and I took a 2-day workshop by @topepo in which he showcased the caret package along with his newer package, recipes, which is an excellent paradigm for creating pipelined analysis workflows. If you haven't seen it yet, do check it out. It works within the tidyverse framework.

I showed him my blog post and he was intrigued by it.

Given the tidyverse, recipes, and (in the future) a new package of his called parsnip, I think we need to rethink how we structure this package, what its goals are, and whether it will still add value.

RMHogervorst commented 6 years ago

Good point. How cool that he liked it! (He's finally on Twitter now!) I guess he'll do a better job. Let me think about it this weekend; I guess we need to decide whether we kill this project or make a minimal version.

topepo commented 6 years ago

Recipes do some of that, but not everything. The syntax that I envision will be something like this:

vars_and_preproc <- recipe(response ~ ., data = dat) %>%
  step_knnimpute(all_predictors()) %>%
  step_scale(all_predictors(), -all_nominal()) %>%
  step_pca(all_numeric(), -all_outcomes(), 
           num = varying())

model_spec <- rand_forest(
  trees = 1000,
  min_n = varying(), 
  mtry = varying()
)
# or 
model_spec <- surv_reg(distribution = varying())

filter <- pval_filter(all_predictors(), alpha = 0.01)

model_spec <- pipeline() %>%
  add(vars_and_preproc) %>%
  add(model_spec) %>%
  add(filter)

# `pipeline` detects what is varying (if anything)

There are "pipelines" everywhere now, so maybe I'll change that name.

parsnip is supposed to be the unified model interface (the rand_forest part above), and the methods that collect the operations into a pipeline might live in a separate package.