ropensci / ozunconf19

OzUnconf19
http://ozunconf19.ropensci.org/

Executable models as packages #27

Open mdneuzerling opened 4 years ago

mdneuzerling commented 4 years ago

There’s a concept in R of an “analysis as a package”, in which everything you need for your data analysis is contained within a custom package. When you build the vignettes of this package, the data analysis is performed and results saved as a pretty HTML or PDF file, generated with R Markdown.

Can we extend this concept to a model as a package? In this case, the vignette trains a model and stores it in the package before it’s installed. To score new data, the package is loaded and the appropriate functions are called. We get all the benefits of packages (roxygen, testthat), and model training and model execution live in the same package while staying cleanly separated.
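To make the train-then-score split concrete, here is a rough sketch in base R. It is not the prototype mentioned in this thread: an `lm` on `mtcars` stands in for the real model, and the names `score()`, `model.rds`, and the package name `mymodel` are made up for illustration. A temp file replaces the package's `inst/` directory so the snippet runs on its own.

```r
# Training step. In a package this would run in a vignette or build script.
# An lm on mtcars is a stand-in; any object that works with predict() would do.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Persist the fitted model. Inside a package the target might be
# inst/extdata/model.rds (hypothetical path); a temp file keeps this runnable.
model_path <- file.path(tempdir(), "model.rds")
saveRDS(model, model_path)

# Scoring function the package would export. In an installed package the
# default path could come from
# system.file("extdata", "model.rds", package = "mymodel")  # hypothetical name
score <- function(new_data, path = model_path) {
  fitted_model <- readRDS(path)
  # Fail early if the columns the model needs are missing.
  stopifnot(all(c("wt", "hp") %in% names(new_data)))
  unname(predict(fitted_model, newdata = new_data))
}

# Scoring never touches the training code, only the stored model.
scores <- score(data.frame(wt = c(2.5, 3.2), hp = c(110, 150)))
```

Because `score()` only reads the serialized object, it can be covered by ordinary testthat tests without re-running training.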

I whipped up a prototype over the last few days, using a simple sentiment analysis random forest model.

Is there value in doing this properly, maybe with some CI/CD for automated testing or using plumber to expose the scoring functions as an API? Can we create a “model as a package” template (but a good one)?
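For the plumber idea, a minimal sketch of what a scoring endpoint could look like. The route `/score`, the helper `score_one()`, and the `lm` stand-in model are all assumptions for illustration; a real package would load its stored model instead of fitting one here. The `#*` lines are plumber annotations, so without plumber the file is still ordinary R.

```r
# plumber.R (hypothetical file name)
# Stand-in model; a model package would load its stored model object instead.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Plain scoring helper, kept separate so it can be unit tested without a server.
score_one <- function(wt, hp) {
  # Request parameters can arrive as strings, so coerce explicitly.
  new_data <- data.frame(wt = as.numeric(wt), hp = as.numeric(hp))
  list(prediction = unname(predict(model, newdata = new_data)))
}

#* Score a single observation
#* @param wt Weight
#* @param hp Horsepower
#* @post /score
function(wt, hp) {
  score_one(wt, hp)
}

# To serve the API (requires the plumber package):
# plumber::plumb("plumber.R")$run(port = 8000)

result <- score_one("2.5", "110")
```

Keeping the endpoint as a thin wrapper means CI can test `score_one()` directly and only needs plumber for an end-to-end check.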

mdneuzerling commented 4 years ago

I also feel like this is crying out for packrat/renv (what are people using these days?) for reproducible working environments.

stephstammel commented 4 years ago

OH THIS WOULD BE SO AMAZING