There’s a concept in R of an “analysis as a package”, in which everything you need for your data analysis is contained within a custom package. When you build the vignettes of this package, the data analysis is performed and results saved as a pretty HTML or PDF file, generated with R Markdown.
Can we extend this concept to a model as a package? In this case, building the vignette trains a model and stores it inside the package before the package is installed. To score new data, you load the package and call the appropriate functions. We get all the benefits of packages (documentation with roxygen, unit tests with testthat), and training and scoring live in the same place while remaining cleanly separated: training happens once at build time, scoring happens whenever the installed package is used.
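To make the train-then-score split concrete, here is a minimal sketch in base R. It is not the prototype's code: `glm()` stands in for the random forest so that no extra packages are needed, the toy data and the names `train_model()`, `score_sentiment()` and `model_path` are made up for illustration, and a temporary directory plays the role that `inst/extdata/` would play in a real package.

```r
# Sketch only. In a real package the vignette would save the model under
# inst/extdata/ and the scoring function would find it with system.file().
# A temporary directory is used here so the example runs on its own.
model_dir <- tempfile("modelpkg_")  # hypothetical stand-in for inst/extdata
dir.create(model_dir)
model_path <- file.path(model_dir, "model.rds")

# Training step: roughly what the vignette would run at build time.
train_model <- function(path) {
  # Toy data (invented): count of positive words and a 0/1 sentiment label.
  training <- data.frame(
    positive_words = c(0, 1, 0, 2, 3, 1, 4, 0, 2, 5),
    label          = c(0, 0, 0, 1, 0, 1, 1, 0, 1, 1)
  )
  # Logistic regression as a dependency-free substitute for a random forest.
  model <- glm(label ~ positive_words, data = training, family = binomial())
  saveRDS(model, path)
  invisible(path)
}

# Scoring step: roughly what an exported package function would do.
score_sentiment <- function(new_data, path = model_path) {
  model <- readRDS(path)  # load the stored model, never retrain
  predict(model, newdata = new_data, type = "response")
}

train_model(model_path)
scores <- score_sentiment(data.frame(positive_words = c(0, 5)))
```

The point of the design is that `score_sentiment()` only ever reads the stored model, so anyone who installs the package can score data without access to the training data or the training code path.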
I whipped up a prototype over the last few days, using a simple sentiment analysis random forest model.
Is there value in doing this properly, maybe with some CI/CD for automated testing or using plumber to expose the scoring functions as an API? Can we create a “model as a package” template (but a good one)?
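As a rough idea of what the plumber part could look like, the sketch below wraps a scoring function in a handler and registers it as a POST endpoint. Everything named here is an assumption: `score_text()` is a toy placeholder rather than the package's real scoring function, and `score_handler()`, the `/score` path and port 8000 are arbitrary choices. The server only starts in an interactive session with plumber installed.

```r
# Hypothetical stand-in for the package's exported scoring function:
# the share of words found in a tiny, invented positive-word list.
score_text <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  mean(words %in% c("good", "great", "love"))
}

# Handler that plumber would call for each request.
score_handler <- function(text = "") {
  list(text = text, score = score_text(text))
}

# Register the handler and start the API only when plumber is available
# and the session is interactive, so sourcing this file never blocks.
if (interactive() && requireNamespace("plumber", quietly = TRUE)) {
  api <- plumber::pr_post(plumber::pr(), "/score", score_handler)
  plumber::pr_run(api, port = 8000)
}
```

Because the handler is an ordinary function, it can be covered by the same testthat suite as the rest of the package, independently of the HTTP layer.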