tidymodels / rsample

Classes and functions to create and summarize resampling objects
https://rsample.tidymodels.org
Other
341 stars 66 forks source link

rsample a boot on a green background

R-CMD-check Codecov test
coverage CRAN_Status_Badge Downloads lifecycle

Overview

The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:

The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.

Note that resampled data sets created by rsample are directly accessible in a resampling object but do not contain much overhead in memory. Since the original data is not modified, R does not make an automatic copy.

For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:

library(rsample)
library(mlbench)

data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B

set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B

# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B

# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426

Created on 2022-02-28 by the reprex package (v2.0.1)

The memory usage for 50 bootstrap samples is less than 3-fold more than the original data set.

Installation

To install it, use:

install.packages("rsample")

And the development version from GitHub with:

# install.packages("pak")
pak::pak("rsample")

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.