tidyverse / multidplyr

A dplyr backend that partitions a data frame over multiple processes
https://multidplyr.tidyverse.org

Implementation of `slice_sample()` for bootstrapping? #149

Open · raka-everton opened this issue 1 year ago

raka-everton commented 1 year ago

I was wondering whether `slice_sample()` from dplyr could be implemented in multidplyr, please?

I need to bootstrap a very large dataset (~20 million observations) per group, and it would be great to be able to do this with multidplyr! Otherwise R just keeps crashing when I try to do it in one big batch.

hadley commented 11 months ago

Thanks for the suggestion! Will definitely consider it when I'm next working on multidplyr.