tidyverse / dtplyr

Data table backend for dplyr
https://dtplyr.tidyverse.org

Upsampling a data.frame with dtplyr fails #403

Open johnF-moore opened 1 year ago

johnF-moore commented 1 year ago

I want to upsample a data.frame with replacement, for example going from 10 rows to 15 rows. This is easy with dplyr::slice_sample() and with data.table; however, dtplyr never returns more rows than the initial input. It returns 10 rows instead of 15.

As a result, adding lazy_dt() to my dplyr workflow changes the output.

library(dplyr, warn.conflicts = FALSE)
library(dtplyr, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

small_iris <- head(iris, n = 10) 
sample_size <- 15

## Upsampling with replacement using data.table works
upsampled_dt <- as.data.table(small_iris)[sample(.N, 
                                                 sample_size, 
                                                 replace = TRUE)]
nrow(upsampled_dt)
#> [1] 15

## Upsampling with replacement using dplyr works
upsampled_dplyr <- small_iris %>% 
  slice_sample(n = sample_size,
               replace = TRUE) 
nrow(upsampled_dplyr)
#> [1] 15

## Upsampling with replacement using dtplyr fails
upsampled_dtplyr <- small_iris %>% 
  lazy_dt() %>% 
  slice_sample(n = sample_size, 
               replace = TRUE) %>% 
  as_tibble()
nrow(upsampled_dtplyr)
#> [1] 10

Am I missing anything? I'm using R 4.0.3 on Ubuntu and the development version of dtplyr.

eutwt commented 1 year ago

Thanks for the report! This is a bug.
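
Until this is fixed, one possible workaround is to do the upsampling step with data.table directly (which the report above shows works) and only then hand the result to dtplyr. The sketch below assumes that approach; `upsample_lazy()` is a hypothetical helper written for illustration, not part of dtplyr or dplyr.

```r
library(dplyr, warn.conflicts = FALSE)
library(dtplyr, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

# Hypothetical helper (not a dtplyr function): sample row indices with
# replacement, subset the data.table, then wrap the result for dtplyr.
upsample_lazy <- function(df, size) {
  dt <- as.data.table(df)
  # Draw `size` row indices from 1..nrow(dt), allowing repeats
  idx <- sample.int(nrow(dt), size = size, replace = TRUE)
  lazy_dt(dt[idx])
}

small_iris <- head(iris, n = 10)
sample_size <- 15

upsampled <- small_iris %>%
  upsample_lazy(size = sample_size) %>%
  as_tibble()

nrow(upsampled)
#> [1] 15
```

Because the helper returns a lazy data table, further dplyr verbs can still be chained after it before collecting with as_tibble().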