I want to upsample a data.frame with replacement, going from, say, 10 rows to 15. This is easy with dplyr::slice_sample() and with data.table; dtplyr, however, never returns more rows than the input. It returns 10 rows instead of 15.
Thus, adding lazy_dt() to my dplyr workflow changes the result.
library(dplyr, warn.conflicts = FALSE)
library(dtplyr, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
small_iris <- head(iris, n = 10)
sample_size <- 15
## Upsampling with replacement using data.table works
upsampled_dt <- as.data.table(small_iris)[sample(.N,
                                                 sample_size,
                                                 replace = TRUE)]
nrow(upsampled_dt)
#> [1] 15
## Upsampling with replacement using dplyr works
upsampled_dplyr <- small_iris %>%
  slice_sample(n = sample_size,
               replace = TRUE)
nrow(upsampled_dplyr)
#> [1] 15
## Upsampling with replacement using dtplyr fails
upsampled_dtplyr <- small_iris %>%
  lazy_dt() %>%
  slice_sample(n = sample_size,
               replace = TRUE) %>%
  as_tibble()
nrow(upsampled_dtplyr)
#> [1] 10
Am I missing anything? I'm using R 4.0.3 on Ubuntu with the development version of dtplyr.