Closed kylebutts closed 7 months ago
Thanks a lot for the effort Kyle!!!
But unfortunately `subset` is a no-go because internally it makes a deep copy, while regular data.frame indexing creates a shallow copy.
data(trade, package = "fixest")
microbenchmark::microbenchmark(
  subset = subset(trade, select = c("Euros", "dist_km")),
  regular_df = trade[, c("Euros", "dist_km")]
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> subset 1093.7 1640.25 1871.248 1691.15 1809.40 16558.0 100
#> regular_df 11.4 12.60 16.551 16.80 19.05 49.2 100
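The timing gap above can also be checked directly by comparing memory addresses of the columns. The following is a minimal sketch, not code from the package: the toy data frame and the `addr` helper are made up for illustration, and it assumes the data.table package is installed so that `data.table::address()` is available.

```r
# Toy data frame standing in for `trade` (hypothetical values).
df <- data.frame(a = runif(1e5), b = runif(1e5), c = runif(1e5))

# Column-only indexing with `[`.
by_index <- df[, c("a", "b")]

# subset() with only `select` still indexes rows internally.
by_subset <- subset(df, select = c("a", "b"))

# Hypothetical helper: memory address of an object, as a string.
addr <- function(x) data.table::address(x)

# TRUE means the column in the result is the same object as in `df`.
shared_by_index  <- addr(by_index[["a"]])  == addr(df[["a"]])
shared_by_subset <- addr(by_subset[["a"]]) == addr(df[["a"]])

print(c(index = shared_by_index, subset = shared_by_subset))
```

If the column vectors are shared, no data was copied, which would explain why the regular indexing is so much faster in the benchmark.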
And conversion to a data.frame is also a shallow copy (only the column pointers are copied, which is fast). Thanks again, and sorry for the delay.
Did a complete look thru of the code and there's only a few places that cause problems when `df` is a `data.table`. The primary culprit is `df[, cols]` when `cols` is a character/integer vector of columns to select.

Base R has `subset(df, select = cols)` which works with `data.frame` and `data.table`. Some people don't like `subset` b/c it's weird with NSE (e.g. http://adv-r.had.co.nz/Computing-on-the-language.html#subset), but that doesn't apply with the `select` argument.

For testing, I added `as_df_type <- as.data.frame` in the tests file. You can toggle it to `as_df_type <- data.table::as.data.table` to test if everything works with `data.table`, and likewise `as_df_type <- tibble::as_tibble` for `tibble`. All the tests pass.
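As a rough illustration of the toggle described above, a test could look something like the sketch below. Only `as_df_type` and the `subset(df, select = cols)` call come from this PR; the toy data (`base_df`) and the checks are made up for the example, and the two commented-out alternatives assume data.table and tibble are installed.

```r
# Pick the data frame flavour the tests run against.
as_df_type <- as.data.frame
# as_df_type <- data.table::as.data.table  # assumes data.table is installed
# as_df_type <- tibble::as_tibble          # assumes tibble is installed

# Hypothetical toy data, not taken from the package tests.
base_df <- data.frame(
  y = c(1, 2, 3),
  x = c(4, 5, 6),
  z = c("a", "b", "c")
)
df <- as_df_type(base_df)

# Select columns through a character vector held in a variable.
cols <- c("y", "x")
sel <- subset(df, select = cols)

# The selection should keep exactly the requested columns and all rows.
stopifnot(identical(names(sel), cols))
stopifnot(nrow(sel) == nrow(base_df))
```

Switching the first assignment to one of the commented lines reruns the same checks against the other class.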