lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/

Make functions work with data.table (#433) #435

Closed. kylebutts closed this 7 months ago

kylebutts commented 1 year ago

I did a complete look through the code, and there are only a few places that cause problems when `df` is a data.table. The primary culprit is `df[, cols]` where `cols` is a character/integer vector of columns to select.

Base R has `subset(df, select = cols)`, which works with both data.frame and data.table. Some people dislike `subset` because of its quirks with non-standard evaluation (NSE) (e.g. http://adv-r.had.co.nz/Computing-on-the-language.html#subset), but that doesn't apply to the `select` argument.
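To make the difference concrete, here is a minimal sketch run at top level. The toy data and every object name other than `df` and `cols` are made up for illustration. The data.table part only runs if the package is installed, and it is wrapped in `tryCatch` because, as far as I know, the exact failure depends on the data.table version (recent versions raise an error for a bare variable in `j`).

```r
# Hypothetical toy data; `df` and `cols` mirror the names used above.
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), c = letters[1:3])
cols <- c("a", "b")

# data.frame: `cols` is treated as an ordinary vector of column names.
sel_df <- df[, cols]
print(names(sel_df))

# subset() with `select` picks the same columns.
sel_sub <- subset(df, select = cols)
print(names(sel_sub))

# Optional part: only runs when data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)

  # data.table evaluates `j` as an expression, so a bare variable holding
  # column names does not select those columns (assumption: depending on
  # the version this either errors or returns something else).
  res <- tryCatch(dt[, cols], error = function(e) conditionMessage(e))
  print(is.data.frame(res))

  # Two forms that do select the columns on a data.table.
  print(names(subset(dt, select = cols)))
  print(names(dt[, cols, with = FALSE]))
}
```

The `with = FALSE` form is data.table-specific, which is why a form that works unchanged on both classes is attractive here.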

For testing, I added `as_df_type <- as.data.frame` in the tests file. You can switch it to `as_df_type <- data.table::as.data.table` to check that everything works with data.table, and likewise to `as_df_type <- tibble::as_tibble` for tibble. All the tests pass.
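Roughly, the toggle works like the sketch below. Only `as_df_type` comes from the PR; the toy data and the `select_cols` helper are hypothetical stand-ins for the real tests and for a function that selects columns internally.

```r
# Pick the class under test by changing which line is active.
as_df_type <- as.data.frame
# as_df_type <- data.table::as.data.table  # requires data.table
# as_df_type <- tibble::as_tibble          # requires tibble

# Hypothetical test data, converted to the class under test.
dat <- as_df_type(data.frame(y = c(1, 2, 3, 4), x = c(0.5, 1.5, 2.5, 3.5)))

# Hypothetical helper: normalise to a plain data.frame first, then index,
# so the same selection code works whatever class was passed in.
select_cols <- function(df, cols) {
  as.data.frame(df)[, cols, drop = FALSE]
}

out <- select_cols(dat, c("y", "x"))
stopifnot(is.data.frame(out), identical(names(out), c("y", "x")))
```

Because every test builds its input through `as_df_type`, one changed line reruns the whole suite against another class.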

lrberge commented 7 months ago

Thanks a lot for the effort Kyle!!!

Unfortunately, `subset` is a no-go because it makes a deep copy internally, whereas regular data.frame indexing creates a shallow copy.

```r
# `trade` is an example data set shipped with fixest
data(trade, package = "fixest")
microbenchmark::microbenchmark(
    subset = subset(trade, select = c("Euros", "dist_km")),
    regular_df = trade[, c("Euros", "dist_km")]
)
#> Unit: microseconds
#>        expr    min      lq     mean  median      uq     max neval
#>      subset 1093.7 1640.25 1871.248 1691.15 1809.40 16558.0   100
#>  regular_df   11.4   12.60   16.551   16.80   19.05    49.2   100
```

Conversion to a data.frame is also a shallow copy (only pointers are copied, which is fast). Thanks again, and sorry for the delay.
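The same comparison can be sketched without fixest or microbenchmark, using only base R. The data, its size, and the repetition count below are arbitrary choices for illustration. My reading of base R's `subset.data.frame` is that it builds a row index even when only `select` is given, so every selected column gets copied; treat that explanation as an assumption rather than a statement about fixest.

```r
set.seed(1)
n <- 1e6  # arbitrary size, large enough for the copy cost to show
df <- data.frame(id = seq_len(n), x = rnorm(n), y = rnorm(n))
cols <- c("x", "y")
reps <- 20

# Column-only indexing: the selected columns are reused, not duplicated.
t_index <- system.time(
  for (i in seq_len(reps)) a <- df[, cols]
)[["elapsed"]]

# subset(): same columns, but the column data is copied on each call.
t_subset <- system.time(
  for (i in seq_len(reps)) b <- subset(df, select = cols)
)[["elapsed"]]

# Both approaches return the same data; only the cost differs.
same_result <- isTRUE(all.equal(a, b))
cat("index:", t_index, "s | subset:", t_subset, "s | same result:",
    same_result, "\n")
```

Timings vary by machine, so the printed numbers are only indicative; the point is the gap between the two, which grows with the number of rows.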