Closed jonkeane closed 3 years ago
I confirmed that this works locally by adding an option and running the benchmarks with and without altrep enabled in the setup, both before and after the commit that enabled altrep here.
# A tibble: 16 × 5
# Groups: lib_path, source [8]
   lib_path                    source        use_altrep_setup process_median real_median
   <chr>                       <chr>         <lgl>            <dbl>          <dbl>
 1 remote-apache/arrow@425b1cb type_dict     FALSE            0.111          0.0290
 2 remote-apache/arrow@425b1cb type_dict     TRUE             0.111          0.0298
 3 remote-apache/arrow@HEAD    type_dict     FALSE            0.104          0.0275
 4 remote-apache/arrow@HEAD    type_dict     TRUE             0.104          0.0271
 5 remote-apache/arrow@425b1cb type_floats   FALSE            0.0179         0.00496
 6 remote-apache/arrow@425b1cb type_floats   TRUE             0.0267         0.0129
 7 remote-apache/arrow@HEAD    type_floats   FALSE            0.0170         0.00465
 8 remote-apache/arrow@HEAD    type_floats   TRUE             0.105          0.105
 9 remote-apache/arrow@425b1cb type_integers FALSE            0.0107         0.00363
10 remote-apache/arrow@425b1cb type_integers TRUE             0.0198         0.0108
11 remote-apache/arrow@HEAD    type_integers FALSE            0.0104         0.00344
12 remote-apache/arrow@HEAD    type_integers TRUE             0.0921         0.0924
13 remote-apache/arrow@425b1cb type_strings  FALSE            0.638          0.640
14 remote-apache/arrow@425b1cb type_strings  TRUE             0.630          0.633
15 remote-apache/arrow@HEAD    type_strings  FALSE            0.603          0.604
16 remote-apache/arrow@HEAD    type_strings  TRUE             0.615          0.616
The critical change is that at @HEAD the benchmark is slower when use_altrep_setup = TRUE and faster when use_altrep_setup = FALSE (and the faster timing is about the same as at @425b1cb, before altrep was enabled here). For example, for floats:
  lib_path                    source      use_altrep_setup process_median real_median
  <chr>                       <chr>       <lgl>            <dbl>          <dbl>
1 remote-apache/arrow@425b1cb type_floats FALSE            0.0179         0.00496
2 remote-apache/arrow@425b1cb type_floats TRUE             0.0267         0.0129
3 remote-apache/arrow@HEAD    type_floats FALSE            0.0170         0.00465
4 remote-apache/arrow@HEAD    type_floats TRUE             0.105          0.105
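To put a number on the slowdown, the ratios can be worked out from the real_median values in the floats rows. This is only arithmetic on the figures already shown; the variable names are made up for illustration.

```r
# real_median values copied from the type_floats rows of the benchmark output
real_median <- c(
  before_no_altrep = 0.00496,  # @425b1cb, use_altrep_setup = FALSE
  before_altrep    = 0.0129,   # @425b1cb, use_altrep_setup = TRUE
  head_no_altrep   = 0.00465,  # @HEAD,    use_altrep_setup = FALSE
  head_altrep      = 0.105     # @HEAD,    use_altrep_setup = TRUE
)

# How many times slower the altrep-backed setup makes the benchmark at each commit
slowdown_before <- real_median[["before_altrep"]] / real_median[["before_no_altrep"]]
slowdown_head   <- real_median[["head_altrep"]] / real_median[["head_no_altrep"]]

round(c(before = slowdown_before, head = slowdown_head), 1)
```

For floats, the altrep-backed setup is over 20 times slower at @HEAD, against under 3 times slower at @425b1cb.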
When we enabled altrep, the df to table benchmarks started taking longer. This is because the setup code here would create data frames whose columns were altrep vectors backed by arrow arrays, without fully converting them to R vectors. So when the benchmark then converted from R->arrow, it also had to do the deferred arrow->R conversion, and we were actually measuring both arrow->R and R->arrow.
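A minimal sketch of the idea, not the actual change in this PR: do the arrow->R conversion in the untimed setup with altrep turned off, so the timed step only covers R->arrow. It assumes the arrow package's `arrow.use_altrep` option controls whether conversion returns altrep-backed vectors; `make_plain_df` is a hypothetical helper and the tiny table is invented for illustration.

```r
library(arrow)

# Hypothetical helper: convert a Table to a data frame of ordinary R vectors
make_plain_df <- function(tab) {
  # Temporarily disable altrep so the conversion copies into plain R vectors
  old <- options(arrow.use_altrep = FALSE)
  on.exit(options(old), add = TRUE)
  as.data.frame(tab)
}

# Setup (untimed): the arrow -> R conversion happens here, in full
tab <- Table$create(x = c(1.5, 2.5, 3.5), y = 1:3)
df <- make_plain_df(tab)

# Timed step: with plain R vectors as input, this covers only R -> arrow
elapsed <- system.time(tab2 <- Table$create(df))[["elapsed"]]
```

Restoring the option with `on.exit()` keeps the setting from leaking into whatever else the benchmark measures.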