r-lib / rlang

Low-level API for programming with R
https://rlang.r-lib.org

Optimize `new_data_mask()` and `r_alloc_environment()` #1553

Closed DavisVaughan closed 1 year ago

DavisVaughan commented 1 year ago

This PR was motivated by this dplyr issue: https://github.com/tidyverse/dplyr/issues/6666

If we are going to create a fresh data mask at each group evaluation, then mask creation needs to be very fast.
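To make the cost concrete, here is a minimal sketch of what "a fresh data mask at each group evaluation" could look like from the R side. It is not dplyr's actual implementation: the `groups` list and the per-group loop are hypothetical, and only `new_environment()`, `new_data_mask()`, `quo()`, and `eval_tidy()` come from rlang.

```r
library(rlang)

# Hypothetical grouped data: one numeric chunk of `x` per group.
groups <- list(a = c(1, 2, 3), b = c(10, 20))

# The expression to evaluate once per group.
summary_quo <- quo(mean(x))

results <- vapply(groups, function(x_chunk) {
  # A fresh environment holding this group's columns ...
  bottom <- new_environment(list(x = x_chunk))
  # ... wrapped in a fresh data mask, created once per group.
  mask <- new_data_mask(bottom)
  eval_tidy(summary_quo, data = mask)
}, numeric(1))

results
```

With many groups, `new_data_mask()` runs once per group, so its per-call overhead is multiplied by the number of groups.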

This PR helps in two ways, labeled in the benchmarks below as "100->10" and "`R_NewEnv()`":

```r
library(rlang)

env <- new_environment()

# CRAN rlang
bench::mark(new_data_mask(env), iterations = 1000000)
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_data_mask(env)   2.08µs   2.94µs   299002.    3.48KB     18.2

# Just 100->10
bench::mark(new_data_mask(env), iterations = 1000000)
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_data_mask(env)      2µs   2.75µs   337047.     4.3KB     20.2

# Both 100->10, and `R_NewEnv()`
bench::mark(new_data_mask(env), iterations = 1000000)
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new_data_mask(env)    980ns   1.39µs   646986.     4.3KB     14.9
```

I am hopeful that this means C calls to `new_data_mask()` in dplyr will be in the nanosecond range.

We can probably also use this in vctrs if we want to.