r-lib / rlang

Low-level API for programming with R
https://rlang.r-lib.org
Other
498 stars 136 forks source link

Introduce `r_wrap_or_clone()` and `r_wrap_or_copy()` #1599

Open DavisVaughan opened 1 year ago

DavisVaughan commented 1 year ago

These are aliases for R_shallow_duplicate_attr() and R_duplicate_attr() respectively

Added in 2018, with R 3.6.0, so we can use them without issues in all R versions we support except for 3.5, which we are dropping soon anyways https://github.com/wch/r-source/commit/bbcf6868752fcd5481e2379f93ccbe02911eea63

I've chosen these function names because they either "wrap" the object into an ALTREP wrapper if possible (i.e. if the type is supported and it is "large enough"), or they call their respective r_clone/copy() equivalents.

There are 2 places I use these helpers, shown below. They are only useful if you are tweaking attributes and not the main object. Otherwise you should just r_clone().


We now use r_wrap_or_clone_shared() in the re-encoding helper. If we re-encode an attribute, but not the main object, then we can "wrap" the main object before re-assigning the attributes to it.

x <- 1:1e6 + 0L
attr(x, "foo") <- `Encoding<-`("fa\xE7ile", "latin1")

vctrs:::is_altrep(x)
#> [1] TRUE

bench::mark(encode = .Call(rlang:::ffi_test_obj_encode_utf8, x))

# Main
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 encode        356µs    606µs     1186.    3.81MB     149.

# This PR
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 encode        1.1µs   1.53µs   499840.        0B        0

out <- .Call(rlang:::ffi_test_obj_encode_utf8, x)
vctrs:::is_altrep(out)

# Main
#> [1] FALSE

# This PR - now an altrep wrapper
#> [1] TRUE

We now also use r_wrap_or_clone() in set_names() to avoid having to call names<-, which is what we were doing before when I thought that was the only way to get access to the ALTREP wrappers. We now only call names<- if x is an "object" that might have an S3 method.

This gives us a little performance boost, but names<- was already very fast to call (typically in the 350ns range) so it doesn't look huge. It is relatively large percentage wise though, I guess.

I don't show any cases where x is an S3 object, that is typically in the 3.5us range because we have to do a length() dispatch call and a names<- dispatch call in case there is an S3 method registered, and those cases are completely separate from this PR. We could probably improve on this a little by actually checking to see if there is a length() or names<- method registered in the base/global env and only dispatching if we see one, but I didn't do that here.

library(rlang)

x <- 1:1e6 + 0L
names <- as.character(x)

# no names before, no names after
bench::mark(set_names(x, NULL), iterations = 1000000)
# before
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.12µs   1.38µs   695965.    3.26KB     16.7

# after
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)    868ns   1.11µs   856415.     7.9KB     15.4

# no names before, names after
bench::mark(set_names(x, names), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.51µs   1.81µs   527374.        0B     18.5

# after
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)    1.2µs   1.52µs   636811.        0B     17.2

# this doesn't make an altrep wrapper, but `names<-` does
attr(x, "names") <- names

# names before, no names after
bench::mark(set_names(x, NULL), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.32µs   1.61µs   598498.        0B     19.8

# after
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, NULL)   1.03µs   1.34µs   720275.        0B     18.0

# names before, names after
bench::mark(set_names(x, names), iterations = 1000000)

# before
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.59µs   1.92µs   501267.        0B     17.5

# after
#> # A tibble: 1 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 set_names(x, names)   1.33µs   1.65µs   584423.        0B     15.8