r-lib / rlang

Low-level API for programming with R
https://rlang.r-lib.org

Add ALTREP view support #1725

Open DavisVaughan opened 6 days ago

DavisVaughan commented 6 days ago


Adds support for ALTREP contiguous views, i.e. vec_view(x, start, size), for the major R types.

A few notes on the internal structure:

/*
Structure of a `view`:
- `data1` is:
  - the original vector, before materialization
  - the materialized vector, after materialization
- `data2` is a RAWSXP holding a `r_view_metadata`
*/

struct r_view_metadata {
  // Whether or not the ALTREP view has been materialized.
  bool materialized;

  // The offset into the original data to start at.
  r_ssize start;

  // The size of the view.
  r_ssize size;

  // A read-only pointer into the data, to save indirection costs. We typically
  // set this upon view creation, unless `x` is ALTREP, in which case we delay
  // setting it until the first `DATAPTR_RO()` request, to be friendly to
  // ALTREP types that we wrap. After materialization, it is always set.
  const void* v_data_read;

  // A write-only pointer into the data, to save indirection costs.
  // Always `NULL` before materialization, and set at materialization time.
  // Never set for lists or character vectors, as write pointers are unsafe.
  void* v_data_write;
};
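
For illustration, here is a minimal sketch of how an ALTREP Elt method could consult this metadata for an integer view. The function name view_integer_elt is hypothetical and this is not the exact code in the PR; it assumes v_data_read already points at the first element of the view.

#include <Rinternals.h>
#include <R_ext/Altrep.h>

// Hypothetical sketch, not the exact implementation in this PR.
static int view_integer_elt(SEXP x, R_xlen_t i) {
  struct r_view_metadata* metadata =
    (struct r_view_metadata*) RAW(R_altrep_data2(x));

  if (metadata->v_data_read != NULL) {
    // Fast path: the cached read-only pointer, set either at creation time
    // or at materialization time, already accounts for `start`.
    const int* v_data = (const int*) metadata->v_data_read;
    return v_data[i];
  }

  // Slow path: `x` wraps an ALTREP vector and we haven't requested a dataptr
  // yet, so defer to the wrapped vector's own Elt method.
  return INTEGER_ELT(R_altrep_data1(x), metadata->start + i);
}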

Benchmarking - The most notable result is that extracting a large slice can be up to 2x slower than with a native vector, because ExtractSubset() in base R goes through the *_Elt() ALTREP method for each element. It could be much more efficient if it first used Dataptr_or_null() to check whether it can get a cheap read-only dataptr to the ALTREP data (it can here, since it's a view) and then indexed through that directly. I plan to submit that patch, or talk to Luke about it, and then this difference should go away entirely.
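
As a rough sketch of the kind of base R change I have in mind (a hypothetical helper, not an actual patch; integer vector, 1-based indices, NA handling omitted):

#include <Rinternals.h>

// Hypothetical helper, not an actual base R patch: prefer a cheap read-only
// dataptr when the ALTREP object can provide one, and only fall back to
// per-element Elt calls when it cannot.
static void extract_int_subset(SEXP x, const int* v_idx, R_xlen_t n, int* v_out) {
  // Dispatches to the Dataptr_or_null ALTREP method; a view can answer this
  // without materializing.
  const int* v_x = (const int*) DATAPTR_OR_NULL(x);

  if (v_x != NULL) {
    // Cheap path: index through the read-only pointer directly.
    for (R_xlen_t i = 0; i < n; ++i) {
      v_out[i] = v_x[v_idx[i] - 1];
    }
  } else {
    // Current behavior: one ALTREP Elt call per element.
    for (R_xlen_t i = 0; i < n; ++i) {
      v_out[i] = INTEGER_ELT(x, v_idx[i] - 1);
    }
  }
}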

Also note that the difference really only shows up when a contiguous slice is extracted with ExtractSubset(), where a native vector likely has zero cache misses while we go through an ALTREP method call for each element. When a random slice of the same size is extracted, the timings tend to converge because cache misses dominate instead.

Otherwise I think it's pretty darn good!

# R-devel at the time
getRversion()
#> [1] '4.5.0'

# load rlang
devtools::load_all(path = "~/files/r/packages/rlang/")

# Identical test
# Currently `identical()` forces materialization due to usage of `INTEGER()`
# rather than `INTEGER_RO()`, booooo.
x <- c(1, 2, 3, 4)
y <- vec_view(x, start = 1, size = 4)
identical(x, y)
#> [1] TRUE
view_is_materialized(y)
#> [1] TRUE

# Object size test
# I think lobstr is more accurate here; there is a small fixed amount of overhead
# from the metadata. Nicely, neither call materializes the object.
x <- c(1, 2, 3, 4)
y <- vec_view(x, start = 1, size = 4)
object.size(x)
#> 80 bytes
object.size(y)
#> 80 bytes
lobstr::obj_size(x)
#> 80 B
lobstr::obj_size(y)
#> 776 B
view_is_materialized(y)
#> [1] FALSE

# Small view, 100 elements
base <- sample(1e6)
x <- base[1:100]
y <- vec_view(base, start = 1L, size = 100L)

is_altrep(y)
#> [1] TRUE
view_is_materialized(y)
#> [1] FALSE

# One element
bench::mark(
  x[[25L]],
  y[[25L]],
  iterations = 1000000
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x[[25L]]          0     41ns 23280892.        0B     69.8
#> 2 y[[25L]]          0     41ns 18932329.        0B     56.8

# A slice
bench::mark(
  x[10:60],
  y[10:60],
  iterations = 1000000
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x[10:60]      123ns    246ns  3228295.      512B     25.8
#> 2 y[10:60]      287ns    369ns  2216498.      512B     15.5

# Still not materialized!
view_is_materialized(y)
#> [1] FALSE

# Duplication currently forces materialization; I think this is something
# we can fix in base R. It uses `INTEGER()` where I think it should use
# `INTEGER_RO()`. The returned object is not ALTREP.
z <- duplicate(y)
view_is_materialized(y)
#> [1] TRUE

is_altrep(z)
#> [1] FALSE

# A slice after materialization - still takes about the same time as before
# materialization (in both cases we use a cached read-only pointer to the data)
bench::mark(
  x[10:60],
  y[10:60],
  iterations = 1000000
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x[10:60]      123ns    246ns  3180695.      512B     22.3
#> 2 y[10:60]      287ns    369ns  2222617.      512B     13.3

# Large view, 1,000,000 elements
base <- sample(1e7)
x <- base[seq(from = 1000, length.out = 1000000)]
y <- vec_view(base, start = 1000, size = 1000000)

# Large slice (still roughly 2x slower, same as with small slice)
bench::mark(
  x[1:500000],
  y[1:500000],
  iterations = 1000
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x[1:5e+05] 909.54µs   1.17ms      822.    3.81MB     24.5
#> 2 y[1:5e+05]   2.19ms   2.42ms      413.    3.81MB     10.2

# Large slice, random index
# The difference goes away as cache misses tend to dominate instead
idx <- sample(1e7, 1e6)
bench::mark(
  x[idx],
  y[idx],
  iterations = 1000
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x[idx]       2.19ms    2.4ms      408.    3.81MB    10.5 
#> 2 y[idx]       2.47ms   2.69ms      357.    3.81MB     8.78

# Still not materialized!
view_is_materialized(y)
#> [1] FALSE