r-lib / cpp11

cpp11 helps you to interact with R objects using C++ code.
https://cpp11.r-lib.org/
Other
199 stars 46 forks source link

Consider viability of relying on R's global string pool for CHARSXP protection in `r_string` #301

Closed DavisVaughan closed 1 year ago

DavisVaughan commented 1 year ago

Right now, r_string has a private data member called sexp data_, which holds the CHARSXP it wraps https://github.com/r-lib/cpp11/blob/2ddc12f6ab268542c88f4aa24a500682ed260b72/inst/include/cpp11/r_string.hpp#L50

I propose that we consider changing that from sexp to SEXP. The benefit of sexp is that it provides automatic protection of that object, but it doesn't need it! CHARSXPs stored in R's global string pool are automatically protected for the life of the R session, so protecting them again is extraneous, and more importantly, pretty expensive.

There are two obvious and extremely common cases that would have significant performance improvements if we did this:

The change would involve moving from sexp to SEXP here: https://github.com/r-lib/cpp11/blob/2ddc12f6ab268542c88f4aa24a500682ed260b72/inst/include/cpp11/r_string.hpp#L50

And then making the == comparison operators compare data_ rather than data_.data() here: https://github.com/r-lib/cpp11/blob/2ddc12f6ab268542c88f4aa24a500682ed260b72/inst/include/cpp11/r_string.hpp#L35-L37

These changes in combination with #299 would have made cpp11 fast enough that I wouldn't have needed to drop down to the R API here https://github.com/EmilHvitfeldt/cpp11ngram/pull/1/files#diff-aa381cabaf80f28b6f14233769693f3faf2d817395969f33e64e014284a51130R20-R33

Performance examples follow below


For element extraction, let's look at using the raw R API to see how fast it could be:

set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)

cpp11::cpp_function("

cpp11::sexp test_extract_r_api(cpp11::strings x) {
  const R_xlen_t size = x.size();

  for (R_xlen_t i = 0; i < size; ++i) {
    SEXP elt = STRING_ELT(x, i);
  }

  return R_NilValue;
}

")

bench::mark(test_extract_r_api(x))
#> # A tibble: 1 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_extract_r_api(x)   1.51ms   1.93ms      524.        0B        0

Then with cpp11:

set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)

cpp11::cpp_function("

cpp11::sexp test_extract_cpp11(cpp11::strings x) {
  const R_xlen_t size = x.size();

  for (R_xlen_t i = 0; i < size; ++i) {
    cpp11::r_string elt = x[i];
  }

  return R_NilValue;
}

")

bench::mark(test_extract_cpp11(x))
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_extract_cpp11(x)   60.2ms   63.5ms      15.3        0B     23.0

This is slow because x[i] does STRING_ELT() to get the SEXP, but then casts that to r_string, which wraps the SEXP into a sexp, forcing it to be protected.

Now that example again, but with sexp swapped for SEXP, avoiding the extra protection:

bench::mark(test_extract_cpp11(x))
#> # A tibble: 1 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_extract_cpp11(x)   1.43ms   1.55ms      595.        0B        0

Just as fast as the raw R API!


Now let's look at element assignment. We are going to assign the string "foo" 1 million times to an output vector.

With the raw R API:

set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)

cpp11::cpp_function("

cpp11::writable::strings
test_std_string_assign_r_api() {
  cpp11::writable::strings out(1000000);
  const R_xlen_t size = out.size();

  const std::string x = \"foo\";

  for (R_xlen_t i = 0; i < size; ++i) {
    SET_STRING_ELT(out, i, Rf_mkCharLenCE(x.data(), x.size(), CE_UTF8));
  }

  return out;
}

")

bench::mark(test_std_string_assign_r_api())
#> # A tibble: 1 × 6
#>   expression                          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_std_string_assign_r_api()     24ms   26.8ms      35.6    7.63MB     7.63

Now with a naive cpp11 approach:

set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)

cpp11::cpp_function("

cpp11::writable::strings
test_std_string_assign_naive() {
  cpp11::writable::strings out(1000000);
  const R_xlen_t size = out.size();

  const std::string x = \"foo\";

  for (R_xlen_t i = 0; i < size; ++i) {
    out[i] = x;
  }

  return out;
}

")

bench::mark(test_std_string_assign_naive())
#> # A tibble: 1 × 6
#>   expression                          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_std_string_assign_naive()    1.38s    1.38s     0.723    7.63MB        0

Wow, painfully slow. Now, the biggest issue here is actually that unwind_protect() is running twice at each loop iteration:

So let's assume we know we can wrap the loop in unwind_protect() to limit the number of unwind-protects, AND let's assume we have #299 installed so that technique actually works.

set.seed(123)
x <- sample(letters, 1e6, replace = TRUE)

cpp11::cpp_function("

cpp11::writable::strings
test_std_string_assign_cpp11() {
  cpp11::writable::strings out(1000000);
  const R_xlen_t size = out.size();

  const std::string x = \"foo\";

  cpp11::unwind_protect([&] {
    for (R_xlen_t i = 0; i < size; ++i) {
      out[i] = x;
    }
  });

  return out;
}

")

bench::mark(test_std_string_assign_cpp11())
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_std_string_assign_cpp11()     92ms    102ms      9.66    7.63MB     15.5

Better, but still 4x slower than base R.

Again, the problem here is that x goes from std::string to CHARSXP to r_string, and when it gets put in the r_string it gets wrapped in sexp and is "extra" protected.

Now that unwind_protect() example again, but with sexp swapped for SEXP, avoiding the extra protection:

bench::mark(test_std_string_assign_cpp11())
#> # A tibble: 1 × 6
#>   expression                          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 test_std_string_assign_cpp11()   29.1ms   31.1ms      32.0    7.63MB     7.99

Again, essentially as fast as the R API!

DavisVaughan commented 1 year ago

I think Lionel and Kevin convinced me that R doesn't actually protect the global string pool (because that could lead to strings never being released), so this won't work. It does protect symbols, like Rf_install()