This is an alternative approach for optimizing `r_lgl_sum()` and `r_lgl_which()` to what is done in #1487.
The way I optimize here is by avoiding branching, like what I did in `vec_pany()` and `vec_pall()`. I think I like this better than having magic numbers like in #1487.
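To give a feel for the idea, here is a minimal standalone C sketch of a branchless `TRUE` counter in the spirit of `r_lgl_sum()`. The function name and the 0/1-only input (no `NA` handling) are mine for illustration; this is not the actual rlang code:

```c
#include <stdint.h>
#include <stdio.h>

/* Branchless count of TRUE values in an R-style logical vector
 * (ints where FALSE = 0, TRUE = 1). Instead of `if (x[i]) ++sum;`,
 * which the CPU has to predict, we add the comparison result
 * directly, so every iteration does the same work regardless of
 * how the data is skewed. Hypothetical sketch, not rlang's code. */
static int64_t lgl_sum_branchless(const int* x, int64_t n) {
  int64_t sum = 0;
  for (int64_t i = 0; i < n; ++i) {
    sum += (x[i] == 1); /* 1 when TRUE, 0 otherwise; no branch */
  }
  return sum;
}

int main(void) {
  int x[] = {1, 0, 1, 1, 0};
  printf("%lld\n", (long long) lgl_sum_branchless(x, 5)); /* 3 */
  return 0;
}
```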
The main reason I like this approach is that it isn't "jumpy", so the CPU's branch predictor can't get screwed by hard-to-predict data. This mainly shows up in the timings for the 50/50 splits between `TRUE`/`FALSE`, where this is drastically faster most of the time.
The downside of this approach is that because it doesn't rely on branch prediction as much, cases like 1% `TRUE` can be a little slower, but not by much, and I think the tradeoff of losing a little performance there versus the drastic gains elsewhere makes this worth it. (The one exception seems to always be the "all `FALSE`" case, where this new approach is super fast for some reason.)
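The same trick extends to `r_lgl_which()`: write the candidate index unconditionally and advance the output cursor by the comparison result, so the data-dependent part is an add rather than a jump. Again, this is just a rough standalone sketch with made-up names and no `NA` handling, not the actual implementation:

```c
#include <stdint.h>
#include <stdio.h>

/* Branchless "which": store the 1-based index every iteration and
 * only keep it (by advancing j) when the value is TRUE. Assumes the
 * output buffer is pre-sized to hold n ints. Hypothetical sketch. */
static int64_t lgl_which_branchless(const int* x, int64_t n, int* out) {
  int64_t j = 0;
  for (int64_t i = 0; i < n; ++i) {
    out[j] = (int) i + 1; /* unconditional store is cheap */
    j += (x[i] == 1);     /* keep the slot only if x[i] was TRUE */
  }
  return j;               /* number of indices written */
}

int main(void) {
  int x[] = {1, 0, 1, 1, 0};
  int out[5];
  int64_t j = lgl_which_branchless(x, 5, out);
  for (int64_t k = 0; k < j; ++k) printf("%d ", out[k]); /* 1 3 4 */
  printf("\n");
  return 0;
}
```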
I was able to run the benchmark from #1487 below. Notice how:
- With the results from this PR, the timings are "flattened out" and all sit fairly tight in the same 150-200ms range no matter how the data is skewed. That is a nice benefit of not relying on branch prediction.
- In particular, notice `true_50`, which is much faster in this PR. The existing implementation and #1487 suffer a performance loss there from mispredictions.
- I can't explain why `all_false` is so fast in this PR, but I'm not going to complain about it.
In the `<details>` section I've also run a lot of extra benchmarks for various cases (with and without `NA`, with and without names, `na_propagate = true/false`), and I'm still convinced this approach is worth it.
Notably, I tried running the suite of benchmarks with #1487 and it crashed R, so it is possible there is a bug there, but I haven't looked into it.
I checked coverage of that file and all lines of `r_lgl_sum()` and `r_lgl_which()` were already being hit by tests, so that was good. But I've added some more edge-case tests anyway!
Closes https://github.com/r-lib/rlang/pull/1487