This is an alternative approach for optimizing `r_lgl_sum()` and `r_lgl_which()` to what is done in #1487.
The way I optimize here is by avoiding branching, like what I did in `vec_pany()` and `vec_pall()`. I think I like this better than having magic numbers like in #1487.
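To give a feel for the idea, here is a minimal standalone C sketch of a branchless `TRUE` counter in the spirit of `r_lgl_sum()`. The function name and the 0/1-only input (no `NA` handling) are mine for illustration; this is not the actual rlang code:

```c
#include <stdint.h>
#include <stdio.h>

/* Branchless count of TRUE values in an R-style logical vector
 * (ints where FALSE = 0, TRUE = 1). Instead of `if (x[i]) ++sum;`,
 * which the CPU has to predict, we add the comparison result
 * directly, so every iteration does the same work regardless of
 * how the data is skewed. Hypothetical sketch, not rlang's code. */
static int64_t lgl_sum_branchless(const int* x, int64_t n) {
  int64_t sum = 0;
  for (int64_t i = 0; i < n; ++i) {
    sum += (x[i] == 1); /* 1 when TRUE, 0 otherwise; no branch */
  }
  return sum;
}

int main(void) {
  int x[] = {1, 0, 1, 1, 0};
  printf("%lld\n", (long long) lgl_sum_branchless(x, 5)); /* 3 */
  return 0;
}
```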
The main reason I like this approach is that it isn't "jumpy", so the CPU's branch predictor can't get screwed by hard-to-predict data. This mainly shows up in the timings for the 50/50 splits between `TRUE`/`FALSE`, where this is drastically faster most of the time.
The downside of this approach is that because it doesn't rely on branch prediction as much, cases like 1% `TRUE` can be a little slower, but not by much, and I think the tradeoff of losing a little performance there versus the drastic gains elsewhere makes this worth it. (The one exception seems to always be the "all `FALSE`" case, where this new approach is super fast for some reason.)
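The same trick extends to `r_lgl_which()`: write the candidate index unconditionally and advance the output cursor by the comparison result, so the data-dependent part is an add rather than a jump. Again, this is just a rough standalone sketch with made-up names and no `NA` handling, not the actual implementation:

```c
#include <stdint.h>
#include <stdio.h>

/* Branchless "which": store the 1-based index every iteration and
 * only keep it (by advancing j) when the value is TRUE. Assumes the
 * output buffer is pre-sized to hold n ints. Hypothetical sketch. */
static int64_t lgl_which_branchless(const int* x, int64_t n, int* out) {
  int64_t j = 0;
  for (int64_t i = 0; i < n; ++i) {
    out[j] = (int) i + 1; /* unconditional store is cheap */
    j += (x[i] == 1);     /* keep the slot only if x[i] was TRUE */
  }
  return j;               /* number of indices written */
}

int main(void) {
  int x[] = {1, 0, 1, 1, 0};
  int out[5];
  int64_t j = lgl_which_branchless(x, 5, out);
  for (int64_t k = 0; k < j; ++k) printf("%d ", out[k]); /* 1 3 4 */
  printf("\n");
  return 0;
}
```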
I was able to run the benchmark from #1487 below. Notice how:
- With the results from this PR, the timings are "flattened out" and all sit fairly tight in the same 150-200ms range no matter how the data is skewed. That is a nice benefit of not relying on branch prediction.
- In particular, notice `true_50`, which is much faster in this PR. The existing implementation and #1487 suffer a performance loss there from mispredictions.
- I can't explain why `all_false` is so fast in this PR, but I'm not going to complain about it.
In the `<details>` section I've also run a lot of extra benchmarks for various cases (with and without `NA`, with and without names, `na_propagate = true/false`), and I'm still convinced this approach is worth it.
Notably, I tried running the suite of benchmarks with #1487 and it crashed R, so it is possible there is a bug there, but I haven't looked into it.
I checked coverage of that file and all lines of `r_lgl_sum()` and `r_lgl_which()` were already being hit by tests, so that was good. But I've added some more edge-case tests anyway!
Closes https://github.com/r-lib/rlang/pull/1487