nical closed 4 months ago
The speedup for contains_box:
Before:
box2d contains_box f32 time: [436.69 ms 436.97 ms 437.26 ms]
After:
box2d contains_box f32 time: [244.31 ms 244.51 ms 244.72 ms]
change: [-44.102% -44.044% -43.981%] (p = 0.00 < 0.05)
After stumbling upon https://www.romainguy.dev/posts/2024/down-a-rabbit-hole/, which applies the same optimization to equivalent Kotlin code, I gave it a try in euclid, and it indeed makes for quite a nice speedup.
Chaining simple && conditions produces branch instructions in some cases. Replacing the logical && with bitwise & to ensure the compiler does not produce branches makes a big (70~80%) performance difference.
Generated assembly
intersects before:
intersects after:
contains before:
contains after:
Benchmark code
Results in code comments.
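For readers who want to see the shape of the change, here is a minimal sketch of the && to & rewrite described above. It is not euclid's code: the `Box2` type and both method names are hypothetical stand-ins, and the exact comparisons in the crate may differ. Both functions return the same result for every input, because the comparisons have no side effects; the only difference is that `&` on `bool` does not short-circuit, so the compiler has no reason to emit a branch between comparisons.

```rust
/// Hypothetical stand-in for a 2D axis-aligned box; not euclid's actual type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Box2 {
    pub min: [f32; 2],
    pub max: [f32; 2],
}

impl Box2 {
    /// Short-circuiting form: each `&&` allows the compiler to skip the
    /// remaining comparisons, which can show up as conditional branches.
    pub fn contains_box_branchy(&self, other: &Box2) -> bool {
        self.min[0] <= other.min[0]
            && other.max[0] <= self.max[0]
            && self.min[1] <= other.min[1]
            && other.max[1] <= self.max[1]
    }

    /// Non-short-circuiting form: `&` on `bool` always evaluates both sides,
    /// so all four comparisons are computed and then combined.
    pub fn contains_box_branchless(&self, other: &Box2) -> bool {
        (self.min[0] <= other.min[0])
            & (other.max[0] <= self.max[0])
            & (self.min[1] <= other.min[1])
            & (other.max[1] <= self.max[1])
    }
}
```

The parentheses in the second version are only for readability. Whether the first version actually compiles to branches depends on the compiler version and target, so the generated assembly and a benchmark remain the way to confirm the effect.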