Running on my GTX 980 Ti, the warp_test takes 1 second total without R3D intersection versus 6 seconds with it, and adding R3D integration raises the total to 10 seconds. I'll just note this as an issue because it's not yet a top priority, but at some point I should try to speed it up. Note that in serial on the CPU, the warp_test runs in less than 2 seconds.
I fixed this (although kind of in a rush) in the commits merged at 3c74110. Interestingly, the compiler was not optimizing out pass-by-value of the Polytope class, so the fix was to switch to passing by reference.
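
For anyone reading later, here is a rough sketch of the kind of change involved. It is not the actual code from those commits: the `Polytope` layout, the `polytope_copies` counter, and the `sum_x_*` functions are all made up for illustration. The point is only that passing a large fixed-size object by value copies the whole buffer on every call unless the compiler manages to remove the copy, while a const reference does not.

```cpp
// Hypothetical stand-in for a Polytope-like class: a fixed-size vertex buffer.
// The real class layout is not shown in this thread; this is an assumption.
int polytope_copies = 0;  // counts copy-constructor calls, for illustration only

struct Polytope {
  static constexpr int max_verts = 128;
  double verts[max_verts][3];
  int nverts;

  Polytope() : verts{}, nverts(0) {}

  // Explicit copy constructor so that copies can be counted.
  Polytope(const Polytope& other) : nverts(other.nverts) {
    for (int i = 0; i < max_verts; ++i) {
      for (int j = 0; j < 3; ++j) {
        verts[i][j] = other.verts[i][j];
      }
    }
    ++polytope_copies;
  }
};

// Pass-by-value: the caller's object is copied into the parameter.
double sum_x_by_value(Polytope p) {
  double s = 0.0;
  for (int i = 0; i < p.nverts; ++i) s += p.verts[i][0];
  return s;
}

// Pass-by-const-reference: no copy of the vertex buffer is made.
double sum_x_by_ref(const Polytope& p) {
  double s = 0.0;
  for (int i = 0; i < p.nverts; ++i) s += p.verts[i][0];
  return s;
}
```

Calling `sum_x_by_value(p)` on an existing `Polytope p` runs the copy constructor once per call, while `sum_x_by_ref(p)` never does. In a kernel that calls such a function for every element, that per-call copy can add up.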