ib00 opened this issue 1 year ago
On GPU I use a fairly simple two-level binary BVH traversal with the stack kept in shared memory. I tested a stackless approach some time ago, but ended up with this after all. You can find the source code in intersect_scene.comp.glsl (the functions are Traverse_BLAS_WithStack and Traverse_TLAS_WithStack). I wanted to try "Compressed Wide BVHs" some time ago, but it seems you can rely on hardware raytracing these days, as it is supported by all recent GPUs. Enabling HWRT on an RTX 3080 gives me about a 4x overall speedup; the difference is less pronounced on AMD. You can try it yourself with the sample application (https://github.com/sergcpp/RayDemo/releases): pass the "--nohwrt" argument to disable hardware raytracing.
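For readers unfamiliar with stack-based two-level traversal, here is a minimal CPU-side C++ sketch of the general idea. It is not the project's Traverse_TLAS_WithStack/Traverse_BLAS_WithStack code: the node layout, the sphere primitives, the translation-only Instance struct and all function names are hypothetical, and the local `int stack[64]` merely stands in for the shared-memory stack used on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

constexpr float kInf = std::numeric_limits<float>::infinity();

// Hypothetical layout: one node array shared by the TLAS and all BLASes.
// Leaves have item >= 0 (instance index in the TLAS, sphere index in a BLAS).
struct Node { float bmin[3], bmax[3]; int left = -1, right = -1, item = -1; };
struct Sphere { float c[3], r; };
struct Instance { int blas_root; float offset[3]; };  // translation-only, for brevity
struct Ray { float o[3], d[3]; };

// Slab test against [0, t_max]; relies on IEEE 754 giving +/-inf for 1/0.
static bool hit_box(const Ray& r, const Node& n, float t_max) {
    float t0 = 0.0f, t1 = t_max;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / r.d[a];
        float tn = (n.bmin[a] - r.o[a]) * inv, tf = (n.bmax[a] - r.o[a]) * inv;
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
        if (t0 > t1) return false;
    }
    return true;
}

// Nearest positive hit distance, or kInf (an origin inside the sphere is ignored).
static float hit_sphere(const Ray& r, const Sphere& s) {
    float oc[3] = {r.o[0] - s.c[0], r.o[1] - s.c[1], r.o[2] - s.c[2]};
    float a = 0, b = 0, c = -s.r * s.r;
    for (int i = 0; i < 3; ++i) { a += r.d[i] * r.d[i]; b += oc[i] * r.d[i]; c += oc[i] * oc[i]; }
    float disc = b * b - a * c;
    if (disc < 0) return kInf;
    float t = (-b - std::sqrt(disc)) / a;
    return t > 0 ? t : kInf;
}

// Closest-hit traversal of one BLAS with an explicit fixed-size stack.
static float traverse_blas(const std::vector<Node>& nodes, const std::vector<Sphere>& prims,
                           int root, const Ray& r, float t_max) {
    int stack[64], sp = 0;
    stack[sp++] = root;
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!hit_box(r, n, t_max)) continue;
        if (n.item >= 0) { t_max = std::min(t_max, hit_sphere(r, prims[n.item])); continue; }
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    return t_max;
}

// Same loop over the TLAS; at an instance leaf the ray is moved into object space
// and the BLAS is traversed with the current t_max, so t_max can only shrink.
static float traverse_tlas(const std::vector<Node>& nodes, const std::vector<Sphere>& prims,
                           const std::vector<Instance>& insts, const Ray& r) {
    float t_max = kInf;
    int stack[64], sp = 0;
    stack[sp++] = 0;  // assumption: the TLAS root is node 0
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!hit_box(r, n, t_max)) continue;
        if (n.item >= 0) {
            const Instance& in = insts[n.item];
            Ray local = r;
            for (int a = 0; a < 3; ++a) local.o[a] -= in.offset[a];
            t_max = traverse_blas(nodes, prims, in.blas_root, local, t_max);
            continue;
        }
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    return t_max;
}
```

Because the instances here are only translated, the hit distance t is the same in world and object space, so the BLAS call can keep shrinking the TLAS t_max directly. With full instance transforms t is still preserved as long as the transformed ray direction is not renormalized.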
On CPU it is a little bit more complicated. I use the idea from "Shallow Bounding Volume Hierarchies for Fast SIMD Ray Tracing of Incoherent Rays": the binary BVH gets flattened into one with 8 children per node. During traversal a single ray is tested against 8 bounding boxes at once (using SSE/AVX), which should be better for incoherent rays than packet traversal. But it is still quite inefficient, and I plan to improve it.
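A minimal C++ sketch of the two pieces described above: collapsing a binary BVH into 8-wide nodes, and testing one ray against all 8 child boxes. It is an illustration under stated assumptions, not the project's code: the node layouts and function names are hypothetical, the greedy "open the inner child with the largest surface area" rule is an assumption (not necessarily what the paper or the project uses), and the 8-lane test is a plain loop over SoA arrays where a tuned version would use SSE/AVX intrinsics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr float kInf = std::numeric_limits<float>::infinity();

// Binary BVH node (hypothetical layout): leaf if prim >= 0.
struct BNode { float bmin[3], bmax[3]; int left = -1, right = -1, prim = -1; };

// 8-wide node in SoA layout: bmin[axis][lane]. Keeping one axis of all 8 boxes
// contiguous is what lets a single ray be tested against 8 boxes with SIMD.
struct WNode {
    float bmin[3][8], bmax[3][8];
    int child[8];   // wide-node index for inner children, primitive index for leaves
    bool leaf[8];
    int count = 0;  // used slots; unused slots hold inverted (empty) boxes
};

static float surface_area(const BNode& n) {
    float dx = n.bmax[0] - n.bmin[0], dy = n.bmax[1] - n.bmin[1], dz = n.bmax[2] - n.bmin[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Collapse the binary subtree at inner node `b` into wide nodes; returns the new node index.
static int collapse(const std::vector<BNode>& bin, int b, std::vector<WNode>& out) {
    std::vector<int> kids = {bin[b].left, bin[b].right};
    while (kids.size() < 8) {
        int best = -1;
        float best_area = -1.0f;
        for (int i = 0; i < (int)kids.size(); ++i)
            if (bin[kids[i]].prim < 0 && surface_area(bin[kids[i]]) > best_area) {
                best = i;
                best_area = surface_area(bin[kids[i]]);
            }
        if (best < 0) break;  // only leaves left
        int k = kids[best];
        kids[best] = bin[k].left;  // replace the opened child by its two children
        kids.push_back(bin[k].right);
    }
    int idx = (int)out.size();
    out.emplace_back();  // reserve the slot; recursion below may reallocate `out`
    WNode w;
    w.count = (int)kids.size();
    for (int i = 0; i < 8; ++i) {
        bool used = i < w.count;
        for (int a = 0; a < 3; ++a) {
            w.bmin[a][i] = used ? bin[kids[i]].bmin[a] : kInf;
            w.bmax[a][i] = used ? bin[kids[i]].bmax[a] : -kInf;
        }
        w.leaf[i] = used && bin[kids[i]].prim >= 0;
        w.child[i] = -1;
    }
    for (int i = 0; i < w.count; ++i)
        w.child[i] = w.leaf[i] ? bin[kids[i]].prim : collapse(bin, kids[i], out);
    out[idx] = w;
    return idx;
}

// Bit i of the result is set if the ray hits child box i within [0, t_max].
// Choosing near/far planes by the sign of inv_d makes the inverted empty boxes always miss.
static uint32_t intersect8(const WNode& n, const float o[3], const float inv_d[3], float t_max) {
    uint32_t mask = 0;
    for (int i = 0; i < 8; ++i) {
        float t0 = 0.0f, t1 = t_max;
        for (int a = 0; a < 3; ++a) {
            float lo = inv_d[a] >= 0 ? n.bmin[a][i] : n.bmax[a][i];
            float hi = inv_d[a] >= 0 ? n.bmax[a][i] : n.bmin[a][i];
            t0 = std::max(t0, (lo - o[a]) * inv_d[a]);
            t1 = std::min(t1, (hi - o[a]) * inv_d[a]);
        }
        if (t0 <= t1) mask |= 1u << i;
    }
    return mask;
}
```

A traversal loop would then pop a wide node, call intersect8 once, and push only the children whose bits are set (ideally ordered by entry distance), so the per-node cost is one 8-lane test instead of up to several binary-node visits.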
Thanks! Very cool project.
So, for GPU (explicit compute shader, not HW), you could try this: https://research.nvidia.com/publication/2017-07_efficient-incoherent-ray-traversal-gpus-through-compressed-wide-bvhs
I think https://github.com/pablode/gatling had CWBVH, but he moved to HW intersection. It would be interesting to see how far a hand-written BVH traversal can be pushed.
What algorithm (paper) do you use for BVH traversal?
How does the performance of compute traversal compare to hardware (Vulkan) traversal?