vkoskiv / c-ray

c-ray is a small, simple path tracer written in C
MIT License
797 stars 44 forks source link

Binned SAH BVH builder #71

Closed madmann91 closed 4 years ago

madmann91 commented 4 years ago

This PR implements a Binned SAH BVH builder, based on "On fast Construction of SAH-based Bounding Volume Hierarchies", by I. Wald. The existing Kd-tree/BVH implementation has been kept intact for comparison purposes. It can be enabled again by defining the preprocessor symbol OLD_KD_TREE.

I have only the provided test scene to measure performance, and I cannot really tell if there's a measurable performance improvement because:

Finally, there's a large optimization potential that's not taken advantage of, namely early exit for shadow rays. For that, I have added a TODO that suggests to add a tMin and tMax to your rays, so that you can actually cull nodes more efficiently during traversal (because shadow rays can exit the traversal loop early, upon the first intersection found in the given [tMin, tMax] range). This in general gives a 20% performance boost when implemented correctly. This PR does not implement this feature, as it would break the compatibility with the existing acceleration data structure.

vkoskiv commented 4 years ago

This is so cool! I'm evaluating it right now, I'll post my comments shortly. Looking great at first glance!

vkoskiv commented 4 years ago

Okay, I'm speechless here. I just quickly threw in the INTERPROCEDURAL_OPTIMIZATION flag to my CMakeLists.txt to test it out, and what? The hdr.json scene rendered 60% faster than ever before. I had tried adding -flto to the linker flags before, resulting in a slightly smaller binary, but this is just an incredible boost in performance from such a small change.

I'm still exploring this PR, but I can already tell you you've made my year with this. I am ecstatic!

vkoskiv commented 4 years ago

Okay, I've calmed down a bit and looked through all the code, and I liked everything I saw. This will be merged as-is.

Few remarks:

I recommend to either merge those meshes, or to build a top-level BVH that has per-mesh BVHs as leaves.

I think we should go for a top-level BVH here, feels like the more appropriate solution.

This PR does not implement this feature, as it would break the compatibility with the existing acceleration data structure.

You are more than welcome to implement this, changing the existing data structures as you see fit.

I'd love to know which tools you used for profiling as well. I've used the sampling profiler that comes with Xcode (built on DTrace), but apparently I wasn't that good at interpreting the results, as I had no idea it was as bad as you've shown it to be. 😄 I think I just failed to identify the massive amount of function call overhead.

I think the plan going forward would be something like this:

What an incredible PR, thanks so much for helping out!

vkoskiv commented 4 years ago

@madmann91 If it's okay by you, I would like to tag you in the README as a contributor.

madmann91 commented 4 years ago

I use perf for profiling. It's pretty low-level, but it can perform sampling and can return statistics for cache misses, number of instructions executed, and so on. It's also not intrusive, and has very little overhead, compared to other approaches like gprof. To sample an executable, use perf record <prog> <args> (where prog and args are the program name and its command line arguments, respectively), and to show the result use perf report. To get statistics, use perf stat <prog> <args>. You might want to compile in RelWithDebInfo to get valuable annotations in the perf report, otherwise you'll only see assembly without line info.

Thank you for the attribution request, I'm fine with it. In any case it's going to be visible on GitHub's contributors page.

madmann91 commented 4 years ago

I'd be curious to know what is the performance difference between the new BVH implementation and the old one on more complicated scenes. Do you have some figures? And maybe also, do you have any complicated (ideally only one mesh, but highly tessellated and with a non-uniform primitive distribution, like e.g. Sponza) test scene that I could play around with?

vkoskiv commented 4 years ago

@madmann91 I'll do a lot more testing in the coming days. The most complex scene I have right now is the stormtrooper scene that you see in the readme, but that one also has the meshes separate, and no top-level acceleration structure of course. So I guess that feature is what should be tackled next. You can download the stormtrooper scene here if you want. And if you do end up making changes to scenes, or new scenes, then use the scripts/bundler.py to package them up like that zip in the download link. I'll keep you posted on the performance metrics I measure once I have them.

lycium commented 4 years ago

I'm also blown away by your generosity here, @madmann91! Checked out your bvh project, it looks very impressive :)

madmann91 commented 4 years ago

@lycium Thank you! It's not much really. This project looked interesting, so I thought I'd just contribute a bit.