lighttransport / nanort

NanoRT, single header only modern ray tracing kernel.
MIT License

Performance descriptions or wiki elaboration #58

Open cadop opened 4 years ago

cadop commented 4 years ago

I looked at the README and wiki, but performance isn't really covered beyond a few mentions of "efficient ray intersection finding". Would it be possible to elaborate on the performance characteristics of NanoRT? I found NanoRT through the Embree issue about it not supporting double precision. One of the reasons I started with Embree was their paper on its high-performance design. For scientific computing, however, accuracy is also important.

Are there any benchmarks, or even rough expectations, for how a single ray/triangle intersection query against the BVH in NanoRT compares to Embree or other ray tracers?

syoyo commented 4 years ago

See https://github.com/lighttransport/nanort/issues/57 for rough estimates of NanoRT's performance compared to Embree, but we recommend measuring the performance on your side (sharing the results would be appreciated).

As far as we know, no OSS ray tracing library other than NanoRT supports double precision, so it is difficult to judge how performant double-precision NanoRT is by comparison (in most cases double-precision NanoRT is fast enough, though).

cadop commented 4 years ago

Thanks, that was what I was looking for. Does the 3-4x slowdown refer to the double-precision calculations? If so, that seems like a reasonable expectation compared to Embree.

syoyo commented 4 years ago

@cadop 3-4x is for single precision.

cadop commented 4 years ago

I am still running more tests and checking whether I can improve the way I integrated NanoRT, but here are my results so far for my own use case. I'm posting them mostly as a reference for others, and to hear whether the numbers make sense to you; they should not be taken as a decisive metric. NanoRT is using doubles, and Embree is of course using floats. Times cover only the raycast loop (the timer is started after the BVH is built).

Using a model with ~1000 vertices, 40,000 rays cast in a loop (single core):

Using a model with 320,068 vertices, 40,000 rays cast in a loop (single core):

Same model as above, but with 360,000 rays (I expanded the grid used to cast rays, so more rays may hit or miss than in the previous case):

Using a model with ~1,000,000 vertices, 40,000 rays cast in a loop (single core):

So, assuming I haven't messed up my integration, model size seems to have a much bigger impact on performance than the number of rays. Would this suggest that the performance difference is more about BVH efficiency than double precision?

Settings:

- Windows 10
- Visual Studio 2019, MSVC, x64 release build
- Intel Xeon CPU E5-2630 v4
- C++11 features enabled via the define for NanoRT
- Only using .obj, not storing/using .mtl
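For anyone reproducing this kind of measurement, here is a minimal sketch of timing only the ray-cast loop, with the same code instantiated for `float` and `double`. It deliberately does not use NanoRT or Embree: `Vec3`, `IntersectTriangle`, `MakeUnitQuad`, and `TimeRaycastLoop` are hypothetical names, and the loop is a brute-force Möller-Trumbore test over every triangle rather than a BVH traversal. It only illustrates where the timer starts and stops.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type; not NanoRT's or Embree's.
template <typename T>
struct Vec3 {
  T x, y, z;
};

template <typename T>
Vec3<T> Sub(const Vec3<T>& a, const Vec3<T>& b) {
  return {a.x - b.x, a.y - b.y, a.z - b.z};
}

template <typename T>
Vec3<T> Cross(const Vec3<T>& a, const Vec3<T>& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

template <typename T>
T Dot(const Vec3<T>& a, const Vec3<T>& b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Moller-Trumbore ray/triangle test. On a hit, writes the distance to *t.
template <typename T>
bool IntersectTriangle(const Vec3<T>& org, const Vec3<T>& dir, const Vec3<T>& v0,
                       const Vec3<T>& v1, const Vec3<T>& v2, T* t) {
  const T kEps = static_cast<T>(1e-7);
  const Vec3<T> e1 = Sub(v1, v0);
  const Vec3<T> e2 = Sub(v2, v0);
  const Vec3<T> p = Cross(dir, e2);
  const T det = Dot(e1, p);
  if (det > -kEps && det < kEps) return false;  // ray parallel to the triangle
  const T inv = T(1) / det;
  const Vec3<T> s = Sub(org, v0);
  const T u = Dot(s, p) * inv;
  if (u < 0 || u > 1) return false;
  const Vec3<T> q = Cross(s, e1);
  const T v = Dot(dir, q) * inv;
  if (v < 0 || u + v > 1) return false;
  *t = Dot(e2, q) * inv;
  return *t > kEps;
}

// Two triangles covering the unit square at z = 0 (three vertices per triangle).
template <typename T>
std::vector<Vec3<T>> MakeUnitQuad() {
  const T o = 0, l = 1;
  return {{o, o, o}, {l, o, o}, {l, l, o}, {o, o, o}, {l, l, o}, {o, l, o}};
}

struct CastResult {
  std::size_t hits;
  double seconds;
};

// Casts grid*grid rays straight down from z = 1 and times only the cast loop.
template <typename T>
CastResult TimeRaycastLoop(const std::vector<Vec3<T>>& verts, int grid) {
  assert(verts.size() % 3 == 0 && grid > 0);
  const Vec3<T> dir = {T(0), T(0), T(-1)};
  std::size_t hits = 0;
  // Any acceleration-structure build would happen before this line.
  const auto start = std::chrono::steady_clock::now();
  for (int j = 0; j < grid; ++j) {
    for (int i = 0; i < grid; ++i) {
      const Vec3<T> org = {static_cast<T>((i + 0.5) / grid),
                           static_cast<T>((j + 0.5) / grid), T(1)};
      for (std::size_t k = 0; k < verts.size(); k += 3) {
        T t;
        if (IntersectTriangle(org, dir, verts[k], verts[k + 1], verts[k + 2], &t)) {
          ++hits;  // count the ray once, then move to the next ray
          break;
        }
      }
    }
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {hits, elapsed.count()};
}
```

Calling `TimeRaycastLoop<float>(MakeUnitQuad<float>(), 200)` and the `double` equivalent gives two timings for an identical workload. Comparing the `hits` counts as well as the times is a cheap sanity check that both precisions are intersecting the same geometry.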

syoyo commented 4 years ago

One factor affecting performance would be memory bandwidth. Embree additionally uses quantized bounding boxes in its BVH (at least for curve primitives) to reduce memory cost.
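To make the idea of a quantized bounding box concrete, here is a rough sketch that stores a child box as 8-bit offsets relative to its parent box. This is not Embree's actual node layout: `Aabb`, `QuantizedAabb`, `Quantize`, and `Dequantize` are hypothetical names, and the 8-bit grid is an assumption chosen for illustration. The minimum is rounded down and the maximum rounded up, so the decoded box still encloses the original.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Full-precision box: 6 floats (24 bytes on typical platforms).
struct Aabb {
  float lo[3];
  float hi[3];
};

// Hypothetical quantized box: 6 bytes, expressed on a 0..255 grid
// spanning the parent box along each axis.
struct QuantizedAabb {
  std::uint8_t lo[3];
  std::uint8_t hi[3];
};

// Encodes `child` relative to `parent`. Assumes child lies inside parent.
QuantizedAabb Quantize(const Aabb& parent, const Aabb& child) {
  QuantizedAabb q;
  for (int a = 0; a < 3; ++a) {
    const float extent = parent.hi[a] - parent.lo[a];
    const float scale = extent > 0.0f ? 255.0f / extent : 0.0f;
    // floor for the minimum and ceil for the maximum keep the box conservative
    // (last-bit float rounding is not handled in this sketch).
    const float lo = std::floor((child.lo[a] - parent.lo[a]) * scale);
    const float hi = std::ceil((child.hi[a] - parent.lo[a]) * scale);
    q.lo[a] = static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, lo)));
    q.hi[a] = static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, hi)));
  }
  return q;
}

// Decodes a quantized box back to world-space floats.
Aabb Dequantize(const Aabb& parent, const QuantizedAabb& q) {
  Aabb out;
  for (int a = 0; a < 3; ++a) {
    const float extent = parent.hi[a] - parent.lo[a];
    out.lo[a] = parent.lo[a] + extent * (q.lo[a] / 255.0f);
    out.hi[a] = parent.lo[a] + extent * (q.hi[a] / 255.0f);
  }
  return out;
}
```

The trade-off is that the decoded box is slightly larger than the original, which can cost a few extra traversal steps, in exchange for reading far fewer bytes per node.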

NanoRT always uses double precision for the BVH when double is the template parameter. This may be overkill for typical use cases, but can be beneficial for HPC applications (e.g. CAD, astronomy).
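As a back-of-the-envelope illustration of why the template parameter matters for memory traffic, the sketch below compares the size of a BVH node whose bounds are stored as `float` versus `double`. The `BvhNode` layout here is hypothetical and is not claimed to match NanoRT's real node. `EstimateBvhBytes` further assumes a binary BVH with one triangle per leaf, which has 2N - 1 nodes for N triangles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical node layout for illustration only.
template <typename T>
struct BvhNode {
  T bmin[3];              // box minimum, stored in the template precision
  T bmax[3];              // box maximum, stored in the template precision
  std::uint32_t data[2];  // child indices, or primitive offset and count
};

// Assumes a binary BVH with exactly one triangle per leaf: 2N - 1 nodes.
template <typename T>
std::size_t EstimateBvhBytes(std::size_t num_triangles) {
  assert(num_triangles > 0);
  return (2 * num_triangles - 1) * sizeof(BvhNode<T>);
}
```

On common 64-bit platforms this layout gives 32 bytes per `float` node and 56 bytes per `double` node, so under these assumptions the `double` tree moves about 1.75x as many bytes through the cache for the same traversal.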

Also, there is room for a more efficient BVH build in NanoRT, especially by implementing a spatial split BVH: https://github.com/lighttransport/nanort/issues/15