Much faster ray traversal

bobcao3 commented 2 years ago

Here are the changes:

- Implemented hierarchical voxel traversal (similar to the Hi-Z ray marching algorithm). The occupancy LOD tree is stored in a bit field to reduce memory bandwidth.
- Optimized the ray traversal code, reducing worst-case branch divergence (a huge deal for GPUs).
- Added ray replacement and processing of multiple samples within the same kernel: if a previous ray has exited, the thread replaces it with a new sample instead of idling.
- Used voxel space (-N, N) as the traversal reference instead of converting from uniform space, for better precision. Applied a change of space to the AABB intersection so that the box minimum is always 0 on every axis (allowing more compiler constant folding).
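
To see why ray replacement reduces idle time, here is a toy CPU model (a sketch under stated assumptions, not the PR's Taichi kernel). It assumes the lanes of a hypothetical warp run in lockstep and that each ray costs a fixed number of traversal steps; `naive_warp_steps`, `replacement_warp_steps` and the cost list are illustrative names and data.

```python
from collections import deque
from typing import List


def naive_warp_steps(costs: List[int], warp_size: int) -> int:
    """Each lane traces one sample per launch; a launch lasts as long as its slowest lane."""
    total = 0
    for start in range(0, len(costs), warp_size):
        total += max(costs[start:start + warp_size])
    return total


def replacement_warp_steps(costs: List[int], warp_size: int) -> int:
    """Lanes run in lockstep; a lane whose ray exits pulls the next sample from a shared queue."""
    assert all(c >= 1 for c in costs), "every ray takes at least one step in this model"
    queue = deque(costs)
    lanes = [queue.popleft() for _ in range(min(warp_size, len(queue)))]
    steps = 0
    while any(remaining > 0 for remaining in lanes):
        steps += 1  # one lockstep iteration of the whole warp
        for lane, remaining in enumerate(lanes):
            if remaining > 0:
                lanes[lane] = remaining - 1
                if lanes[lane] == 0 and queue:
                    lanes[lane] = queue.popleft()  # replace the finished ray instead of idling
    return steps


def utilization(costs: List[int], warp_size: int, steps: int) -> float:
    """Fraction of lane-steps spent on useful traversal work."""
    return sum(costs) / (warp_size * steps)


costs = [8, 1, 1, 1, 1, 1, 1, 1]  # hypothetical: one long ray, many short ones
print(naive_warp_steps(costs, 2), replacement_warp_steps(costs, 2))  # prints: 11 8
```

In this model the same 15 steps of work occupy 22 lane-steps without replacement but only 16 with it, because the short rays no longer wait for the long one. Real GPU behaviour also depends on memory latency and register pressure, which the model ignores.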

yuanming-hu commented 2 years ago

Nice!! Could you provide some benchmark info regarding the performance boost?

I tried it locally, but found that on example6.py the new renderer produces a different result from the previous one.

bobcao3 commented 2 years ago

Ah, my bad: it was a float -> u8 encoding overflow.
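
The thread does not show the exact encoding, so the following is only a hypothetical illustration of how a float -> u8 overflow typically arises and how clamping avoids it. `encode_u8_unclamped`, `encode_u8` and `decode_u8` are invented names; the `& 0xFF` models 8-bit wraparound.

```python
def encode_u8_unclamped(x: float) -> int:
    """Hypothetical buggy encoder: scaling by 256 maps 1.0 to 256, which wraps to 0 in 8 bits."""
    return int(x * 256) & 0xFF


def encode_u8(x: float) -> int:
    """Clamp to [0, 1] first, then map onto 0..255 with rounding."""
    x = min(max(x, 0.0), 1.0)
    return int(round(x * 255.0))


def decode_u8(b: int) -> float:
    """Inverse mapping back to [0, 1]."""
    return b / 255.0


print(encode_u8_unclamped(1.0), encode_u8(1.0))  # prints: 0 255
```

A full-intensity channel silently turning into 0 is the kind of bug that shows up as a visibly different image rather than a crash.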

bobcao3 commented 2 years ago

Performance benchmark:

Both the original and the new renderer use target_fps=30 for adaptive sampling. We measure the average time taken per 1024 samples: we take 15 batches of 1024 samples and average them, discarding the first batch after startup so that the measurement reflects steady state.
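
A minimal sketch of this measurement procedure, assuming a hypothetical `render_batch` callable that renders 1024 samples. Whether the 15 batches include the discarded warm-up batch is not stated, so both counts are parameters here; population variance is also an assumption.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_batches(render_batch: Callable[[], None], num_batches: int = 15) -> List[float]:
    """Wall-clock seconds for each call to render_batch (hypothetical: renders 1024 samples)."""
    times = []
    for _ in range(num_batches):
        start = time.perf_counter()
        render_batch()  # with Taichi, end with ti.sync() so queued GPU work is included
        times.append(time.perf_counter() - start)
    return times


def steady_state_stats(batch_times: List[float], warmup: int = 1) -> Tuple[float, float]:
    """Mean and population variance after discarding the first `warmup` batches."""
    if len(batch_times) <= warmup:
        raise ValueError("need more batches than warm-up batches")
    steady = batch_times[warmup:]
    return statistics.mean(steady), statistics.pvariance(steady)
```

Dropping the first batch keeps one-off costs such as JIT compilation and memory allocation out of the average.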

All data were collected on an RTX 3080 with Taichi 1.0.2 (master) on Windows with the Game Ready driver. The voxel-challenge window was kept focused so that the Windows scheduler prioritizes the task. All data were measured from the startup viewport (no adjustment).

All times are in seconds per batch of 1024 samples. All reported values have a variance under 3e-4.

| Case | Original (s) | Optimized (s) |
|---|---|---|
| Example 1 | 0.9469 | 0.4766 |
| Example 2 | 0.6311 | 0.2553 |
| Example 3 | 3.199 | 1.464 |
| Example 4 | 2.832 | 1.030 |
| Example 5 | 4.155 | 0.9474 |
| Example 6 | 6.662 | 1.890 |
| Example 7 | 5.195 | 1.181 |
| Example 8 | 9.167 | 1.846 |
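
The table reports raw times; a short script makes the per-example speedup explicit. The numbers are copied from the table, and the geometric mean is simply one reasonable way to summarize ratios, not a figure from the thread.

```python
import math

# Seconds per 1024 samples from the benchmark table: (original, optimized).
RESULTS = {
    "Example 1": (0.9469, 0.4766),
    "Example 2": (0.6311, 0.2553),
    "Example 3": (3.199, 1.464),
    "Example 4": (2.832, 1.030),
    "Example 5": (4.155, 0.9474),
    "Example 6": (6.662, 1.890),
    "Example 7": (5.195, 1.181),
    "Example 8": (9.167, 1.846),
}


def speedups(results):
    """Original time divided by optimized time for each case."""
    return {case: original / optimized for case, (original, optimized) in results.items()}


def geometric_mean(values):
    """Geometric mean, the usual average for ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = speedups(RESULTS)
for case, ratio in ratios.items():
    print(f"{case}: {ratio:.2f}x")
print(f"geometric mean: {geometric_mean(ratios.values()):.2f}x")
```

The ratios range from about 2.0x (Example 1) to about 5.0x (Example 8), with a geometric mean of roughly 3.2x; the heavier scenes gain the most.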

@yuanming-hu

yuanming-hu commented 2 years ago

Awesome! It may take me a while to read through the code. I may need to add transparent/metallic-roughness materials to the renderer, which need a few more fields to store the material, so I'd better understand carefully what's going on here (especially the encoding/decoding part).

Meanwhile, feel free to open another repo with your acceleration! Thanks.

bobcao3 commented 2 years ago

Encoding and decoding can be removed just fine. The code path is currently register-limited, and the main bottleneck is the DDA, which doesn't care about materials, only voxel occupancy.
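
To show why the DDA only needs occupancy, here is a plain single-level Amanatides-Woo DDA sketch in Python. It is not the PR's hierarchical kernel. It assumes a grid of 2N voxels per axis and a hypothetical `occupied(i, j, k)` callback, and it also applies the change of space from the PR description: shifting voxel space (-N, N) by +N puts the box minimum at 0, so the slab test's near term reduces to `-o * inv`.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Cell = Tuple[int, int, int]


def intersect_box_min0(o: Sequence[float], d: Sequence[float], size: float) -> Optional[Tuple[float, float]]:
    """Slab test of the ray o + t*d (t >= 0) against the box [0, size]^3."""
    t_enter, t_exit = 0.0, math.inf
    for a in range(3):
        if d[a] == 0.0:
            if not 0.0 <= o[a] <= size:  # parallel to this slab and outside it
                return None
            continue
        inv = 1.0 / d[a]
        t_near = -o[a] * inv  # (0 - o) * inv: the box minimum is the constant 0
        t_far = (size - o[a]) * inv
        if t_near > t_far:
            t_near, t_far = t_far, t_near
        t_enter, t_exit = max(t_enter, t_near), min(t_exit, t_far)
        if t_enter > t_exit:
            return None
    return t_enter, t_exit


def dda_first_hit(origin, direction, n: int, occupied: Callable[[int, int, int], bool]) -> Optional[Cell]:
    """First occupied voxel (in shifted [0, 2n) indices) hit by the ray, or None."""
    size = 2 * n
    o = [c + n for c in origin]  # change of space: (-n, n) -> (0, 2n)
    span = intersect_box_min0(o, direction, size)
    if span is None:
        return None
    t, t_exit = span
    cell = [min(max(math.floor(o[a] + t * direction[a]), 0), size - 1) for a in range(3)]
    step, t_max, t_delta = [], [], []
    for a in range(3):
        d = direction[a]
        step.append(1 if d > 0 else -1)
        if d == 0.0:
            t_max.append(math.inf)
            t_delta.append(math.inf)
        else:
            boundary = cell[a] + (1 if d > 0 else 0)  # next grid plane along this axis
            t_max.append((boundary - o[a]) / d)
            t_delta.append(abs(1.0 / d))
    while True:
        if occupied(*cell):
            return (cell[0], cell[1], cell[2])
        a = min(range(3), key=lambda k: t_max[k])  # axis whose boundary is crossed first
        if t_max[a] > t_exit:
            return None
        cell[a] += step[a]
        if not 0 <= cell[a] < size:
            return None
        t_max[a] += t_delta[a]
```

Nothing in the loop touches material data, which matches the point above: material encoding can change without affecting the traversal's cost. A hierarchical version would replace the one-voxel step with a jump across the largest empty LOD cell.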

bobcao3 commented 2 years ago

Update: the code now uses a 3D texture for the voxel structure. The occupancy grid is still a manually packed bit field.
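
The PR's exact bit layout is not described in the thread, so this is a sketch under assumptions: 1 bit per voxel packed into 32-bit words with x varying fastest, a power-of-two resolution, and a full LOD chain built by OR-reducing each 2x2x2 block. `PackedOccupancy`, `build_lod_chain` and `largest_empty_level` are hypothetical names.

```python
from typing import List


class PackedOccupancy:
    """res^3 occupancy grid stored as 1 bit per voxel in 32-bit words (x varies fastest)."""

    def __init__(self, res: int):
        self.res = res
        self.words = [0] * ((res ** 3 + 31) // 32)

    def _index(self, i: int, j: int, k: int) -> int:
        return (k * self.res + j) * self.res + i

    def set(self, i: int, j: int, k: int) -> None:
        idx = self._index(i, j, k)
        self.words[idx >> 5] |= 1 << (idx & 31)  # word = idx // 32, bit = idx % 32

    def get(self, i: int, j: int, k: int) -> bool:
        idx = self._index(i, j, k)
        return bool((self.words[idx >> 5] >> (idx & 31)) & 1)


def build_lod_chain(level0: PackedOccupancy) -> List[PackedOccupancy]:
    """Level L+1 marks a cell occupied if any voxel of the matching 2x2x2 block in level L is."""
    assert (level0.res & (level0.res - 1)) == 0, "resolution must be a power of two"
    levels = [level0]
    while levels[-1].res > 1:
        fine = levels[-1]
        coarse = PackedOccupancy(fine.res // 2)
        for k in range(fine.res):
            for j in range(fine.res):
                for i in range(fine.res):
                    if fine.get(i, j, k):
                        coarse.set(i // 2, j // 2, k // 2)
        levels.append(coarse)
    return levels


def largest_empty_level(levels: List[PackedOccupancy], i: int, j: int, k: int) -> int:
    """Coarsest level whose cell containing voxel (i, j, k) is empty, or -1 if the voxel is occupied.

    A ray inside an empty level-L cell can skip that whole 2**L-voxel cell in one step.
    """
    result = -1
    for level, grid in enumerate(levels):
        if grid.get(i >> level, j >> level, k >> level):
            break
        result = level
    return result


grid = PackedOccupancy(8)
grid.set(0, 0, 0)
levels = build_lod_chain(grid)
print(len(levels), len(grid.words))  # prints: 4 16
print(largest_empty_level(levels, 7, 7, 7))  # prints: 2
```

Because the OR-reduction makes occupancy monotone (an empty coarse cell implies empty fine cells), scanning from fine to coarse and stopping at the first occupied level is enough. An 8^3 grid fits in 16 words (64 bytes) instead of 512 bytes at one byte per voxel, which is where the bandwidth saving comes from.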

k-ye commented 2 years ago

Shall we merge this now?

yuanming-hu commented 2 years ago

I'm wondering whether we should merge this here or start a new repo (voxel-challenge-2) and keep the code here simple :-) I prefer the latter. I can also try adding more BSDF models in the new repo.

@bobcao3 Instead of making a PR to the main branch, could you open the PR against a v2 branch? I'm happy to merge there and put v2 into a new repo.

bobcao3 commented 2 years ago

I don't think I can create branches in this repo. Can you create one so I can retarget the PR?

taichi-dev/voxel-challenge, pull request #19