Hi, thanks for sharing this excellent work.
I have read the code carefully, and I see that in the CUDA implementation of the render_ray_kernel function, a single ray is computed by a warp of threads, with each thread handling a single channel, as I understand it. I wonder about the benefits of this approach: as far as I can tell, much of the spherical harmonics (SH) computation is repeated across the threads, and the only parallelism gained is in summing the multiple coefficients. Is this the best way you found to accelerate the SH representation, and if so, why is it better than the alternatives?
Thank you.