An idea to improve coherence is to split up rendering into several kernels.
Have a generateCameraRays kernel that takes a pointer to a camera and populates a thrust::device_vector with rays (passed to the kernel as a raw device pointer, since device code cannot use the vector object itself). Then have a sampleRay kernel that takes the vector of rays and, for each ray, either computes the color when the ray is directly lit or modifies the mask at that pixel index and passes the ray along to the next call of sampleRay.
The overhead of transferring memory between host and device when generating camera rays is significant and causes a noticeable slowdown.
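The split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the renderer's actual code: the Camera and Ray layouts, the renderFrame helper, the block size, and the toy "scene" (a ray pointing upward counts as directly lit, anything else bounces off a ground plane with albedo 0.5) are all hypothetical stand-ins. Only the kernel names generateCameraRays and sampleRay and the use of thrust::device_vector come from the text.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical minimal types; the real Camera and Ray will differ.
struct Camera { float3 origin; float fovScale; int width; int height; };
struct Ray    { float3 origin; float3 dir; int pixel; bool alive; };

// One thread per pixel: build the primary ray and reset mask and color.
__global__ void generateCameraRays(const Camera* camPtr, Ray* rays,
                                   float3* mask, float3* color) {
    const Camera cam = *camPtr;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= cam.width * cam.height) return;
    int px = i % cam.width, py = i / cam.width;
    float u = ((px + 0.5f) / cam.width * 2.0f - 1.0f) * cam.fovScale;
    float v = (1.0f - (py + 0.5f) / cam.height * 2.0f) * cam.fovScale;
    float inv = rsqrtf(u * u + v * v + 1.0f);  // normalize (u, v, -1)
    Ray r;
    r.origin = cam.origin;
    r.dir = make_float3(u * inv, v * inv, -inv);
    r.pixel = i;
    r.alive = true;
    rays[i] = r;
    mask[i] = make_float3(1.0f, 1.0f, 1.0f);
    color[i] = make_float3(0.0f, 0.0f, 0.0f);
}

// One thread per ray: either add direct lighting to the pixel's color and
// retire the ray, or attenuate the mask and leave the ray for the next call.
__global__ void sampleRay(Ray* rays, int n, float3* mask, float3* color) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !rays[i].alive) return;
    Ray r = rays[i];
    int p = r.pixel;
    if (r.dir.y >= 0.0f) {
        // Toy assumption: upward rays reach a white light of intensity 1.
        color[p].x += mask[p].x;
        color[p].y += mask[p].y;
        color[p].z += mask[p].z;
        r.alive = false;
    } else {
        // Toy assumption: mirror bounce off a ground plane with albedo 0.5.
        mask[p].x *= 0.5f; mask[p].y *= 0.5f; mask[p].z *= 0.5f;
        r.dir.y = -r.dir.y;
    }
    rays[i] = r;
}

// Host driver: all buffers stay on the device until the final copy back.
thrust::host_vector<float3> renderFrame(const Camera& cam, int maxDepth) {
    int n = cam.width * cam.height;
    assert(n > 0);
    thrust::device_vector<Camera> dCam(1, cam);  // camera copied to device once
    thrust::device_vector<Ray> rays(n);
    thrust::device_vector<float3> mask(n), color(n);
    Ray* dRays = thrust::raw_pointer_cast(rays.data());
    float3* dMask = thrust::raw_pointer_cast(mask.data());
    float3* dColor = thrust::raw_pointer_cast(color.data());
    int threads = 256, blocks = (n + threads - 1) / threads;
    generateCameraRays<<<blocks, threads>>>(
        thrust::raw_pointer_cast(dCam.data()), dRays, dMask, dColor);
    for (int depth = 0; depth < maxDepth; ++depth)
        sampleRay<<<blocks, threads>>>(dRays, n, dMask, dColor);
    cudaDeviceSynchronize();
    return thrust::host_vector<float3>(color);  // single device-to-host copy
}
```

Calling renderFrame(cam, 2) on a small camera runs one generateCameraRays launch followed by two sampleRay launches. In this sketch the rays are generated directly in device memory, so the only host-device traffic is the one-element camera upload and the final color download.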