I ran very similar tests but on a different (custom-designed) scene and observed something very similar. I compared the `scalar_rgb` and `gpu_autodiff_rgb` modes for rendering the scene with different values of `sample_count`. The results are summarized in the image below:

[Image: rendering time vs. `sample_count` for `scalar_rgb` and `gpu_autodiff_rgb`]

The rendering time grows linearly with the `sample_count` value, and I expected it to be much smaller for GPU rendering. I also tried setting `samples_per_pass` to more than 50. Is this expected? What's the best way to get the maximum computational acceleration?
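For reference, here is a minimal sketch of how such a comparison can be timed from Python, assuming a Mitsuba 2 build with both variants enabled in `mitsuba.conf`, and a scene file (`cbox.xml` here is a hypothetical path) in the working directory:

```python
import time
import mitsuba

SCENE = 'cbox.xml'  # hypothetical path; any Mitsuba 2 scene should do

for variant in ['scalar_rgb', 'gpu_autodiff_rgb']:
    mitsuba.set_variant(variant)

    # Mitsuba's Python modules resolve against the active variant,
    # so the import must happen after set_variant().
    from mitsuba.core.xml import load_file

    scene = load_file(SCENE)
    sensor = scene.sensors()[0]

    t0 = time.time()
    scene.integrator().render(scene, sensor)
    print(f'{variant}: {time.time() - t0:.3f} s')
```

If the scene XML declares a `<default name="spp" .../>` parameter, the sample count can also be overridden per run via `load_file(SCENE, spp=...)`, which makes it easy to sweep `sample_count` values in a loop.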
@abhinavvs, @garethwalkom -- we're aware of this. Enoki's JIT compiler is currently undergoing a complete redesign to improve performance on the GPU, amongst other things. We're optimistic that this will lead to significant speedups.
That said, if your goal is to have a very fast GPU path tracer, let me stress that you absolutely should not use Mitsuba 2. It uses a wavefront approach, where we read and write large amounts of information to memory at each bounce. This memory traffic becomes the main bottleneck rather than, e.g., rays per second. The fastest current GPU path tracers instead use a megakernel approach, which avoids this heavy memory traffic. Obvious follow-up question: why did we not use a megakernel approach for Mitsuba? Mitsuba's GPU mode is primarily designed to enable differentiable rendering, which requires a wavefront simulation that allows us to record a complete graph of the underlying computation.
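To make the wavefront/megakernel distinction concrete, here is a toy illustration in plain NumPy (not Mitsuba or Enoki code; a random attenuation stands in for actual shading). The wavefront version streams the full per-ray state through large arrays at every bounce, and it is exactly those per-bounce intermediates that a differentiable renderer needs to record, while the megakernel version keeps each ray's state in local variables for the whole loop:

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_DEPTH = 8

def wavefront(n):
    """Array-at-a-time: each bounce reads and writes n-sized arrays of
    per-ray state, producing the memory traffic described above."""
    throughput = np.ones(n)
    radiance = np.zeros(n)
    for _ in range(MAX_DEPTH):
        emitted = rng.random(n)           # stand-in for per-bounce shading
        radiance += throughput * emitted  # full pass over per-ray arrays
        throughput *= 0.5
    return radiance

def megakernel(n):
    """One loop per ray: state stays in scalar local variables, with no
    large intermediate arrays between bounces."""
    out = np.empty(n)
    for i in range(n):
        throughput, radiance = 1.0, 0.0
        for _ in range(MAX_DEPTH):
            radiance += throughput * rng.random()
            throughput *= 0.5
        out[i] = radiance
    return out
```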
Thanks for the insightful explanation, @wjakob! This explains a lot.
Personally, I would still stick with Mitsuba 2 since I am looking to perform differentiable rendering. The speed-up question was more of a 'good-to-know' type thing.
That said, I am eagerly looking forward to the Enoki JIT compiler redesign that you mentioned. I am also looking forward to the plug-in in PR #44, which allows differentiation w.r.t. object transformations. Both of those upgrades would be very useful for my research. Could you comment on the rough expected release dates for these plug-ins/functionalities?
Thanks @wjakob! That was also something I was curious about.
I've been doing some test renders with the cbox.xml example scene using different spectral variants, to see which is the fastest; however, I expected GPU rendering to be much faster. Do these times look correct? Should I change something to improve rendering speed? I also didn't expect `gpu_spectral` to be the slowest.

All of the below are rendered at 256x256 with `sample_count` set to 256:

- `scalar_spectral` took: 12.578 s
- `packet_spectral` took: 12.993 s
- `gpu_spectral` at 64 `samples_per_pass` took: 13.119 s
- `gpu_autodiff_spectral` at 64 `samples_per_pass` took: 10.364 s

Is there maybe another or a better way I should be testing this?
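One thing that can skew single-render timings on the GPU variants is that the first render pays one-time costs such as Enoki's kernel compilation and uploading the scene to the device. Below is one possible way to time this more fairly, as a sketch: do an untimed warm-up render first, then time a second render of the same scene (this assumes `cbox.xml` is in the working directory, and rendering twice here is purely for measurement purposes):

```python
import time
import mitsuba

mitsuba.set_variant('gpu_spectral')
from mitsuba.core.xml import load_file

scene = load_file('cbox.xml')   # assumes cbox.xml is in the working directory
sensor = scene.sensors()[0]
integrator = scene.integrator()

# Warm-up render: absorbs one-time costs (kernel compilation, scene upload).
integrator.render(scene, sensor)

# Timed render: closer to the steady-state cost per frame.
t0 = time.time()
integrator.render(scene, sensor)
print(f'gpu_spectral (warm): {time.time() - t0:.3f} s')
```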