GPU Rendering not that much faster

mitsuba-renderer / mitsuba2

Mitsuba 2: A Retargetable Forward and Inverse Renderer

Other

2.05k stars 267 forks source link

GPU Rendering not that much faster #72

Closed garethwalkom closed 4 years ago

garethwalkom commented 4 years ago

I've been doing some test renders with the cbox.xml example scene with different spectral variables, to see which is the fastest; however, I expected GPU rendering to be much faster. Do these times look correct? Should I change something to improve rendering speed? I also didn't expect gpu_spectral the slowest.

All of the below are rendered at 256x256 at 256 samples_count scalar_spectraltook: 12.578 s packet_spectraltook: 12.993 s gpu_spectral at 64 samples_per_passtook: 13.119 s gpu_autodiff_spectralat 64 samples_per_passtook: 10.364 s

Is there maybe another or a better way I should be testing this?

abhinavvs commented 4 years ago

I ran very similar tests but on a different (custom-designed) scene and observed something very similar. I compared the scalar_rgb and the gpu_autodiff_rgb modes for rendering the scene with different values of sample_count. The results are summarized in the image below:

X-axis shows the sample_count value
Second column in the table refers to Rendering Time/sample_count which I expected to be much smaller for GPU rendering.
Also, due to the size of my scene (512x384), I wasn't able to increase samples_per_pass more than 50.

Is this expected? What's the best way to get the maximum computational acceleration?

wjakob commented 4 years ago

@abhinavvs, @garethwalkom -- we're aware of this. Enoki's JIT compiler is currently undergoing a complete redesign to improve performance on the GPU, amongst other things. We're optimistic that this will lead to significant speedups.

That said, if your goal is to have a very fast GPU path tracer, let me stress that you absolutely should not use Mitsuba 2. It uses a wavefront approach, where we read and write out large amounts of information into memory at each bounce. This memory traffic becomes the main bottleneck rather than, e.g., rays per second. The fastest current GPU path tracers instead use a megakernel approach, which avoids this heavy memory traffic. Obvious follow-up questoin: why did we not use a megakernel approach for Mitsuba? Mitsuba's GPU mode is primarily designed to enable differentiable rendering, which requires a wavefront simulation that allows us to record a complete graph of the underlying computation.

abhinavvs commented 4 years ago

Thanks for the insightful explanation, @wjakob! This explains a lot.

Personally, I would still stick with Mitsuba 2 since I am looking to perform differentiable rendering. The speed-up question was more of a 'good-to-know' type thing.

That said, I am eagerly looking forward to the Enoki JIT compiler redesign changes that you had mentioned. I am also looking forward to the plug-in in PR #44 which allows differentiation w.r.t object transformations. Both those upgrades would be very useful for my research. Could you comment on the rough expected release dates for these plug-ins/functionalities?

garethwalkom commented 4 years ago

Thanks @wjakob! It was also something I was curious to know.