mmp / pbrt-v4

Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
https://pbrt.org
Apache License 2.0

Performance Issue on Windows with multiple GPUs #443

Open ndming opened 1 month ago

ndming commented 1 month ago

I built pbrt with the following setup: MSVC 143 (VS BuildTools 17.9), CUDA 12.1, OptiX 7.7.0.

I have two RTX 2080 SUPERs. Rendering a simple scene on the CPU took 9.8 s, while rendering the same scene on the GPU with --gpu-device 0 or --gpu-device 1 took 35.5 s, which is significantly slower.

I noticed that at idle, nvidia-smi reports around 2% utilization on both GPUs, but once I start rendering on either GPU, nvidia-smi reports nearly 90% utilization on the chosen one.

Please let me know if I should change build settings to run pbrt faster on GPU. Should I upgrade to recent versions of CUDA or OptiX?

Here are some stats from the GPU rendering:

Wavefront Kernel Profile:
  Generate camera rays                                128 launches    974.10 ms /   2.7% (avg  7.610, min  7.135, max   9.175)
  Generate ray samples - HaltonSampler                768 launches    599.40 ms /   1.6% (avg  0.780, min  0.111, max   2.818)
  Trace closest hit rays                              768 launches  30105.62 ms /  82.7% (avg 39.200, min  0.535, max 184.803)
  Handle escaped rays                                 768 launches    663.86 ms /   1.8% (avg  0.864, min  0.075, max   4.469)
  Handle emitters hit by indirect rays                768 launches     63.63 ms /   0.2% (avg  0.083, min  0.058, max   0.145)
  DielectricMaterial + BxDF eval (Basic tex)          640 launches    943.32 ms /   2.6% (avg  1.474, min  0.104, max   4.884)
  DiffuseMaterial + BxDF eval (Basic tex)             640 launches     55.94 ms /   0.2% (avg  0.087, min  0.058, max   0.375)
  DiffuseMaterial + BxDF eval (Universal tex)         640 launches   1692.83 ms /   4.6% (avg  2.645, min  0.163, max  12.804)
  Trace shadow rays                                   640 launches    420.43 ms /   1.2% (avg  0.657, min  0.107, max   2.505)
  Update film                                         128 launches    856.00 ms /   2.4% (avg  6.688, min  5.003, max  10.738)
  Other                                              2304 launches     44.81 ms /   0.1% (avg  0.019)

Total rendering time:  36419.94 ms

Wavefront integrator statistics:
    Camera rays                                                  20480000
    Indirect rays, depth 1                                       16417569
    Indirect rays, depth 2                                        4231887
    Indirect rays, depth 3                                        1880224
    Indirect rays, depth 4                                         194875
    Indirect rays, depth 5                                          87770
    Shadow rays, depth 0                                          6416378
    Shadow rays, depth 1                                            87296
    Shadow rays, depth 2                                          1153977
    Shadow rays, depth 3                                            33566
    Shadow rays, depth 4                                            32123

Statistics:
  Geometry
    Spheres                                                             1
    Buffer cache hits                                    0 /            3 (0.00%)
    Bilinear patches per mesh                            1 /            1 (1.00x)
  Memory
    Acceleration structures                                          3.12 kB
    Bilinear patches                                                 0.06 kB
    Film pixels                                                      8.54 MiB
    Light BVH                                                        0.06 kB
    Wavefront integrator pixel state                               278.49 MiB
    Unreported / unused                                            306.13 MiB
  Scene
    Lights                                                              2
    Materials                                                           3
    Textures                                                            3
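
As a rough cross-check of the numbers above, the profile can be parsed to see which kernel dominates and what ray throughput it implies. This is only a sketch: the regular expression, the helper names (`parse_profile`, `slowest_kernel`), and the assumption that the "Trace closest hit rays" kernel traces exactly the camera rays plus the indirect rays are mine, not something pbrt provides or confirms.

```python
import re

# Kernel profile rows copied from the output above (name, launches, total ms).
PROFILE = """\
Generate camera rays                                128 launches    974.10 ms /   2.7%
Generate ray samples - HaltonSampler                768 launches    599.40 ms /   1.6%
Trace closest hit rays                              768 launches  30105.62 ms /  82.7%
Handle escaped rays                                 768 launches    663.86 ms /   1.8%
Handle emitters hit by indirect rays                768 launches     63.63 ms /   0.2%
DielectricMaterial + BxDF eval (Basic tex)          640 launches    943.32 ms /   2.6%
DiffuseMaterial + BxDF eval (Basic tex)             640 launches     55.94 ms /   0.2%
DiffuseMaterial + BxDF eval (Universal tex)         640 launches   1692.83 ms /   4.6%
Trace shadow rays                                   640 launches    420.43 ms /   1.2%
Update film                                         128 launches    856.00 ms /   2.4%
Other                                              2304 launches     44.81 ms /   0.1%
"""

# Ray counts copied from the "Wavefront integrator statistics" section.
CAMERA_RAYS = 20480000
INDIRECT_RAYS = [16417569, 4231887, 1880224, 194875, 87770]

# Hypothetical pattern for one profile row: kernel name, launch count, total ms.
ROW = re.compile(r"^\s*(.+?)\s+(\d+) launches\s+([\d.]+) ms")


def parse_profile(text):
    """Return a list of (kernel_name, launches, total_ms) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return rows


def slowest_kernel(rows):
    """Return the row with the largest total time."""
    return max(rows, key=lambda r: r[2])


rows = parse_profile(PROFILE)
total_ms = sum(ms for _, _, ms in rows)
name, launches, ms = slowest_kernel(rows)
share = ms / total_ms

# Assumption: closest-hit rays = camera rays + all indirect rays.
closest_hit_rays = CAMERA_RAYS + sum(INDIRECT_RAYS)
rays_per_second = closest_hit_rays / (ms / 1000.0)

print(f"{name}: {ms:.2f} ms of {total_ms:.2f} ms ({share:.1%})")
print(f"approx. closest-hit throughput: {rays_per_second / 1e6:.2f} Mrays/s")
```

Under those assumptions, nearly all of the render time is spent in the closest-hit trace rather than in shading or film updates, which suggests the slowdown is in ray tracing itself and not in the material kernels.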

ndming commented 1 month ago

Rebuilding pbrt with CUDA 12.4 and OptiX 8.0.0 solved the problem for me. It's worth noting that with CUDA 12.1 and OptiX 7.7, rendering on a GTX 1050 doesn't show this slowdown.

ndming commented 1 month ago

The above fixed the issue; however, the rendered scene is now completely black. Any idea how I can fix this?

NicNel commented 1 month ago

@ndming, try using build tools v14.39. See the original report in #428 and how to fix it in #429.

ndming commented 1 month ago

It seems that on Windows systems with more than one GPU, there is no way to fix the performance slowdown. After downgrading MSVC, I was able to successfully build pbrt with both CUDA versions, but GPU rendering time remains longer, since my system has two RTX 2080s.