mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/

Black half-spheres when using `cuda_ad_rgb` renderer #805

Open qazwsxal opened 1 year ago

qazwsxal commented 1 year ago

Summary

A high number of spheres (~800) results in black half-shading when using the cuda_ad_rgb renderer.

System configuration

System information:

OS: Ubuntu 22.04.2 LTS
CPU: 12th Gen Intel(R) Core(TM) i7-12700H
GPU: NVIDIA GeForce RTX 3050 Ti Laptop GPU
Python: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0]
NVidia driver: 525.125.06
LLVM: -1.-1.-1

Dr.Jit: 0.4.2
Mitsuba: 3.3.0
Is custom build? False
Compiled with: GNU 10.2.1
Variants: scalar_rgb scalar_spectral cuda_ad_rgb llvm_ad_rgb

Description

The cuda_ad_rgb renderer produces incorrect results when shading large numbers of diffuse spheres with the "constant" light source.

Steps to reproduce

  1. mitsuba -m scalar_rgb out.xml produces EXR file with correct shading
  2. mitsuba -m cuda_ad_rgb out.xml produces EXR file with incorrect shading

Zip containing the XML in question: out.zip
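The two steps above can also be scripted from Python to put a number on the artifact. This is only a minimal sketch and not part of the original report: it assumes the `mitsuba` and `numpy` packages are installed and that every listed variant is available on the machine. The helper names `black_fraction`, `render_pixels` and `compare_variants`, the threshold and the default `spp` are all hypothetical choices made for illustration.

```python
def black_fraction(pixels, threshold=1e-6):
    """Return the fraction of RGB pixels whose channels are all <= threshold."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    black = sum(1 for p in pixels if all(c <= threshold for c in p))
    return black / len(pixels)


def render_pixels(scene_path, variant, spp=16):
    """Render scene_path with the given variant and return a list of RGB tuples."""
    # Imported lazily so black_fraction() stays usable without Mitsuba installed.
    import mitsuba as mi
    import numpy as np

    mi.set_variant(variant)
    scene = mi.load_file(scene_path)
    image = np.array(mi.render(scene, spp=spp))  # shape: (height, width, channels)
    flat = image.reshape(-1, image.shape[-1])
    # Keep only the first three channels in case the film also stores alpha.
    return [tuple(px) for px in flat[:, :3].tolist()]


def compare_variants(scene_path, variants=("scalar_rgb", "llvm_ad_rgb", "cuda_ad_rgb")):
    """Map each variant name to the fraction of black pixels in its render."""
    return {v: black_fraction(render_pixels(scene_path, v)) for v in variants}
```

Calling `compare_variants("out.xml")` would return one fraction per variant. With a "constant" emitter lighting the whole scene, a noticeably larger fraction for cuda_ad_rgb than for the other variants would be consistent with the black half-spheres described here.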

rtabbara commented 1 year ago

Hi Adam,

I ran these commands both with the Mitsuba 3.3.0 pre-built binary and with the latest build from source. In both cases, however, I wasn't able to reproduce the issue you were getting with the cuda variant. My Nvidia driver version is also 525.125.06.

One data point that might be useful is whether you still get these artifacts when running the llvm variant.

Another thing I've noticed is that the screenshot you provided seems to have a higher spp count than when I simply run mitsuba -m variant out.xml. Just to make sure, were any additional options supplied to generate the image, or does the scene perhaps differ slightly from what was provided?

qazwsxal commented 1 year ago

Hi Rami,

Thanks for getting back to me, and apologies for attaching a screenshot from another run with more samples. The problem appears no matter the number of samples specified. I've run mitsuba again with the three rgb variants, and the issue is only present with the cuda variant.

I've attached a zip with the generated EXR images (mitsuba_rendering_errors.zip), including a version rendered with optimisations disabled (-O0). Unfortunately, the issue still occurs with optimisations disabled.

I was also able to run the same scene on an Nvidia A100 with the same driver version, in which the rendering error did not occur:

System information:

OS: Ubuntu 22.04.2 LTS
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
GPU: NVIDIA A100-SXM4-40GB
Python: 3.9.17 (main, Jul 5 2023, 20:41:20) [GCC 11.2.0]
NVidia driver: 525.125.06
LLVM: -1.-1.-1

Dr.Jit: 0.4.2
Mitsuba: 3.3.0
Is custom build? False
Compiled with: GNU 10.2.1
Variants: scalar_rgb scalar_spectral cuda_ad_rgb llvm_ad_rgb

rtabbara commented 1 year ago

Hi Adam,

Another recommendation is, if possible, to change the driver version used on the RTX 3050 to see whether that makes a difference.

There is a possibility of an underlying misuse of the OptiX API somewhere that only manifests itself on some GPUs. However, without the ability to reproduce the issue, it's a bit difficult to track down.

stymbhrdwj commented 10 months ago

I am facing a similar issue when running the tutorial volume_optimization.ipynb on the cuda_ad_rgb variant. I am, however, able to reproduce the correct results with the llvm_ad_rgb variant originally used in the notebook.

When using the NVIDIA GTX TITAN Xp GPU, I observe a half-black voxel artifact as shown below. These are from the "Intermediate results" section in the notebook.

[Image: cuda_ad_rgb_intermediate_TITAN_Xp]

When using the NVIDIA RTX 3050Ti (Mobile) GPU, it is as expected without any artifacts.

[Image: cuda_ad_rgb_intermediate_RTX_3050Ti_Mobile]

System Configuration

Note: Both systems use mitsuba-3.5.0 and drjit-0.4.4 provided in PyPI.

TITAN Xp System

OS: Ubuntu 20.04.6 LTS x86_64
Kernel: 5.15.0-83-generic
CPU: Intel i7-8086K (12) @ 5.000GHz
GPU: NVIDIA TITAN Xp
GPU: NVIDIA GeForce GTX TITAN X
GPU Driver: NVIDIA 535.104.05
CUDA Version: 11.7

RTX 3050Ti Mobile System

OS: Arch Linux x86_64
Host: Victus by HP Laptop 16-e0xxx
Kernel: Linux 6.6.6-arch1-1
CPU: AMD Ryzen 7 5800H with Radeon Graphics (16) @ 4.4GHz
GPU: NVIDIA GeForce RTX 3050 Ti Mobile
GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
GPU Driver: NVIDIA 545.29.06
CUDA Version: 12.1

UPDATE: Apparently something is wrong with the TITAN Xp GPU. When I wrote a custom dr.wrap_ad function for volume optimization, I consistently got NaNs as the first few elements of the grad tensor. This problem disappears when using any RTX GPU.
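One way to make the NaN observation above easier to compare across GPUs is to report how many gradient entries are NaN and where they sit. The snippet below is a hedged sketch rather than anything from the notebook: it assumes the gradient has already been copied to the host as a flat sequence of floats (for example by converting the gradient to a NumPy array and flattening it, which is not shown), and `nan_report` is a hypothetical helper name.

```python
import math


def nan_report(values, max_listed=10):
    """Summarise NaN entries in a flat sequence of floats."""
    values = list(values)
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    return {
        "total": len(values),                   # number of entries inspected
        "count": len(nan_idx),                  # how many of them are NaN
        "first_indices": nan_idx[:max_listed],  # positions of the earliest NaNs
    }
```

If the NaNs really are confined to the first few elements, `first_indices` should come back as a short run starting at 0, and `count` should be 0 on a GPU that is unaffected.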