mmp / pbrt-v4

Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
https://pbrt.org
Apache License 2.0

Compile errors on MSVC with CUDA 12.1/Optix 7.7 #331

Closed aclysma closed 1 year ago

aclysma commented 1 year ago

I had to rename the call optixModuleCreateFromPTX(...) to optixModuleCreate(...) and remove setting OptixPipelineLinkOptions::debugLevel. I confirmed in the Optix 7.7 release notes that the function was renamed and the member was removed.
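
A minimal sketch of how a build could select between the two API shapes described above. It assumes only that `optix.h` defines `OPTIX_VERSION` as `major * 10000 + minor * 100 + micro` (so 7.7.0 is 70700). The helper names are hypothetical and are not part of pbrt or OptiX. In real code the same comparison would normally sit in an `#if OPTIX_VERSION >= 70700` guard around the affected calls.

```cpp
#include <cassert>
#include <string>

// Stand-in so the sketch compiles without the OptiX SDK; optix.h normally
// provides this macro (70700 corresponds to OptiX 7.7.0).
#ifndef OPTIX_VERSION
#define OPTIX_VERSION 70700
#endif

// Hypothetical helper: which module-creation entry point exists for a given
// OptiX version, per the rename reported in this issue.
inline std::string ModuleCreateFunctionName(int optixVersion) {
    assert(optixVersion >= 70000);  // this sketch only covers OptiX 7.x and later
    return optixVersion >= 70700 ? "optixModuleCreate"
                                 : "optixModuleCreateFromPTX";
}

// Hypothetical helper: whether OptixPipelineLinkOptions still has the
// debugLevel member (reported above as removed in 7.7).
inline bool LinkOptionsHaveDebugLevel(int optixVersion) {
    assert(optixVersion >= 70000);
    return optixVersion < 70700;
}
```

Keeping the version test in one place makes it easier to build against both older and newer SDKs without editing call sites by hand.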

Unfortunately I am having the same symptoms as mentioned by @Reev970 in #329. In particular, bmw-m6 just draws a background when using --gpu.

[image: bmw-m6 GPU render showing only the background]

I get an OK render in CPU mode (albeit with --quick).

mmp commented 1 year ago

Optix 7.7 builds are fixed now (and are now included in the CI tests as well). Thanks for reporting that.

For those GPU rendering failures, could you try: 1. running with --nthreads 1, and 2. using a debug build of pbrt? (You may want to add `--spp 32` or something similar, since the debug build is a lot slower.)

If it works with one (CPU) thread, then that would be good to know as it suggests something going wrong in the multi-threaded scene construction code. The debug build does more checking and logging and may fail in a more obvious way.

aclysma commented 1 year ago

Unfortunately I'm still getting the same result with debug builds and --nthreads 1. (Specifically I used --gpu --log-level "verbose" --log-file log.txt --nthreads 1 --quick ../../pbrt-v4-scenes/bmw-m6/bmw-m6.pbrt)

For what it's worth, I've tried stepping through this in nsight compute. It reports a return of cudaErrorUnknown(999) from the second kernel it launches.

First kernel: (1, 1, 1), (768, 1, 1) returns cudaSuccess(0)
Second kernel: (171, 1, 1), (512, 1, 1) returns cudaErrorUnknown(999)
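
For reading the two launches above: assuming the first tuple is the grid size and the second is the block size (the usual way a launch is reported), the total thread count is the product of all six numbers. The sketch below uses a hypothetical `Dim3` stand-in for CUDA's `dim3` so it builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so no CUDA headers are needed.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Total threads in a launch = blocks in the grid * threads per block.
inline std::uint64_t TotalThreads(Dim3 grid, Dim3 block) {
    std::uint64_t threadsPerBlock = std::uint64_t(block.x) * block.y * block.z;
    assert(threadsPerBlock <= 1024);  // CUDA's per-block thread limit
    std::uint64_t blocks = std::uint64_t(grid.x) * grid.y * grid.z;
    return blocks * threadsPerBlock;
}
```

Under that assumption the first launch is 768 threads in a single block, and the failing one is 171 * 512 = 87,552 threads.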

This is with a 4080 on windows 10, with a relatively fresh install of windows. There's not much installed on it. I was getting this behavior both before and after running DDU and re-installing cuda/nvidia drivers. When I reinstalled I tried cuda 11.8/optix 7.5 just to see if that would help. I used the nvidia driver that was included in that installer rather than latest nvidia drivers. (I was on latest nvidia drivers before when I tried 12.1/7.7.)

Unfortunately I don't see a lot of information in nsight about the failed kernel.

aclysma commented 1 year ago

(Also notably, I can step through the disney-cloud scene with nsight compute just fine.)

aclysma commented 1 year ago

Just trying to narrow things down: the kernel that failed in nsight was the one in WavefrontPathIntegrator::GenerateCameraRays.

If I comment out rayQueue->PushCameraRay(cameraRay->ray, lambda, pixelIndex); then nsight shows a success return code for that kernel. I went into RayQueue::PushCameraRay and commented out all the this->...[index] = ... lines and it still gets a success return code. But if I uncomment this->depth[index] = 0; for example, I see the failure code for the kernel.
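
For context on what a write such as `this->depth[index] = 0;` does, here is a simplified CPU-only sketch of a structure-of-arrays queue with atomic slot allocation. `SimpleRayQueue` and its fields are hypothetical stand-ins for illustration. This is not pbrt's actual RayQueue, which stores more fields and lives in GPU-accessible memory.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a wavefront ray queue.
// Each field is kept in its own array (structure-of-arrays layout).
struct SimpleRayQueue {
    explicit SimpleRayQueue(int n) : depth(n), pixelIndex(n), capacity(n) {}

    // Reserve the next free slot; safe to call from many threads at once.
    int AllocateEntry() { return size.fetch_add(1, std::memory_order_relaxed); }

    // Store one camera ray's per-field data at a freshly allocated slot.
    int PushCameraRay(int pixel) {
        int index = AllocateEntry();
        assert(index < capacity);  // writing past the arrays would be invalid
        depth[index] = 0;          // camera rays start at path depth 0
        pixelIndex[index] = pixel;
        return index;
    }

    int Size() const { return size.load(std::memory_order_relaxed); }

    std::vector<int> depth;
    std::vector<int> pixelIndex;
    int capacity;
    std::atomic<int> size{0};
};
```

In a layout like this, every per-field store depends on both the allocated index and the backing array being valid, so a failure that appears only when a store is enabled does not by itself show which of the two is at fault.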

I've tried using cuda debugging with this and stepping through it. It seems to succeed but generates the above image.

It's entirely possible that nsight failing when running this kernel is a red herring, but I don't have much else to go off of.

pbrt4bounty commented 1 year ago

It looks like the problem is from Cuda 12.1, no matter what version of Optix you're using. In my test here, using Cuda 12.1 with Optix 7.4, 7.6 or 7.7 gives the same empty render result. Edited: 'hair' is rendered OK

aclysma commented 1 year ago

Last night I uninstalled cuda 12.1, installed cuda 11.8, and re-ran cmake. nvcc --version reported 11.8. However, cmake still found cuda 12.1 components when generating the MSVC project files.

-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/bin/nvcc.exe
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8 (found version "11.8")
-- Found CUDA: 11.8.89
-- The CUDA compiler identification is NVIDIA 12.1.66
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done

I went to add/remove programs and found the remaining cuda 12.1 stuff and manually uninstalled those. I then re-ran the cuda 11.8 installer.

-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin/nvcc.exe
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8 (found version "11.8")
-- Found CUDA: 11.8.89
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done

Unfortunately this doesn't seem to have helped.

pbrt4bounty commented 1 year ago

Seems that the last 'good' cuda version is 11.7

aclysma commented 1 year ago

I tried downgrading to 11.7. The build passes --gpu-architecture=sm_89 to nvcc, which the 11.7 nvcc rejects:

14>------ Build started: Project: optix.cu, Configuration: Debug x64 ------
14>nvcc fatal   : Value 'sm_89' is not defined for option 'gpu-architecture'

If 11.7 or earlier is required at the moment, then I think this excludes 40-series cards unless there is a workaround for this.
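
The nvcc error above is consistent with sm_89 (Ada, RTX 40 series) support first appearing in CUDA 11.8. A small sketch of that compatibility check follows. The table lists only the architectures relevant to this thread, and the struct and function names are hypothetical.

```cpp
// First CUDA toolkit version whose nvcc accepts a given sm_XX target.
struct ArchSupport {
    int sm;     // compute capability without the dot, e.g. 89 for sm_89
    int major;  // first supporting toolkit, major version
    int minor;  // first supporting toolkit, minor version
};

constexpr ArchSupport kArchTable[] = {
    {80, 11, 0},  // Ampere (A100)
    {86, 11, 1},  // Ampere (RTX 30 series)
    {89, 11, 8},  // Ada (RTX 40 series)
};

// Returns true if the given toolkit's nvcc accepts -arch=sm_<sm>.
// Architectures missing from the table are conservatively reported as
// unsupported.
inline bool NvccAcceptsArch(int sm, int cudaMajor, int cudaMinor) {
    for (const ArchSupport &a : kArchTable) {
        if (a.sm == sm)
            return cudaMajor > a.major ||
                   (cudaMajor == a.major && cudaMinor >= a.minor);
    }
    return false;
}
```

Binaries built for sm_80 can generally still run on an sm_89 device, because device code is compatible across minor revisions within the same major compute capability, so targeting an older 8.x architecture is one possible workaround with an older toolkit.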

aclysma commented 1 year ago

Figured out how to override the gpu-architecture: change the CMakeLists.txt line that sets the shader model so that it specifies sm_80, i.e. set (PBRT_GPU_SHADER_MODEL "sm_80" CACHE STRING "")

This compiles and runs but does not fix the problem, so I'm not convinced that using the 11.7 cuda toolkit is the fix.

pbrt4bounty commented 1 year ago

A couple of notes:

mmp commented 1 year ago

This is all super useful debugging. If it's one of the first kernel launches then it's very likely one of the OptiX BVH building kernels. That also would fit with the geometry all disappearing but rendering otherwise proceeding ok. (The fact that e.g. the env map is shown suggests that the camera rays are ok, etc.)

I have a few follow-up questions and will continue to try to get an environment to reproduce that locally here. Let's centralize discussion in issue #329; I'll close this one out since the compile errors originally reported are now fixed.