mmp / pbrt-v4

Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
https://pbrt.org
Apache License 2.0

Windows GPU Crashes #96

Closed shadeops closed 3 years ago

shadeops commented 3 years ago

Background

I have temporary access to a Windows box with an RTX 3080, so I wanted to give pbrt-v4 GPU rendering a try. Unfortunately, I have not been able to get a successful render. I'm lacking in CUDA and Windows development experience, so it's possible there is an issue on my end. (CPU rendering has been fine.)

I'm mainly looking for a baseline configuration known to work so I can focus my debugging efforts appropriately. (Is it a platform issue, is it a pbrt-v4 bug, is it an issue with the build, etc.)

Platform

Windows 10
GeForce RTX 3080
Nvidia Drivers 457.51 (DCH)
Nvidia CUDA 11.2
Nvidia Optix 7.1 (I also tried 7.2)
Visual Studio 2019
pbrt-v4 head as of Jan 3rd; https://github.com/mmp/pbrt-v4/tree/c91fa3b2b66e25b6781a78b46a4526b2fcf27edb

Building

The build process seems to go ok, and CMake detects CUDA.

Found CUDA: 11.2
checkcuda.cu

   Creating library C:\Users\shade\Tools\pbrt-v4-build\checkcuda.lib and object C:\Users\shade\Tools\pbrt-v4-build\checkcuda.exp

CUDA Architecture: sm_86
Configuring done

Test scenes

https://github.com/mmp/pbrt-v4-scenes/blob/9bc3810f66acc7875c2c36c817d3b091fc71141d/pbrt-book/book.pbrt
https://github.com/mmp/pbrt-v4-scenes/blob/9bc3810f66acc7875c2c36c817d3b091fc71141d/killeroos/killeroo-simple.pbrt

With a Debug Build and CUDA_LAUNCH_BLOCKING set to 1 I get the following output:

Output

(This is from killeroo-simple.pbrt)

[ 26716.000 20210103.152643 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 26716.000 20210103.152643 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 26716.000 20210103.152643 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for C:\Users\shade\Tools\pbrt-v4-scenes\killeroos\killeroo-simple.pbrt
[ 26716.000 20210103.152643 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for C:\Users\shade\Tools\pbrt-v4-scenes\killeroos\geometry\killeroo.pbrt
[ 26716.000 20210103.152643 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for C:\Users\shade\Tools\pbrt-v4-scenes\killeroos\geometry\killeroo.pbrt
[ 26716.000 20210103.152643 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 700, 700 ], pixelBounds [ [ 0, 0 ] - [ 700, 700 ] ]
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/cameras.cpp:230 ] VERBOSE Camera min pos differentials: [ -0, 0, 0 ], [ -0, 0, 0 ]
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/cameras.cpp:232 ] VERBOSE Camera min dir differentials: [ 0.00011622906, 0.00084993243, -3.5762787e-7 ], [ 0.00083166367, -0.0002102554, -3.5762787e-7 ]
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: KNOBS: All knobs on default.

[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: DISK CACHE: Opened database: "C:\Users\shade\AppData\Local\NVIDIA\OptixCache\cache7.db"
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: DISK CACHE:     Cache data size: "44.5 MiB"
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:601 ] VERBOSE Optix version 7.1.0 successfully initialized
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: DISKCACHE: Cache hit for key: ptx-4163858-key4b2c5f5ee5bdad769bb2b0195ca5fc56-sm_86-rtc1-drv457.51
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: DISKCACHE: Cache hit for key: ptx-38252-keyfda133fdff591bad2913888bb309bd85-sm_86-rtc1-drv457.51
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: DISKCACHE: Cache hit for key: ptx-2056-key1f75491c65bbcf2f269d0ec2ece0beac-sm_86-rtc1-drv457.51
[ 26716.000 20210103.152644 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:558 ] VERBOSE OptiX: COMPILE FEEDBACK: Info: Pipeline has 1 module(s), 23 entry function(s), 4 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 25405 basic block(s) in entry functions, 135262 instruction(s) in entry functions, 7 non-entry function(s), 42 basic block(s) in non-entry functions, 647 instruction(s) in non-entry functions

[ 26716.000 20210103.152644 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parsedscene.cpp:713 ] VERBOSE Loading 0,0 textures in parallel, 0,0 serially
[ 26716.000 20210103.152644 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parsedscene.cpp:754 ] VERBOSE Loading serial textures
[ 26716.000 20210103.152644 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\parsedscene.cpp:789 ] VERBOSE Done creating textures
[ 26716.000 20210103.152645 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:203 ] VERBOSE Will render in 1 passes 700 scanlines per pass

Rendering: [                                                                                   ]
[ 26716.000 20210103.152645 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:330 ] VERBOSE Starting to submit work for sample 0
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset ray queue]: block size 768
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate Camera rays]: block size 512
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Update camera ray stats]: block size 768
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset queues before tracing rays]: block size 768
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate ray samples - HaltonSampler]: block size 1024
[ 26716.000 20210103.152645 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:1144 ] VERBOSE Launching intersect closest
[ 26716.000 20210103.152645 C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/gpu/accel.cpp:1160 ] VERBOSE Post-sync triangle intersect closest
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle emitters hit by indirect rays]: block size 512
[ 26716.000 20210103.152645 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [CoatedDiffuseMaterial + BxDF Eval (Basic tex)]: block size 512

<snipping many IsNaN asserts>

C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/bxdfs.cpp:190: block: [19,0,0], thread: [267,0,0] Assertion `!IsNaN(pdf)` failed.
C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/bxdfs.cpp:190: block: [19,0,0], thread: [230,0,0] Assertion `!IsNaN(pdf)` failed.
C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/bxdfs.cpp:190: block: [19,0,0], thread: [236,0,0] Assertion `!IsNaN(pdf)` failed.
C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/bxdfs.cpp:190: block: [6,0,0], thread: [1,0,0] Assertion `!IsNaN(pdf)` failed.
C:/Users/shade/Documents/git/pbrt-v4/src/pbrt/bxdfs.cpp:190: block: [6,0,0], thread: [3,0,0] Assertion `!IsNaN(pdf)` failed.
[ 27752.000 20210103.150527 C:\Users\shade\Documents\git\pbrt-v4\src\pbrt/gpu/launch.h:43 ] FATAL CUDA error: device-side assert triggered
Rendering: [                                                                                   ]  (0.1s|?s)
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\check.cpp)     0x00007FF69D86A1A0 - pbrt::PrintStackTrace + line 120
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\check.cpp)  0x00007FF69D86A560 - pbrt::CheckCallbackScope::Fail + line 148
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\log.cpp)    0x00007FF69D419EA0 - pbrt::LogFatal + line 177
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\log.h)      0x00007FF69D3F5990 - pbrt::LogFatal<char const *> + lineRendering: [                                                                                   ] 1 (0.1s|?s)  2
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\launch.h)    0x00007FF69D962220 - ??@fb2113fe3720ffc57695a9b54895ea73@ + line 44
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\launch.h)    0x00007FF69D95EC90 - ??@9d705b4ae36b23c69df338f9524b43c2@ + line 80
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\workqueue.h) 0x00007FF69D9551A0 - pbrt::ForAllQueued<__nv_dl_wrapper_t<__nv_dl_tag<void (__cdecl pbrt::GPUPathIntegrator::*)(pbrt::BasicTextureEvaluator,pbrt::MultiWorkQueue<pbrt::TypePack<pbrt::MaterialEvalWorkItem<pbrt::CoatedDiffuseMaterial>,pbrt::MaterialEvalWorkItem<pbrt::CoatedConductorMaterial>,pbrt::MaterialEvalWorkItem<pbrt::ConductorMaterial>,pbrt::MaterialEvalWorkItem<pbrt::DielectricMaterial>,pbrt::MaterialEvalWorkItem<pbrt::DiffuseMaterial>,pbrt::MaterialEvalWorkItem<pbrt::DiffuseTransmissionMaterial>,pbrt::MaterialEvalWorkItem<pbrt::HairMaterial>,pbrt::MaterialEvalWorkItem<pbrt::MeasuredMaterial>,pbrt::MaterialEvalWorkItem<pbrt::SubsurfaceMaterial>,pbrt::MaterialEvalWorkItem<pbrt::ThinDielectricMaterial>,pbrt::MaterialEvalWorkItem<pbrt::MixMaterial> > > *,int),&pbrt::GPUPathIntegrator::EvaluateMaterialAndBSDF<pbrt::DiffuseMaterial,pbrt::BasicTextureEvaluator>,1>,pbrt::GPUPathIntegrator,pbrt::BasicTextureEvaluator,int,pbrt::RayQueue *>,pbrt::MaterialEvalWorkItem<pbrt::DiffuseMaterial> > + line 99
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\surfscatter.cpp)     0x00007FF69D952B70 - pbrt::GPUPathIntegrator::EvaluateMaterialAndBSDF<pbrt::DiffuseMaterial,pbrt::BasicTextureEvaluator> + line 79
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\surfscatter.cpp)     0x00007FF69D952A80 - pbrt::GPUPathIntegrator::EvaluateMaterialAndBSDF<pbrt::DiffuseMaterial> + line 65
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\surfscatter.cpp)     0x00007FF69D9510F0 - pbrt::EvaluateMaterialCallback::operator()<pbrt::DiffuseMaterial> + line 52
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\containers.h)       0x00007FF69D9564D0 - pbrt::ForEachType<pbrt::EvaluateMaterialCallback,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial> + line 157
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\containers.h)       0x00007FF69D956460 - pbrt::ForEachType<pbrt::EvaluateMaterialCallback,pbrt::DielectricMaterial,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial> + line 158
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\containers.h)       0x00007FF69D9563F0 - pbrt::ForEachType<pbrt::EvaluateMaterialCallback,pbrt::ConductorMaterial,pbrt::DielectricMaterial,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial> + line 158
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\containers.h)       0x00007FF69D956310 - pbrt::ForEachType<pbrt::EvaluateMaterialCallback,pbrt::CoatedConductorMaterial,pbrt::ConductorMaterial,pbrt::DielectricMaterial,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial> + line 158
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\containers.h)       0x00007FF69D956380 - pbrt::ForEachType<pbrt::EvaluateMaterialCallback,pbrt::CoatedDiffuseMaterial,pbrt::CoatedConductorMaterial,pbrt::ConductorMaterial,pbrt::DielectricMaterial,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial> + line 158
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\util\taggedptr.h)        0x00007FF69D9562A0 - pbrt::TaggedPointer<pbrt::CoatedDiffuseMaterial,pbrt::CoatedConductorMaterial,pbrt::ConductorMaterial,pbrt::DielectricMaterial,pbrt::DiffuseMaterial,pbrt::DiffuseTransmissionMaterial,pbrt::HairMaterial,pbrt::MeasuredMaterial,pbrt::SubsurfaceMaterial,pbrt::ThinDielectricMaterial,pbrt::MixMaterial>::ForEachType<pbrt::EvaluateMaterialCallback> + line 254
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\surfscatter.cpp)     0x00007FF69D945D90 - pbrt::GPUPathIntegrator::EvaluateMaterialsAndBSDFs + line 58
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)  0x00007FF69D4CF550 - pbrt::GPUPathIntegrator::Render + line 398
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)  0x00007FF69D4CEF60 - pbrt::GPURender + line 592
(C:\Users\shade\Documents\git\pbrt-v4\src\pbrt\cmd\pbrt.cpp)    0x00007FF69D3D7900 - main + line 241
(d:\agent\_work\63\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)      0x00007FF69DD66AA0 - invoke_main + line 79
Rendering: [                                                                                   ] ools\ (0.2s|?s)  crt\vcstartup\src\startup\exe_common.inl)     0x00007FF69DD66850 - __scrt_common_main_seh + line 288
(d:\agent\_work\63\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)      0x00007FF69DD66830 - __scrt_common_main + line 331
(d:\agent\_work\63\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp)        0x00007FF69DD66B60 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFD67327020 - BaseThreadInitThunk
(unknown                                 )      0x00007FFD6889D0B0 - RtlUserThreadStart

mmp commented 3 years ago

Windows + GPU is a total mess at this point (as evidenced by many open bugs on this front.) My sincere apologies about this, but it's been a combination of being busy with other things (writing the book text) and not having easy access to a Windows + GPU system that has prevented me from being very effective at chasing it down.

That said, having help figuring it out would be very much appreciated!

Interestingly, hitting that IsNaN() assertion is new to me (and I don't think it has been reported before).
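
For readers unfamiliar with this kind of check: an assertion of the form `!IsNaN(pdf)` fires when an earlier floating-point operation produced an undefined result, most commonly 0/0. The sketch below is a generic illustration under that assumption, not a diagnosis of the failure at bxdfs.cpp:190 and not pbrt's code; `RatioPdf`, `GuardedRatioPdf`, and `IsValidPdf` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration only; this is not pbrt's bxdfs.cpp code.
// A pdf formed as a ratio becomes NaN when both terms are zero,
// because IEEE 754 defines 0/0 as NaN.
float RatioPdf(float weightedSum, float weightTotal) {
    return weightedSum / weightTotal;
}

// Same computation with a guard: report "no valid sample" (pdf 0)
// instead of letting a NaN propagate into later calculations.
float GuardedRatioPdf(float weightedSum, float weightTotal) {
    if (weightTotal == 0)
        return 0;
    return weightedSum / weightTotal;
}

// Stand-in for a debug check of the form `!IsNaN(pdf)`.
bool IsValidPdf(float pdf) {
    return !std::isnan(pdf);
}
```

Here `IsValidPdf(RatioPdf(0.0f, 0.0f))` is false, while the guarded version returns 0 and passes the check. This assumes IEEE 754 semantics, i.e. no fast-math style compiler options.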

PR #71 (which I will merge tomorrow) improves GPU debug builds, which may be helpful.

I am pretty sure that Windows worked well in general at commit 9be5258cb4a56464f7e2450bb6d7ad5fac5c8b69. Reverting to that commit and trying it might be an interesting experiment. That would at least give a sense of whether things broke due to driver/compiler updates or due to changes in pbrt. (And if that one does work, bisecting to figure out where things broke would be interesting.)
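
Bisecting here means a binary search over the commit history between a known-good and a known-bad commit. A minimal sketch of that search follows, assuming the commits are ordered oldest to newest, the first is good, the last is bad, and the breakage persists once introduced. `FirstBadCommit` and `isGood` are hypothetical names; in practice `isGood` would stand for "check out, build, render a test scene, and inspect the result", and `git bisect` automates the same procedure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first bad commit.
// Preconditions (assumed, not checked beyond size): commits are ordered
// oldest to newest, commits.front() is good, commits.back() is bad.
std::size_t FirstBadCommit(
    const std::vector<std::string> &commits,
    const std::function<bool(const std::string &)> &isGood) {
    assert(commits.size() >= 2);
    std::size_t good = 0;                  // invariant: commits[good] is good
    std::size_t bad = commits.size() - 1;  // invariant: commits[bad] is bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (isGood(commits[mid]))
            good = mid;  // breakage was introduced after mid
        else
            bad = mid;   // breakage was introduced at or before mid
    }
    return bad;
}
```

Each step halves the remaining range, so a range of N commits needs roughly log2(N) build-and-render cycles rather than N.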

shadeops commented 3 years ago

Thanks @mmp, that's exactly what I was looking for!

I just tried 9be5258, and while it didn't crash, it rendered both pbrt-book and killeroos completely black. So I went back a few commits to 41a70da, and using the very simple test scene from https://github.com/mmp/pbrt-v4/issues/59#issue-717819792 I got my first successful (and very fast) GPU render.

Now that I've got something working, I can start to investigate and debug with confidence.

mmp commented 3 years ago

That's already a useful data point. It wasn't clear to me if it was something that had changed in the system or something in more recent versions of the driver/CUDA/OptiX. It's good to know that it should be fixable in pbrt, though given that top of tree pbrt+GPU does work fine under Linux, I imagine it's going to be something subtle...

(And as you probably now understand, that speed is addictive!)

mmp commented 3 years ago

With the latest fix, those scenes (and many others) now render successfully on Windows for me, so marking this closed (yaay!).

FaithZL commented 3 years ago

Nice!