mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License

Critical Dr.Jit compiler failure: cuda_check(): API error 0718 (CUDA_ERROR_INVALID_PC): "invalid program counter" in D:\a\drjit\drjit\ext\drjit-core\src\eval.cpp:395 #296

Open aantg opened 1 month ago

aantg commented 1 month ago

Hello. I got this error after updating the NVIDIA driver to the latest version (565.90), even with the "Hello World" example from the Mitsuba 3 documentation (using the 'cuda_ad_rgb' variant, of course). Rolling back to the previous driver version (561.09) makes the error disappear.

It looks like there is some incompatibility.

Dr.Jit 0.4.6 + Mitsuba 3.5.2

njroussel commented 1 month ago

Hi @aantg

What OS and GPU model are you using?

In my personal experience, these types of errors are often related to a faulty driver installation.
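A reporter could collect the OS, GPU model, driver version and package versions in one step with a short script. This is a minimal sketch, not a Mitsuba or Dr.Jit tool. It assumes Python 3.9+, that `nvidia-smi` (installed with the NVIDIA driver) is on `PATH`, and that Dr.Jit and Mitsuba were installed as pip packages named `drjit` and `mitsuba`. Source builds may not register package metadata and would then show up as "not installed".

```python
import importlib.metadata
import platform
import shutil
import subprocess


def collect_environment() -> dict[str, str]:
    """Gather details usually asked for in GPU bug reports (OS, GPU, driver, package versions)."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "gpu": "unknown",
        "driver": "unknown",
    }
    # Package versions, if installed through pip (source builds may lack metadata).
    for pkg in ("drjit", "mitsuba"):
        try:
            info[pkg] = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    # nvidia-smi ships with the NVIDIA driver; it is missing on machines without one.
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10, check=True,
            ).stdout.strip()
        except (subprocess.SubprocessError, OSError):
            out = ""
        if out:
            first = out.splitlines()[0]  # report the first GPU only
            name, _, driver = first.rpartition(",")
            info["gpu"], info["driver"] = name.strip(), driver.strip()
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed key/value lines into the issue answers the OS and GPU question and also records the exact driver build, which matters here because the reports differ in driver version.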

tatue64 commented 1 week ago

I can confirm this on Manjaro Linux with NVIDIA driver 565.57.01 and CUDA 12.6.2 (downgrading CUDA did not help).

This error occurs with Dr.Jit 0.4.6 and Mitsuba 3.5.2, but also with Dr.Jit 1.0.0 in the current development version of Mitsuba 3.6 when compiled in Release mode (clang 18.1.8). The "llvm_ad_rgb" variant always works. Somewhat surprisingly, the error does not occur if the program is compiled in Debug mode.
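Since `llvm_ad_rgb` works where `cuda_ad_rgb` fails, one possible stopgap is to try the CUDA variant first and fall back to LLVM. Dr.Jit reports this failure as unrecoverable and shuts down, so a `try`/`except` in the same process is unlikely to catch it. The sketch below therefore runs each attempt in a separate child process. The render script and the way it receives the variant name (`make_cmd`) are hypothetical placeholders, not part of Mitsuba.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_fallback(
    variants: Sequence[str],
    make_cmd: Callable[[str], list[str]],
    timeout: float = 600.0,
) -> str:
    """Run make_cmd(variant) in a fresh process for each variant, in order.

    Returns the first variant whose process exits with status 0. Each attempt
    runs in its own process, so a hard shutdown in one attempt cannot take
    down the caller.
    """
    for variant in variants:
        try:
            result = subprocess.run(
                make_cmd(variant), capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            print(f"{variant}: timed out after {timeout} s", file=sys.stderr)
            continue
        if result.returncode == 0:
            return variant
        # Show the tail of stderr, where an error message would usually be printed.
        print(f"{variant}: exit status {result.returncode}\n{result.stderr[-2000:]}",
              file=sys.stderr)
    raise RuntimeError(f"all variants failed: {', '.join(variants)}")
```

For example, `run_with_fallback(["cuda_ad_rgb", "llvm_ad_rgb"], lambda v: [sys.executable, "render.py", v])`, where `render.py` is a hypothetical script that calls `mi.set_variant(sys.argv[1])` before loading and rendering the scene. This only works around the crash and does not address the underlying driver or compiler issue.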

In the latter case (Mitsuba 3.6), the scene from the "editing_a_scene" tutorial notebook works with "cuda_ad_rgb", but fails when, for example, the "diffuse" material is replaced with "roughplastic".

This may be a driver problem (as far as I know, this is a beta driver), but the fact that the error depends on both the build mode and the scene parameters suggests a problem in Dr.Jit.
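A self-contained reproducer for the scene dependence could be as small as a single sphere whose BSDF plugin is swapped between `diffuse` and `roughplastic`. The sketch below generates such a scene as a Mitsuba 3 XML string using standard plugins (`path`, `perspective`, `independent`, `hdrfilm`, `constant`, `sphere`). It is not the scene from the `editing_a_scene` notebook, and it has not been tested on an affected driver, so it may or may not trigger the crash. The function name `make_scene_xml` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Minimal scene: one unit sphere at the origin, camera at z=5, constant environment light.
SCENE_TEMPLATE = """<scene version="3.0.0">
    <integrator type="path"/>
    <sensor type="perspective">
        <float name="fov" value="45"/>
        <transform name="to_world">
            <lookat origin="0, 0, 5" target="0, 0, 0" up="0, 1, 0"/>
        </transform>
        <sampler type="independent">
            <integer name="sample_count" value="{spp}"/>
        </sampler>
        <film type="hdrfilm">
            <integer name="width" value="{width}"/>
            <integer name="height" value="{height}"/>
        </film>
    </sensor>
    <emitter type="constant"/>
    <shape type="sphere">
        <bsdf type="{bsdf}"/>
    </shape>
</scene>"""


def make_scene_xml(bsdf: str = "diffuse", width: int = 256, height: int = 256, spp: int = 64) -> str:
    """Return the minimal scene above with the given BSDF plugin on the sphere."""
    # Plugin names are lowercase identifiers; reject anything that could break the XML.
    if not re.fullmatch(r"[a-z_]+", bsdf):
        raise ValueError(f"unexpected BSDF plugin name: {bsdf!r}")
    xml = SCENE_TEMPLATE.format(bsdf=bsdf, width=width, height=height, spp=spp)
    ET.fromstring(xml)  # sanity check: the result is well-formed XML
    return xml
```

On the affected machine, the scene would be loaded with `mi.load_string(make_scene_xml("roughplastic"))` and rendered with `mi.render(scene)` after `mi.set_variant("cuda_ad_rgb")`. Rendering the `diffuse` version on the same machine as a control would show whether the BSDF alone makes the difference.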

===================================

2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:334] Mitsuba version 3.6.0 (master[a8a03722], Linux, 64bit, 64 threads, 8-wide SIMD)
2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:335] Copyright 2022, Realistic Graphics Lab, EPFL
2024-11-09 18:31:12 INFO  main  [mitsuba.cpp:336] Enabled processor features: cuda llvm avx2 avx fma f16c sse4.2 x86_64
2024-11-09 18:31:12 INFO  main  [xml.cpp:1380] Loading XML file "scenes/test.xml" with variant "cuda_ad_rgb"..
2024-11-09 18:31:13 INFO  main  [Scene] Building scene in OptiX ..
2024-11-09 18:31:13 INFO  main  [Scene] OptiX ready. (took 54ms)
2024-11-09 18:31:13 INFO  main  [xml.cpp:1398] Done loading XML file "scenes/test.xml" (took 1.346s).
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Starting render job (1028x516, 528 samples)
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Computation graph recorded. (took 4ms)
2024-11-09 18:31:13 INFO  main  [SamplingIntegrator] Code generation finished. (took 16ms)

Dr.Jit encountered an unrecoverable error and will now shut
down. Please re-run your program in debug mode to check for
out-of-bounds reads, writes, and other sources of undefined
behavior. You can do so by calling

   dr.set_flag(dr.JitFlag.Debug, True)

at the beginning of the program. If these additional checks
fail to pinpoint the problem, then you have likely found a
bug. We are happy to help investigate and fix the problem if
you can you create a self-contained reproducer and submit it
at https://github.com/mitsuba-renderer/drjit.

The error message of this specific failure is as follows:
>>> cuda_check(): API error 0718 (CUDA_ERROR_INVALID_PC): "invalid program counter" in /home/xxx/progs/mitsuba_x/mitsuba36/ext/drjit/ext/drjit-core/src/init.cpp:462.
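Following the suggestion in the error message, a small helper can turn on Dr.Jit's debug checks at the start of a script. The flag name `dr.JitFlag.Debug` is taken from the message above, which was printed by the Dr.Jit 1.0 build. The helper assumes other Dr.Jit versions might not expose the flag under that name and returns `False` in that case, or when Dr.Jit is not installed. This runtime flag is separate from building Mitsuba and Dr.Jit in Debug mode, which the report above found to hide the error.

```python
import importlib.util


def enable_drjit_debug() -> bool:
    """Enable Dr.Jit's runtime debug checks if possible.

    Returns True if the flag was set, False if Dr.Jit is missing or does not
    expose a flag named JitFlag.Debug (assumption: the name may vary by version).
    """
    if importlib.util.find_spec("drjit") is None:
        return False  # Dr.Jit is not installed in this environment
    import drjit as dr

    # Flag name taken from the Dr.Jit error message: dr.set_flag(dr.JitFlag.Debug, True)
    flag = getattr(dr.JitFlag, "Debug", None)
    if flag is None:
        return False
    dr.set_flag(flag, True)
    return True
```

The call would go near the top of the script, before any scene is loaded or rendered, so that the additional checks cover the failing kernel.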