dvicini closed this 1 month ago
Are you combining this with another array programming framework? For example, PyTorch code involving (our) custom operations can cause them to be called from other threads during differentiation.
No, nothing of that sort.
We would not expect Dr.Jit to ever use another thread here, right? All references to the render op and the scene object should be on the main thread?
If that's the case, I suspect this might be something Colab-specific, e.g., it uses threads to provide some additional information to the user for debugging or inspection.
Closing this for now since I am not sure whether this is still an issue. Quite a lot has changed around the cleanup and threading logic.
(I will re-open if this comes up again)
Hi,
I am seeing an odd, sporadic issue I haven't encountered before. I am running the `cuda_ad_rgb` variant on a relatively unconventional setup: I am rendering a simple scene in a Colab notebook (similar to a Jupyter notebook) on an Nvidia V100 datacenter GPU. Everything runs fine most of the time. However, every few runs I see the following issue:
`jit_optix_configure_sbt` initially gets invoked on the main thread when loading the scene, but the cleanup callback that releases its internal structures gets invoked on another thread. I am debugging this by printing `std::this_thread::get_id()` both in the configure call and in the cleanup call. I tried turning off parallel scene loading, but that didn't seem to make a difference.

Practically, the issue is then that the `jit_free` call in the cleanup callback internally refers to `thread_state_cuda` to free up host-pinned memory. However, if the cleanup happens on a non-main thread, `thread_state_cuda` might not have been initialized, leading to a null pointer dereference in `jitc_free` when accessing `thread_state_cuda->stream`.
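To make the failure mode concrete, here is a minimal, self-contained sketch of what I believe happens. The `ThreadState` struct and `free_host_pinned` function are hypothetical stand-ins, not the actual Dr.Jit code; only the thread-local-pointer behavior matters:

```cpp
#include <cstdio>
#include <thread>

// Hypothetical stand-in for the per-thread CUDA state (not the real type)
struct ThreadState { void *stream = nullptr; };

// Thread-local pointer: every thread starts out with its own null copy
static thread_local ThreadState *thread_state_cuda = nullptr;

// Stand-in for the free path: assumes the per-thread state already exists
void free_host_pinned(void * /* ptr */) {
    if (!thread_state_cuda) {
        // This is the situation hit by the cleanup callback: this thread never
        // initialized its CUDA state, so dereferencing it would crash
        std::printf("thread_state_cuda is null on this thread\n");
        return;
    }
    std::printf("stream = %p\n", thread_state_cuda->stream);
}

int main() {
    // The main thread sets up its state (analogous to loading the scene)
    thread_state_cuda = new ThreadState();
    free_host_pinned(nullptr);                         // ok: state exists here

    // A cleanup callback running on another thread sees a separate, null copy
    std::thread t([] { free_host_pinned(nullptr); });  // prints the warning
    t.join();
    return 0;
}
```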
Two questions for this:

1) Is it expected that the cleanup might happen on a non-main thread? I am using the `mi.render` function; do we expect any of that custom op mechanism to potentially lead to another thread holding a reference to the `Scene` object?
2) Can we just replace the unprotected access to the CUDA thread state by `thread_state(JitBackend::CUDA)` in `jit_free`?

It is possible that the non-main-thread cleanup is related to something in IPython/Colab holding an extra reference to the render op or the Mitsuba scene somehow.
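Regarding question 2, this is a sketch of the kind of guard I have in mind, again with hypothetical stand-ins rather than the real `thread_state(JitBackend::CUDA)` / `jitc_free` implementations:

```cpp
#include <cstdio>

enum class JitBackend { CUDA };

// Same hypothetical stand-ins as in the previous sketch
struct ThreadState { void *stream = nullptr; };
static thread_local ThreadState *thread_state_cuda = nullptr;

// Guarded accessor: lazily creates the per-thread state on first use instead
// of assuming it was already initialized on this thread
ThreadState *thread_state(JitBackend backend) {
    if (backend == JitBackend::CUDA && !thread_state_cuda)
        thread_state_cuda = new ThreadState();
    return thread_state_cuda;
}

void free_host_pinned(void * /* ptr */) {
    // Before (unprotected): thread_state_cuda->stream -> null dereference on a
    // thread that never touched the CUDA backend.
    // After (guarded): the accessor initializes the state if needed.
    void *stream = thread_state(JitBackend::CUDA)->stream;
    std::printf("freeing on stream %p\n", stream);
}

int main() {
    free_host_pinned(nullptr); // safe even on a thread with no prior CUDA state
    return 0;
}
```

The point is simply that the free path should not assume the calling thread already owns a CUDA thread state.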
This is the code I run in my Colab cell, though it is likely not of much help in understanding the issue: