mitsuba-renderer / mitsuba3

Mitsuba 3: A Retargetable Forward and Inverse Renderer
https://www.mitsuba-renderer.org/

OptiX SBT is released on non-main thread #1091

Closed: dvicini closed this issue 1 month ago

dvicini commented 8 months ago

Hi,

I am seeing an odd, sporadic issue I haven't encountered before. I am running the cuda_ad_rgb variant on a relatively unconventional setup: rendering a simple scene in a Colab notebook (similar to a Jupyter notebook) on an NVIDIA V100 datacenter GPU.

Everything runs fine most of the time. However, every few runs I see the following issue: jit_optix_configure_sbt initially gets invoked on the main thread when loading the scene, but the cleanup callback that releases internal structures gets invoked on another thread. I am debugging this by printing std::this_thread::get_id() both on the configure call and within the cleanup call. I tried turning off parallel scene loading, but that didn't seem to make a difference.

Practically, the issue is then that the jit_free call in the cleanup callback internally refers to thread_state_cuda to free up host pinned memory. However, if the cleanup happens on the non-main thread, the thread_state_cuda might not have been initialized, leading to a null pointer dereference in jitc_free when getting thread_state_cuda->stream.
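The failure mode can be mimicked in pure Python with `threading.local()`. This is only an illustrative sketch (the names `init_thread_state` and `jit_free_unguarded` are hypothetical stand-ins, not Dr.Jit's actual API): per-thread state initialized only on the main thread raises when accessed from a worker thread, analogous to the null pointer dereference in `jitc_free`:

```python
import threading

# Hypothetical stand-in for Dr.Jit's per-thread CUDA state: each thread
# has its own copy, and it only exists on threads that initialized it.
_state = threading.local()

def init_thread_state():
    _state.stream = object()  # placeholder for a CUDA stream handle

def jit_free_unguarded():
    # Mirrors the reported crash: read the per-thread state without
    # checking that the calling thread ever initialized it.
    return _state.stream

init_thread_state()   # main thread initializes its state
jit_free_unguarded()  # fine on the main thread

result = {}

def worker():
    try:
        jit_free_unguarded()  # this thread never ran init_thread_state()
    except AttributeError as exc:
        result['error'] = exc  # Python's analogue of the null dereference

t = threading.Thread(target=worker)
t.start()
t.join()
print(type(result['error']).__name__)  # AttributeError
```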

Two questions:

1) Is it expected that the cleanup might happen on a non-main thread? I am using the mi.render function; do we expect any of that custom op mechanism to potentially lead to another thread holding a reference to the Scene object?

2) Can we just replace the unprotected access to the CUDA thread state with thread_state(JitBackend::CUDA) in jit_free?
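What such a guard amounts to can be sketched in Python (the `thread_state` accessor here is a hypothetical stand-in for Dr.Jit's actual function): lazily initialize the calling thread's state on first access instead of assuming it already exists:

```python
import threading

_state = threading.local()

def thread_state():
    # Sketch of the proposed guard: lazily initialize the calling
    # thread's state on first access instead of assuming it exists.
    # Names are illustrative, not Dr.Jit's actual API.
    if not hasattr(_state, 'stream'):
        _state.stream = object()  # lazily create per-thread resources
    return _state

def jit_free_guarded():
    # Safe from any thread, including ones that never set up state.
    return thread_state().stream

out = []

def worker():
    out.append(jit_free_guarded() is not None)

t = threading.Thread(target=worker)
t.start()
t.join()
print(out[0])  # True
```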

It is possible that the non-main thread cleanup is related to something in ipython/colab holding an extra reference to the render op or the Mitsuba scene somehow.
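This hypothesis is at least consistent with CPython's semantics: a destructor runs on whichever thread drops the last reference, not necessarily the thread that created the object. A minimal sketch of how an extra reference held elsewhere moves cleanup off the main thread:

```python
import threading

record = {}

class Scene:
    def __del__(self):
        # CPython runs __del__ on whichever thread drops the last
        # reference, not necessarily the thread that created the object.
        record['del_thread'] = threading.get_ident()

scene = Scene()       # created on the main thread
holder = [scene]      # a second reference, handed to a worker thread
record['main_thread'] = threading.get_ident()
del scene             # the main thread's reference is gone

def worker():
    record['worker_thread'] = threading.get_ident()
    holder.clear()    # last reference dropped here -> __del__ runs here

t = threading.Thread(target=worker)
t.start()
t.join()
print(record['del_thread'] == record['worker_thread'])  # True
```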

This is the code I run in my Colab cell (likely not of much help in understanding the issue, though):

import drjit as dr
import mitsuba as mi
import numpy as np

dr.set_thread_count(1)
mi.set_variant('cuda_ad_rgb')
dr.set_log_level(dr.LogLevel.InfoSym)

def render():
  scene = mi.load_dict(
      {
          'type': 'scene',
          'integrator': {'type': 'direct'},
          'shape': {
              'type': 'cube',
          },
          'emitter': {'type': 'constant'},
          'sensor': {
              'type': 'perspective',
              'to_world': mi.ScalarTransform4f.look_at(
                  [4, 4, 4.5], [0, 0, 0], [0, 1, 0]
              ),
          },
      },
      parallel=False,
  )
  image = np.array(mi.render(scene, spp=256))
  dr.sync_thread()
  del scene
  return image

image = render()
wjakob commented 8 months ago

Are you combining this with another array programming framework? For example, PyTorch code involving (our) custom operations causes those operations to be called from other threads during differentiation.

dvicini commented 8 months ago

No, nothing of that sort.

We would not expect Dr.Jit to ever use another thread here, right? All the references to the render op and the scene object should be on the main thread?

If that's the case, I am thinking this might be something Colab-specific, e.g., it uses threads to provide additional information to the user for debugging or inspection.

dvicini commented 1 month ago

Closing this for now, since I am not sure whether this is still an issue. Quite a lot has changed around the cleanup and threading logic.

(I will re-open if this comes up again)