nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Using OpenGL Context while using viewer #2310

Open sweeneychris opened 1 year ago

sweeneychris commented 1 year ago

Describe the bug

I am working on a new method that uses the nvdiffrast library (https://nvlabs.github.io/nvdiffrast/) for texture sampling on feature grids, in order to implement the Tri-MipRF encoding (https://wbhu.github.io/projects/Tri-MipRF/).

nvdiffrast acquires its own OpenGL context, and when I run my code with the web viewer also running, I get the error below. If I run the method without the viewer, it works fine. This leads me to believe that the viewer's OpenGL context and nvdiffrast's are in contention. This is only my best guess, so I am looking for feedback from nerfstudio authors and users on how to debug and fix this. Do you have any thoughts on this theory, or suggestions for working around the issue?

Here is the trace:

Traceback (most recent call last):
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/viewer/server/render_state_machine.py", line 173, in run
    outputs = self._render_img(action.cam_msg)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/viewer/server/render_state_machine.py", line 148, in _render_img
    outputs = self.viewer.get_model().get_outputs_for_camera_ray_bundle(camera_ray_bundle)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/models/base_model.py", line 179, in get_outputs_for_camera_ray_bundle
    outputs = self.forward(ray_bundle=ray_bundle)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/models/base_model.py", line 142, in forward
    return self.get_outputs(ray_bundle)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimiprf.py", line 297, in get_outputs
    ray_samples, weights_list, ray_samples_list = self.proposal_sampler(
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/model_components/ray_samplers.py", line 50, in forward
    return self.generate_ray_samples(*args, **kwargs)
  File "/home/sweeneychris/projects/nerfstudio/nerfstudio/model_components/ray_samplers.py", line 602, in generate_ray_samples
    density = density_fns[i_level](ray_samples.frustums.get_positions())
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 369, in density_fn
    density, _ = self.get_density(ray_samples)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 391, in get_density
    features = interpolate_ms_features(
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_field.py", line 63, in interpolate_ms_features
    grid_features = grid(pts)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sweeneychris/projects/trimiprf_nerfstudio/trimiprf/trimip_encoding.py", line 74, in forward
    plane_features = texture(self.plane_coef, plane_coord,
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 615, in texture
    return _texture_func.apply(filter_mode, tex, uv, filter_mode_enum, boundary_mode_enum)
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sweeneychris/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 504, in forward
    out = _get_plugin().texture_fwd(tex, uv, filter_mode_enum, boundary_mode_enum)
RuntimeError: Cuda error: 9[cudaLaunchKernel(func_tbl[func_idx], gridSize, blockSize, args, 0, stream);]

To Reproduce

Steps to reproduce the behavior:

  1. Run code featuring nvdiffrast
  2. Start the viewer while the model is training
  3. Instant crash with the trace above

Expected behavior

No crashing!

kerrj commented 1 year ago

It's hard to help without knowing more about what you're doing, but for context: the viewer does use WebGL to render everything. That shouldn't interfere with your Python code, though, since it runs only in JavaScript and not in the same process. Maybe something that happens when you start streaming data to the server in Python breaks the OpenGL code?
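One way to check what differs on the Python side when the viewer is active is to record which thread calls the failing function and with what input shapes, then compare training calls against viewer calls. The following is only a sketch: `log_calling_thread` and `fake_texture_lookup` are hypothetical names, and the stand-in function takes the place of the real nvdiffrast-backed lookup.

```python
import functools
import threading


def log_calling_thread(fn, log):
    """Wrap fn so every call appends (thread name, argument shapes) to log."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Only positional arguments exposing a .shape (e.g. tensors) are recorded.
        shapes = [tuple(a.shape) for a in args if hasattr(a, "shape")]
        log.append((threading.current_thread().name, shapes))
        return fn(*args, **kwargs)

    return wrapper


def fake_texture_lookup(x):
    # Hypothetical stand-in for the real texture lookup.
    return x


calls = []
traced = log_calling_thread(fake_texture_lookup, calls)

# One call from the current thread, as the training loop would make.
traced(1)

# One call from a separate thread, as the viewer's render thread would make.
t = threading.Thread(target=traced, args=(2,), name="viewer-render")
t.start()
t.join()

for thread_name, shapes in calls:
    print(thread_name, shapes)
```

If the logged thread names or shapes differ between the working and the crashing calls, that narrows down where to look; if they match, the threading and input-shape explanations become less likely.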

jkulhanek commented 1 year ago

Wait, so you are using the created context inside the model's get_outputs? Perhaps your code cannot handle multiple threads. I believe the viewer calls get_outputs from a different thread (with a lock).
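If the thread hypothesis above turns out to be right (it is unconfirmed in this thread), one possible workaround is to route every call that touches the context through a single dedicated worker thread. A minimal sketch, assuming thread affinity is the problem; `SingleThreadRunner` and `fake_render` are hypothetical names, not part of nerfstudio or nvdiffrast:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadRunner:
    """Runs every submitted callable on the same worker thread."""

    def __init__(self):
        # max_workers=1 means one worker thread serves all submissions.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def run(self, fn, *args, **kwargs):
        # Blocks the caller until the worker has finished fn.
        # Do not call run() from inside fn itself: that would deadlock.
        return self._executor.submit(fn, *args, **kwargs).result()

    def shutdown(self):
        self._executor.shutdown(wait=True)


def fake_render():
    # Hypothetical stand-in for code that needs the context;
    # it returns the id of the thread it actually ran on.
    return threading.get_ident()


runner = SingleThreadRunner()
seen = []


def caller():
    seen.append(runner.run(fake_render))


# Simulate a training thread and a viewer thread calling concurrently.
threads = [threading.Thread(target=caller) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A call from the main thread is executed on the same worker.
seen.append(runner.run(fake_render))
runner.shutdown()

print(len(set(seen)))  # number of distinct threads that ran fake_render
```

Note that PyTorch's grad mode (for example `torch.no_grad()`) is thread-local, so a context manager entered in the calling thread would need to be re-entered inside the function that runs on the worker.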