Something wrong when setting variant as "cuda_ad_rgb"

Goo-JZhang commented 1 year ago

Summary

I would like to use the GPU for rendering, but when I call mi.set_variant('cuda_ad_rgb'), I get a runtime error.

I installed Mitsuba and Dr.Jit via pip install.

System configuration

System information:

OS: Linux version 5.15.90.1-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz with x86_64 architecture
GPU: GTX 1650
Python version: python3.9
LLVM version: Not installed
CUDA version: 12.0
NVidia driver: 528.02

Here is my output of nvidia-smi:

Dr.Jit version: 0.4.1
Mitsuba version: 3.2.1
Compiled with: Don't know
Variants compiled: Don't know

Here is the output of my mitsuba -v command:

Mitsuba version 3.2.1 (master[2c03478], Linux, 64bit, 12 threads, 8-wide SIMD)
Copyright 2022, Realistic Graphics Lab, EPFL
Enabled processor features: cuda llvm avx f16c sse4.2 x86_64

Description

Python 3.9.16 (main, Mar 8 2023, 14:00:05)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import drjit as dr
>>> import mitsuba as mi
>>> mi.variants()
['scalar_rgb', 'scalar_spectral', 'cuda_ad_rgb', 'llvm_ad_rgb']
>>> mi.set_variant('cuda_ad_rgb')
Traceback (most recent call last):
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/__init__.py", line 107, in __getattribute__
    _import('mitsuba.mitsuba_' + variant + '_ext'),
  File "/home/zjlwsl/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: jit_init_thread_state(): the CUDA backend hasn't been initialized. Make sure to call jit_init(JitBackend::CUDA) to properly initialize this backend.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/__init__.py", line 317, in set_variant
    _import('mitsuba.ad.integrators')
  File "/home/zjlwsl/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/python/ad/__init__.py", line 1, in <module>
    from .integrators import *
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/python/ad/integrators/__init__.py", line 25, in <module>
    importlib.import_module('mitsuba.ad.integrators.' + name)
  File "/home/zjlwsl/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/python/ad/integrators/prbvolpath.py", line 6, in <module>
    from .common import RBIntegrator, mis_weight
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/python/ad/integrators/common.py", line 8, in <module>
    class ADIntegrator(mi.CppADIntegrator):
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/__init__.py", line 253, in __getattribute__
    result = module.__getattribute__(key)
  File "/home/zjlwsl/anaconda3/lib/python3.9/site-packages/mitsuba/__init__.py", line 115, in __getattribute__
    raise AttributeError(e)
AttributeError: jit_init_thread_state(): the CUDA backend hasn't been initialized. Make sure to call jit_init(JitBackend::CUDA) to properly initialize this backend.

Goo-JZhang commented 1 year ago

The same problem occurs when I build Mitsuba with CMake.

njroussel commented 1 year ago

Hi @Goo-JZhang

Can you use drjit without Mitsuba? You can follow the example here. How about the Mitsuba command line? Can you render a scene?

I don't think we've seen something like this in a while.

Goo-JZhang commented 1 year ago

Hello @njroussel, I tried the example you gave me.

  from drjit.cuda import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop  
  import drjit as dr  
  def sdf(p: Array3f) -> Float:  
      return dr.norm(p) - 1  
  sdf(Array3f(1, 2, 3))

outputs:

RuntimeError                              Traceback (most recent call last)
File ~/dependency/mitsuba3/build/python/drjit/detail.py:280, in array_init(self, args)
    279 for i in range(size):
--> 280     self.set_entry_(i, value_type(args[i]))
    281 elif self.IsMatrix and n == self.Size * self.Size:

RuntimeError: jit_init_thread_state(): the CUDA backend hasn't been initialized. Make sure to call jit_init(JitBackend::CUDA) to properly initialize this backend.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[5], line 5
      3 def sdf(p: Array3f) -> Float:
      4     return dr.norm(p) - 1
----> 5 sdf(Array3f(1, 2, 3))

File ~/dependency/mitsuba3/build/python/drjit/detail.py:297, in array_init(self, args)
    292     raise TypeError("%s constructor expects: arbitrarily many values "
    293                     "of type \"%s\", a matching list/tuple, or a NumPy/"
    294                     "PyTorch/TF/Jax array." % (type(self).__name__,
    295                                                value_type.__name__)) from err
    296 else:
--> 297     raise TypeError("%s constructor expects: %s%i values "
    298                     "of type \"%s\", a matching list/tuple, or a NumPy/"
    299                     "PyTorch/TF/Jax array." % (type(self).__name__, "" if
    300                                                size == 1 else "1 or ", size,
    301                                                value_type.__name__)) from err

TypeError: Array3f constructor expects: 1 or 3 values of type "Float", a matching list/tuple, or a NumPy/PyTorch/TF/Jax array.

Goo-JZhang commented 1 year ago

Also, rendering works with the 'scalar_rgb', 'scalar_spectral', and 'llvm_ad_rgb' variants, using:

import mitsuba as mi
mi.set_variant('scalar_rgb')
img = mi.render(mi.load_dict(mi.cornell_box()))
import matplotlib.pyplot as plt
plt.axis("off")
plt.imshow(img ** (1.0 / 2.2))

where 'scalar_rgb' can be replaced by either 'scalar_spectral' or 'llvm_ad_rgb'.

njroussel commented 1 year ago

Can you use other CUDA-based tools in your environment? Something like PyTorch? Or is this exclusive to Mitsuba?

Goo-JZhang commented 1 year ago

@njroussel I can use TensorFlow with CUDA; I've used it to train a neural network model for another course. Here's my command-line output:

~/$ python
Python 3.9.16 (main, Mar 8 2023, 14:00:05)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-04-24 23:30:44.075361: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-24 23:30:46.061103: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> tf.test.is_built_with_cuda()
True

Goo-JZhang commented 1 year ago

Update: I found that I had made a mistake when adding the CUDA path to my environment variables. set_variant now succeeds, but I hit another error when I run the code here:

RuntimeError                              Traceback (most recent call last)
Cell In[2], line 4
      2 import drjit as dr
      3 mi.set_variant('cuda_ad_rgb')
----> 4 img = mi.render(mi.load_dict(mi.cornell_box()))
      5 import matplotlib.pyplot as plt
      6 plt.axis("off")

RuntimeError: Could not initialize OptiX!

It also tells me: "jit_optix_api_init(): Failed to load OptiX library! Very likely, your NVIDIA graphics driver is too old and not compatible with the version of OptiX that is being used. In particular, OptiX 7.4 requires driver revision R495.89 or newer." However, my NVIDIA driver version is 527.92.01.
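A quick sanity check of the driver requirement quoted in that message can be sketched as follows. This is only an illustration: the helper names are hypothetical, the minimum revision (495.89) is taken from the error message above, and the comparison assumes dot-separated numeric version strings.

```python
# Minimum driver revision quoted in the OptiX error message (R495.89).
MIN_OPTIX_DRIVER = (495, 89)


def parse_version(text):
    """Turn a dotted version string such as '527.92.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(version, minimum=MIN_OPTIX_DRIVER):
    """Return True if the dotted driver version is at least `minimum`."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # Driver versions reported at various points in this thread.
    for version in ("527.92.01", "528.02", "531.68"):
        print(version, driver_is_new_enough(version))
```

All driver versions mentioned in this thread pass this check, which suggests that the "driver is too old" hint in the message is not the actual cause here.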

zhaoguangyuan123 commented 1 year ago

It might be because of your NVCC version.

Type nvcc -V to see the version.

Mine was 9.1, which triggered the bug. The bug vanished when I upgraded to 12.0.

Best,

Goo-JZhang commented 1 year ago

@zhaoguangyuan123 I've updated my NVIDIA driver to match my CUDA version.

~$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1650         On | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8                1W /  N/A|     96MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        20      G   /Xwayland                                 N/A      |
|    0   N/A  N/A      3719      C   /python3.9                                N/A      |
+---------------------------------------------------------------------------------------+
~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

zhaoguangyuan123 commented 1 year ago

I see.

njroussel commented 1 year ago

Did upgrading your drivers also work?

> I've found that I made a mistake when I added cuda path to environment variable. I could set_variant successfully but met another error when I run the code in https://github.com/mitsuba-renderer/mitsuba3/issues/683#issuecomment-1520383046

What are you setting in the path exactly? I'm surprised, because Mitsuba does not need a runtime installation of CUDA; it only needs your driver.

Goo-JZhang commented 1 year ago

@njroussel In my ~/.bashrc, I added:

export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

If I comment out the last line, set_variant fails with the CUDA initialization error.

I guess that some .so file in /usr/lib/wsl/lib links to the actual driver on my Windows system, since a virtual machine doesn't need its own driver according to "enable gpu on wsl2".
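One way to see whether the dynamic loader can find the driver libraries with the current LD_LIBRARY_PATH is to try loading them from Python. The following is a minimal sketch, not something taken from Mitsuba or Dr.Jit: the helper names are hypothetical, and the library names libcuda.so.1 and libnvoptix.so.1 are assumed to be the usual Linux names of the NVIDIA driver and OptiX libraries; which names Dr.Jit actually looks up is not confirmed here.

```python
import ctypes


def can_load(name):
    """Return True if the dynamic loader can find and load the library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


def report(names):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    # Assumed library names; adjust them if your setup uses different ones.
    for name, ok in report(["libcuda.so.1", "libnvoptix.so.1"]).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Running this once with and once without the /usr/lib/wsl/lib entry in LD_LIBRARY_PATH should show whether that directory is what makes the driver library visible.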

After updating my driver, I can still use the GPU in TensorFlow, but the Mitsuba error described here isn't fixed.

Since the current error is about OptiX, I installed OptiX manually and added it to my path,

export OptiX_INSTALL_DIR=/home/zjlwsl/dependency/NVIDIA-OptiX-SDK-7.7.0-linux64-x86_64
export LD_LIBRARY_PATH=$OptiX_INSTALL_DIR/SDK/build/lib:$LD_LIBRARY_PATH
export PATH=$OptiX_INSTALL_DIR/SDK/build/bin:$PATH

and uninstalled the pip-installed mitsuba3. Now I'm building mitsuba3 manually, following gpu-variants.

njroussel commented 1 year ago

Ohhhhh, I completely missed the fact that you were using WSL.

As far as I know, OptiX is not supported in WSL: https://forums.developer.nvidia.com/t/problem-running-optix-7-6-in-wsl/239355/2?u=njroussel

wjakob commented 1 year ago

Can we somehow detect WSL and emit a more useful error message? 🤔

njroussel commented 1 year ago

I looked into this a while ago. My understanding was that it's actually quite hard to determine whether a process is running inside WSL. Some methods exist, but they aren't foolproof. I didn't look any further, but if there's something that works in most cases, that would be nice indeed.
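For illustration, a common heuristic can be sketched in Python as below. It is not foolproof and is not how Mitsuba or Dr.Jit behave; the function name is hypothetical. It assumes that the kernel release string contains "microsoft" (as in the 5.15.90.1-microsoft-standard-WSL2 kernel reported at the top of this issue) or that WSL has set the WSL_DISTRO_NAME or WSL_INTEROP environment variables.

```python
import os
import platform


def looks_like_wsl(release=None, environ=None):
    """Best-effort guess at whether the current process runs inside WSL.

    `release` and `environ` can be passed explicitly for testing; by default
    the kernel release and the process environment are used.
    """
    release = platform.release() if release is None else release
    environ = os.environ if environ is None else environ

    # WSL kernels typically carry "microsoft" in their release string.
    if "microsoft" in release.lower():
        return True

    # Environment variables usually set by WSL; they can be missing or spoofed.
    return "WSL_DISTRO_NAME" in environ or "WSL_INTEROP" in environ


if __name__ == "__main__":
    print("Running inside WSL (heuristic):", looks_like_wsl())
```

A custom kernel build or a sanitized environment would defeat both checks, so a hint based on this could only say "you may be running inside WSL".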

Goo-JZhang commented 1 year ago

@njroussel I made one last attempt by installing OptiX and Mitsuba manually. Now

from drjit.cuda import Float, UInt32, Array3f, Array2f, TensorXf, Texture3f, PCG32, Loop  
import drjit as dr  
def sdf(p: Array3f) -> Float:  
    return dr.norm(p) - 1  
sdf(Array3f(1, 2, 3))

produces the correct result. But the OptiX error is still there; I hope OptiX will support WSL in the near future. I'll give up on WSL and do my project in my Windows environment instead. Thanks for your attention to this issue.
