nerfstudio-project / nerfacc

A General NeRF Acceleration Toolbox in PyTorch.
https://www.nerfacc.com/

[probably] Infinite CUDA extension build #116

Closed kst179 closed 1 year ago

kst179 commented 1 year ago

Hi, first of all: this is great work, thank you for that. But I had some problems on the first launch.

I tried to run nerfstudio with the instant-ngp model and hit a [probably] infinite extension build when nerfacc's C++ functions are called. At least, the compilation did not finish within 10 minutes or even more (yeah, I am kinda impatient).

minimal example:

>>> import nerfacc
>>> nerfacc.ContractionType.AABB.to_cpp_version()
# Out: (<infinitely bouncing ball>) NerfAcc: Setting up CUDA (This may take a few minutes the first time)

I tried to build it manually by just calling the following (I suppose the build config was already generated by torch.utils.cpp_extension.load after the first call):

$ cd ~/.cache/torch_extensions/py38_cu116/nerfacc_cuda && ninja -j 4

and voila, it took about 1 minute to compile (and it shows more detailed progress :upside_down_face:). If the number of workers is not specified, the build gets killed because it consumes all the memory on my laptop and wants even more.
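The same worker cap can probably be applied without running ninja by hand. PyTorch's documentation for torch.utils.cpp_extension says the ninja backend reads the MAX_JOBS environment variable to decide how many parallel workers to use. The sketch below assumes nerfacc builds its extension through that loader, as guessed above. The helper name `limit_build_jobs` is hypothetical and not part of nerfacc or torch.

```python
import os


def limit_build_jobs(max_jobs: int = 4) -> str:
    """Cap the parallel ninja workers used by torch's JIT extension builder.

    Hypothetical helper: it only sets the MAX_JOBS environment variable,
    which torch.utils.cpp_extension consults when it invokes ninja.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be a positive integer")
    os.environ["MAX_JOBS"] = str(max_jobs)
    return os.environ["MAX_JOBS"]


# Set the cap before anything triggers the build.
limit_build_jobs(4)

# Assumption: importing nerfacc and calling into its CUDA backend afterwards
# starts the build with at most 4 workers.
# import nerfacc
# nerfacc.ContractionType.AABB.to_cpp_version()
```

Exporting MAX_JOBS=4 in the shell before starting Python should have the same effect. On a machine with about 15 GB of memory, a low value trades build speed for a smaller peak memory footprint.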

Hope this helps others who are facing the same problem, and maybe you can fix it somehow.

nerfacc version: 0.2.1 (last compatible with nerfstudio)
nvcc version: 11.6
torch version: 1.12.1 (+cu116)
os: ubuntu 20.04.4 LTS
memory: 15.3 GB
processor: core i7 @ 3.30GHz x8
gpu: RTX 3050 Laptop

Zeju1997 commented 1 year ago

Thank you so much for sharing. :D

liruilong940607 commented 1 year ago

Interesting! Thanks for sharing. I'll find some time to look into this.

theNded commented 1 year ago

It also happens on my side. Here are my observations:

  1. I had nerfacc configured in a testing environment with python3.9.
  2. Later I configured another environment for nerfstudio with python3.8, depending on nerfacc.
  3. The infinite loop happened when I was playing with nerfstudio, i.e., the python3.8 environment.
  4. I checked .cache and nerfacc only appeared in py39_cu113. Nothing was found in py38_cu113.
  5. After deleting the whole cache, the compilation completed correctly for python3.8. Therefore I suspect the issue comes from the upstream torch's jit module.
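The cache cleanup from the last step can be sketched as a small script. It assumes the default cache location seen earlier in this thread (~/.cache/torch_extensions), or the TORCH_EXTENSIONS_DIR override that torch documents, and that the build folder is named nerfacc_cuda. The function names `find_cached_builds` and `clear_cached_builds` are hypothetical helpers, not part of nerfacc or torch.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def find_cached_builds(name: str = "nerfacc_cuda",
                       root: Optional[Path] = None) -> List[Path]:
    """Return every cached build directory called `name` under the cache root."""
    if root is None:
        # Assumption: torch's default cache root, unless overridden.
        default_root = Path.home() / ".cache" / "torch_extensions"
        root = Path(os.environ.get("TORCH_EXTENSIONS_DIR", default_root))
    if not root.is_dir():
        return []
    # Matches e.g. <root>/py38_cu116/nerfacc_cuda and <root>/py39_cu113/nerfacc_cuda.
    return sorted(p for p in root.rglob(name) if p.is_dir())


def clear_cached_builds(name: str = "nerfacc_cuda",
                        root: Optional[Path] = None,
                        dry_run: bool = True) -> List[Path]:
    """List the cached builds and delete them unless dry_run is True."""
    builds = find_cached_builds(name, root)
    if not dry_run:
        for build_dir in builds:
            shutil.rmtree(build_dir, ignore_errors=True)
    return builds


if __name__ == "__main__":
    # Dry run by default: print what would be removed without deleting anything.
    for path in clear_cached_builds(dry_run=True):
        print(path)
```

Running it with dry_run=True first shows which Python/CUDA combinations have a cached build. With dry_run=False the directories are removed, so the next call into nerfacc should rebuild from scratch.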

liruilong940607 commented 1 year ago

It should be fixed now. Please reopen the issue if the problem still exists.