sunset1995 / DirectVoxGO

Direct voxel grid optimization for fast radiance field reconstruction.
https://sunset1995.github.io/dvgo

Building CUDA Extension #24

Open greeneggsandyaml opened 2 years ago

greeneggsandyaml commented 2 years ago

Hello authors,

Thank you for your great work and your code. I am trying to run the model, which involves building the CUDA extension. I am aware that issue #13 exists, but it does not provide information that solves my issue, so I am opening a new issue.

I am using CUDA 11.6 with PyTorch built for CUDA 11.6. I have successfully installed pytorch-scatter and all the dependencies in requirements.txt. When I run python run.py --config configs/nerf/lego.py --render_test, I get:

>>> python run.py --config configs/nerf/lego.py --render_test

Using /tmp/torch-ext as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch-ext/adam_upd_cuda/build.ninja...
Building extension module adam_upd_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc  -ccbin /software/compilers/gcc-5.4.0/bin/gcc -DTORCH_EXTENSION_NAME=adam_upd_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/TH -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /users/user/miniconda3/envs/new2/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /users/user/projects/experiments/active/DirectVoxGO/lib/cuda/adam_upd_kernel.cu -o adam_upd_kernel.cuda.o
FAILED: adam_upd_kernel.cuda.o
/usr/local/cuda/bin/nvcc  -ccbin /software/compilers/gcc-5.4.0/bin/gcc -DTORCH_EXTENSION_NAME=adam_upd_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/TH -isystem /users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /users/user/miniconda3/envs/new2/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /users/user/projects/experiments/active/DirectVoxGO/lib/cuda/adam_upd_kernel.cu -o adam_upd_kernel.cuda.o
/users/user/projects/experiments/active/DirectVoxGO/lib/cuda/adam_upd_kernel.cu: In lambda function:
/users/user/projects/experiments/active/DirectVoxGO/lib/cuda/adam_upd_kernel.cu:74:116: warning: ‘T* at::Tensor::data() const [with T = double]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:236:1: note: declared here
   T * data() const {
 ^
/users/user/projects/experiments/active/DirectVoxGO/lib/cuda/adam_upd_kernel.cu:74:141: warning: ‘T* at::Tensor::data() const [with T = double]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/ATen/core/TensorBody.h:236:1: note: declared here
   T * data() const {

...

 ^
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000447a_00000000-6_adam_upd_kernel.cudafe1.stub.c:59:27:   required from here
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:58:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:71:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingBagImpl]’:

...

/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:58:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:71:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1865, in _run_ninja_build
    subprocess.run(
  File "/users/user/miniconda3/envs/new2/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/user/projects/experiments/active/DirectVoxGO/run.py", line 13, in <module>
    from lib import utils, dvgo, dcvgo, dmpigo
  File "/users/user/projects/experiments/active/DirectVoxGO/lib/utils.py", line 11, in <module>
    from .masked_adam import MaskedAdam
  File "/users/user/projects/experiments/active/DirectVoxGO/lib/masked_adam.py", line 7, in <module>
    adam_upd_cuda = load(
  File "/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1257, in load
    return _jit_compile(
  File "/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1480, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1594, in _write_ninja_file_and_bui
ld_library
    _run_ninja_build(
  File "/users/user/miniconda3/envs/new2/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1881, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'adam_upd_cuda'

Your assistance on this issue would be greatly appreciated! Thank you again.

greeneggsandyaml commented 2 years ago

Hello, I just wanted to follow up on this and give an update. I have tried with CUDA 11.1, with matching versions of CUDA, PyTorch, and pytorch-scatter. I still get the same error.
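
One more thing I want to rule out is a stale cached build left over from the previous toolchain, since the log shows /tmp/torch-ext as the extensions root. If I understand the JIT cache correctly, something like this should force a clean rebuild (setting TORCH_EXTENSIONS_DIR is optional and only relocates the cache):

# remove the previously failed build so the extension is recompiled from scratch
rm -rf /tmp/torch-ext/adam_upd_cuda
# optionally keep the cache next to the project instead of /tmp
export TORCH_EXTENSIONS_DIR=$PWD/.torch-ext
python run.py --config configs/nerf/lego.py --render_test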

sunset1995 commented 2 years ago

Hmm, strange. My machine uses torch==1.10.1+cu111 too and it works well. Could you please provide more detail about your nvcc version and which version of DVGO you are using?

Besides, have you made any modifications to the C++/CUDA code? It's strange that the backend functions would use OrderedDict to access torch::Tensor.
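
For reference, something like the following gathers the details I'm after (plain version checks, nothing DVGO-specific):

# host compiler and CUDA toolkit that the JIT build will pick up
gcc --version
nvcc -V
# CUDA version the installed torch wheel was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"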

greeneggsandyaml commented 2 years ago

Thanks for the response!

I am using the main branch of DVGO (I just cloned, installed the dependencies, and tried to run run.py). I have not modified the code at all. Here is the result of nvcc -V:

>>> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

I'm using Python 3.9 and an NVIDIA A40. What GPU, Python, and library versions are you using?

Thanks so much for your help!

sunset1995 commented 2 years ago

On my side, my nvcc -V is

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

I'm using Python 3.9 and torch==1.10.1+cu111. My GPU is an RTX 2080 Ti with CUDA 11.1.

sunset1995 commented 2 years ago

I found a similar issue in other repos. It seems that the gcc version matters too. My gcc --version is

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.

greeneggsandyaml commented 2 years ago

Thanks for the investigation! I'm using gcc 5.4.0, like in the issue that you linked. I will install a higher version of gcc and try again!
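
For the record, my rough plan is below. The install step itself depends on the machine (apt, a module system, or conda all work), so the paths are only placeholders for wherever the newer gcc ends up; as far as I can tell the build picks the host compiler up from the CC/CXX environment variables (CC is what shows up after -ccbin in the log above):

# point the extension build and nvcc's host compiler at a newer gcc (paths are placeholders)
export CC=/path/to/newer-gcc/bin/gcc
export CXX=/path/to/newer-gcc/bin/g++
# clear the previously failed build so the new compiler is actually used
rm -rf /tmp/torch-ext/adam_upd_cuda
python run.py --config configs/nerf/lego.py --render_test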

supdhn commented 2 years ago

Hi everyone, I am having the same issue and I haven't figured out why. Any help?

I installed PyTorch with this command:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

gcc version on my computer:

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.

g++ version on my computer:

g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.

and nvcc version is

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:36:24_Pacific_Standard_Time_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Any ideas on how to solve this? I am on Ubuntu 18.04 with 16 GB of memory and an RTX Titan, using Python 3.7.5.
Any help is appreciated!

P.S.: I have tried CUDA versions 11.7, 11.6, and 11.3, and they all give me the same errors.
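
In case it helps with comparing setups, these commands show whether the installed torch wheel and the toolkit on the PATH actually match (just version printing, nothing specific to this repo):

# CUDA version baked into the installed torch wheel, and whether a GPU is visible
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# which toolkit nvcc actually comes from
which nvcc && nvcc -V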

Thanks in advance!

j93hahn commented 2 years ago

Has anyone received a message like "#error -- unsupported GNU version! gcc versions later than 8 are not supported!"?

I have gcc-12.1.0 installed; is this too new? I'm not entirely sure how to install a gcc version like 7.5.0, as it does not seem to be available on any conda channels.
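
If it helps anyone else stuck on the same question, the route I am considering is the conda-forge compiler metapackages, sketched below. The version number is an assumption on my part (I have not checked which gcc_linux-64/gxx_linux-64 builds the channel actually publishes), and re-activating the environment should let the packages' activation scripts export CC/CXX automatically:

# install an older gcc/g++ pair into the active conda env (version is illustrative)
conda install -c conda-forge gcc_linux-64=9 gxx_linux-64=9
# re-activate so the activation scripts set CC/CXX, then confirm
conda deactivate && conda activate <env-name>
echo $CC $CXX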

RedemptYourself commented 1 year ago

Did it work? It seems not.

aztsdfhj commented 1 year ago

> Thanks for the investigation! I'm using gcc 5.4.0, like in the issue that you linked. I will install a higher version of gcc and try again!

Did you finally solve it? I have the same problem as you.