skhu101 / SHERF

Code for our ICCV'2023 paper "SHERF: Generalizable Human NeRF from a Single Image"

Error when running eval_THUman #31

Closed TonNew5418 closed 6 months ago

TonNew5418 commented 7 months ago

I set up the environment with the following commands:

    conda create --name sherf python=3.8
    conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
    conda install -c fvcore -c iopath -c conda-forge fvcore iopath
    pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
    pip install -r requirements.txt
    conda activate sherf

nvcc -V reports 11.3, torch.cuda.is_available() returns True, and from pytorch3d import _C works. But I got this error:

    Traceback (most recent call last):
      File "train.py", line 446, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "train.py", line 441, in main
        launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
      File "train.py", line 101, in launch_training
        subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
      File "train.py", line 52, in subprocess_fn
        training_loop.training_loop(rank=rank, **c)
      File "/home/jianl0b/SHERF-main/sherf/training/training_loop.py", line 323, in training_loop
        test(G, savedir=testsavedir, neural_rendering_resolution=loss_kwargs['neural_rendering_resolution_initial'], rank=0, use_sr_module=use_sr_module, white_back=False, sample_obs_view=training_set_kwargs.sample_obs_view, fix_obs_view=training_set_kwargs.fix_obs_view, dataset_name=cfg, data_root=training_set_kwargs.data_root, obs_view_lst=[4, 12, 20], nv_pose_start=0, np_pose_start=0, pose_interval=2, pose_num=5)
      File "/home/jianl0b/SHERF-main/sherf/training/test_loop.py", line 189, in test
        gen_img = model(test_data, torch.randn(1, 512).to(device), torch.zeros((1, 25)).to(device), \
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jianl0b/SHERF-main/sherf/training/triplane.py", line 235, in forward
        ws = self.mapping(z, c, input_img=input_img, truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
      File "/home/jianl0b/SHERF-main/sherf/training/triplane.py", line 79, in mapping
        return self.backbone.mapping(z, c * self.rendering_kwargs.get('c_scale', 0), truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jianl0b/SHERF-main/sherf/training/networks_stylegan2.py", line 248, in forward
        x = layer(x)
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/jianl0b/SHERF-main/sherf/training/networks_stylegan2.py", line 126, in forward
        x = bias_act.bias_act(x, b, act=self.activation)
      File "/home/jianl0b/SHERF-main/sherf/torch_utils/ops/bias_act.py", line 86, in bias_act
        if impl == 'cuda' and x.device.type == 'cuda' and _init():
      File "/home/jianl0b/SHERF-main/sherf/torch_utils/ops/bias_act.py", line 43, in _init
        _plugin = custom_ops.get_plugin(
      File "/home/jianl0b/SHERF-main/sherf/torch_utils/custom_ops.py", line 138, in get_plugin
        torch.utils.cpp_extension.load(name=module_name, build_directory=cached_build_dir,
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
        return _jit_compile(
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1357, in _jit_compile
        _write_ninja_file_and_build_library(
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1469, in _write_ninja_file_and_build_library
        _run_ninja_build(
      File "/home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error building extension 'bias_act_plugin': [1/3] :/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cu -o bias_act.cuda.o
    FAILED: bias_act.cuda.o
    :/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cu -o bias_act.cuda.o
    /bin/sh: :/usr/local/cuda/bin/nvcc: No such file or directory
    [2/3] c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp -o bias_act.o
    FAILED: bias_act.o
    c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/TH -isystem /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/THC -isystem :/usr/local/cuda/include -isystem /home/jianl0b/anaconda3/envs/sherf/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp -o bias_act.o
    In file included from /home/jianl0b/.cache/torch_extensions/py38_cu113/bias_act_plugin/b46266ff65f9fa53c32108953a1c6f16-nvidia-rtx-a4500/bias_act.cpp:14:
    /home/jianl0b/anaconda3/envs/sherf/lib/python3.8/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:10: fatal error: cuda_runtime_api.h: No such file or directory
        5 | #include <cuda_runtime_api.h>
          |          ^~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    ninja: build stopped: subcommand failed.
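Both failures in the build log involve the path :/usr/local/cuda (with a leading colon), which suggests that the CUDA_HOME environment variable was set with a stray ':' in front; this is an inference from the log, not something confirmed in the thread. A minimal sketch of a check for that follows, assuming the toolkit location is taken from CUDA_HOME or CUDA_PATH. The helper name check_cuda_home is hypothetical.

```python
import os


def check_cuda_home(cuda_home):
    """Return a list of human-readable problems found for a CUDA_HOME value."""
    if not cuda_home:
        return ["CUDA_HOME is empty or unset"]
    problems = []
    # A value such as ":/usr/local/cuda" typically comes from appending to an
    # unset variable, e.g. export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
    if os.pathsep in cuda_home:
        problems.append(f"CUDA_HOME contains {os.pathsep!r}: {cuda_home!r}")
    # Check the two files the failed build steps were looking for.
    root = cuda_home.strip(os.pathsep)
    for rel in (os.path.join("bin", "nvcc"),
                os.path.join("include", "cuda_runtime_api.h")):
        full = os.path.join(root, rel)
        if not os.path.exists(full):
            problems.append(f"missing {full}")
    return problems


if __name__ == "__main__":
    value = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    for line in check_cuda_home(value) or ["no problems found"]:
        print(line)
```

Running the script inside the activated environment prints one line per problem it finds, so a leading colon and a missing header show up as separate entries.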

TonNew5418 commented 7 months ago

Solved. For me, this problem came down to three separate sub-problems:

  1. The first problem is the one described in https://github.com/NVlabs/stylegan3/issues/165. In torch/utils/cpp_extension.py, I changed the line config.append(f"nvcc = {nvcc}") to config.append(f"nvcc = {nvcc[1:]}"), which drops the leading ':' from the path :/usr/local/cuda/bin/nvcc seen in the log, and the line command = ['ninja', '-v'] to command = ['ninja', '--verbose'].
  2. A lock left behind in the cached build files. I deleted the folder /home/<user_name>/.cache/torch_extensions/py38_cu113/bias_act_plugin/. My reference: https://blog.csdn.net/qq_38677322/article/details/109696077
  3. The linker error "/usr/bin/ld: cannot find -lcudart" followed by "collect2: error: ld returned 1 exit status". This was caused by a missing libcudart.so. I solved it by adding a symbolic link to it.
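
The three fixes above can be sketched as small helpers. This is an illustrative sketch under assumptions, not the exact steps taken in this thread: instead of patching cpp_extension.py it cleans the environment variable value, and all function names and the example paths in the comments are hypothetical.

```python
import glob
import os
import shutil


def clean_cuda_home(value):
    """Drop empty entries such as the leading ':' in ':/usr/local/cuda'."""
    parts = [p for p in value.split(os.pathsep) if p]
    return parts[0] if parts else ""


def clear_plugin_cache(plugin_dir):
    """Delete a cached extension build directory, including any lock file."""
    if os.path.isdir(plugin_dir):
        shutil.rmtree(plugin_dir)
        return True
    return False


def link_libcudart(search_dirs, link_dir):
    """Symlink the first libcudart.so* found in search_dirs into link_dir."""
    link_path = os.path.join(link_dir, "libcudart.so")
    if os.path.lexists(link_path):
        return link_path  # something is already there, leave it alone
    for directory in search_dirs:
        matches = sorted(glob.glob(os.path.join(directory, "libcudart.so*")))
        if matches:
            os.symlink(matches[0], link_path)
            return link_path
    return None  # no libcudart found in any of the given directories


# Example usage (hypothetical paths, adjust to your machine):
#   os.environ["CUDA_HOME"] = clean_cuda_home(os.environ.get("CUDA_HOME", ""))
#   clear_plugin_cache(os.path.expanduser(
#       "~/.cache/torch_extensions/py38_cu113/bias_act_plugin"))
#   link_libcudart([os.path.join(os.environ["CONDA_PREFIX"], "lib")],
#                  "/usr/local/cuda/lib64")
```

Setting CUDA_HOME this way only affects the current Python process, so it has to happen before the extension is built; fixing the export in the shell profile is the more durable option. Where the symbolic link should live depends on which directories the linker searches on a given machine, so the target directory in the usage comment is only an example.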