icon-lab / SynDiff

Official PyTorch implementation of SynDiff described in the paper (https://arxiv.org/abs/2207.08208).

Error while running train.py function #11

Closed aartykov closed 1 year ago

aartykov commented 1 year ago

Hello, I get the following error when running "train.py". I suspect it is related to ninja, but I have already installed the appropriate ninja package. Could you please help me resolve it?

Traceback (most recent call last):
  File "/home/arslan/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/home/arslan/miniconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/arslan/SynDiff/train_custom.py", line 861, in <module>
    init_processes(0, size, train_syndiff, args)
  File "/home/arslan/SynDiff/train_custom.py", line 724, in init_processes
    fn(rank, gpu, args)
  File "/home/arslan/SynDiff/train_custom.py", line 188, in train_syndiff
    from backbones.discriminator import Discriminator_small, Discriminator_large
  File "/home/arslan/SynDiff/backbones/discriminator.py", line 12, in <module>
    from . import up_or_down_sampling
  File "/home/arslan/SynDiff/backbones/up_or_down_sampling.py", line 15, in <module>
    from utils.op import upfirdn2d
  File "/home/arslan/SynDiff/utils/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/arslan/SynDiff/utils/op/fused_act.py", line 20, in <module>
    fused = load(
  File "/home/arslan/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/arslan/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/arslan/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/arslan/miniconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/1] c++ fused_bias_act.o fused_bias_act_kernel.cuda.o -shared -L/home/arslan/miniconda3/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/arslan/miniconda3/lib64 -lcudart -o fused.so
FAILED: fused.so
c++ fused_bias_act.o fused_bias_act_kernel.cuda.o -shared -L/home/arslan/miniconda3/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/arslan/miniconda3/lib64 -lcudart -o fused.so
/usr/bin/ld: cannot find -lcudart: No such file or directory
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

whisney commented 1 year ago

I ran into a similar error:

module_path = /home/zyw/Diffusion_model/SynDiff-main/utils/op
Traceback (most recent call last):
  File "/home/zyw/zywenvi/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/home/zyw/zywenvi/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 861, in <module>
    init_processes(0, size, train_syndiff, args)
  File "train.py", line 724, in init_processes
    fn(rank, gpu, args)
  File "train.py", line 188, in train_syndiff
    from backbones.discriminator import Discriminator_small, Discriminator_large
  File "/home/zyw/Diffusion_model/SynDiff-main/backbones/discriminator.py", line 12, in <module>
    from . import up_or_down_sampling
  File "/home/zyw/Diffusion_model/SynDiff-main/backbones/up_or_down_sampling.py", line 15, in <module>
    from utils.op import upfirdn2d
  File "/home/zyw/Diffusion_model/SynDiff-main/utils/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/zyw/Diffusion_model/SynDiff-main/utils/op/fused_act.py", line 24, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "/home/zyw/zywenvi/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/zyw/zywenvi/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/zyw/zywenvi/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/zyw/zywenvi/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/TH -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/zyw/zywenvi/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/zyw/Diffusion_model/SynDiff-main/utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/TH -isystem /home/zyw/zywenvi/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/zyw/zywenvi/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/zyw/Diffusion_model/SynDiff-main/utils/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char16_t*; _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6688:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char16_t; _Traits = std::char_traits<char16_t>; _Alloc = std::allocator<char16_t>]’ without object
       p->_M_set_sharable();

/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24:   required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134:   required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95:   required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
ninja: build stopped: subcommand failed.

whisney commented 1 year ago

I found a workaround, though not an ideal one: https://blog.csdn.net/weixin_44616294/article/details/124150565. The error occurs when the C++ extension is loaded. If the "upfirdn2d" function is rewritten directly in PyTorch, the extra C++ modules no longer need to be loaded.
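
To make clear what such a rewrite has to reproduce, here is a minimal reference sketch of the upfirdn2d operation (zero-insertion upsampling, zero padding, FIR filtering, downsampling) in plain Python on nested lists. It is an illustration under assumptions, not the code from the linked post or from this repository: the function name upfirdn2d_reference and its argument layout are hypothetical, it handles one single-channel image, and negative padding is not supported. A PyTorch version would express the same four steps with tensor operations.

```python
def upfirdn2d_reference(x, kernel, up=1, down=1, pad=(0, 0)):
    """Reference upfirdn2d on a 2-D list (hypothetical helper, single channel).

    x      : list of rows (H x W)
    kernel : list of rows (KH x KW) FIR filter
    up     : integer upsampling factor (zeros inserted after each sample)
    down   : integer downsampling factor (keep every `down`-th sample)
    pad    : (before, after) zero padding applied to both axes, non-negative
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    pad0, pad1 = pad

    # Steps 1 and 2: place the samples on a zero grid (upsample) inside the padding.
    big_h = h * up + pad0 + pad1
    big_w = w * up + pad0 + pad1
    buf = [[0.0] * big_w for _ in range(big_h)]
    for i in range(h):
        for j in range(w):
            buf[pad0 + i * up][pad0 + j * up] = float(x[i][j])

    # Step 3: "valid" convolution, i.e. correlation with the flipped kernel.
    out_h = big_h - kh + 1
    out_w = big_w - kw + 1
    full = [
        [
            sum(
                buf[i + a][j + b] * kernel[kh - 1 - a][kw - 1 - b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

    # Step 4: downsample by striding.
    return [row[::down] for row in full[::down]]


if __name__ == "__main__":
    # A 2x2 box filter with up=2 acts as nearest-neighbour upsampling.
    for row in upfirdn2d_reference([[1, 2], [3, 4]], [[1, 1], [1, 1]], up=2, pad=(1, 0)):
        print(row)
```

A reference like this is slow, but it can be used to check a tensor-based replacement on small inputs before swapping it in for the compiled op.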

onat-dalmaz commented 1 year ago

The "-lcudart" flag tells the linker to link against libcudart, the CUDA runtime library, which is part of the CUDA toolkit and is required for building CUDA-based PyTorch extensions. The linker cannot find it in your environment. To resolve the issue, check that you have installed a version of the CUDA toolkit that is compatible with the version of PyTorch you are using. Also make sure that the CUDA toolkit is properly configured and that the required paths are added to your system environment variables.

If you have already installed the CUDA toolkit and added the necessary paths, try uninstalling and reinstalling PyTorch to ensure that it is linked correctly against the CUDA toolkit. You can also try updating PyTorch or the CUDA toolkit to see whether that resolves the issue. I hope these suggestions help.
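
As a quick way to check the paths mentioned above, the following sketch looks for the CUDA runtime library in a few likely directories. It is a diagnostic under assumptions rather than an official tool: the helper names are hypothetical, the CUDA_HOME and CUDA_PATH variables and the /usr/local/cuda/lib64 default are assumed conventions that may not match your installation, and it only checks file names. The linker needs an unversioned libcudart.so (or libcudart.a) in one of its -L directories, so a directory that only holds a versioned file such as libcudart.so.11.0 would still produce "cannot find -lcudart".

```python
import glob
import os


def candidate_lib_dirs(env):
    """List directories where the CUDA runtime might live (assumed layout)."""
    dirs = []
    for var in ("CUDA_HOME", "CUDA_PATH"):  # assumed environment variables
        root = env.get(var)
        if root:
            dirs.append(os.path.join(root, "lib64"))
            dirs.append(os.path.join(root, "lib"))
    dirs.extend(p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p)
    dirs.append("/usr/local/cuda/lib64")  # common default, may not exist
    return dirs


def find_cudart(dirs):
    """Return (linkable, versioned) lists of libcudart files found in dirs."""
    linkable, versioned = [], []
    for d in dirs:
        for path in sorted(glob.glob(os.path.join(d, "libcudart*"))):
            name = os.path.basename(path)
            if name in ("libcudart.so", "libcudart.a"):
                linkable.append(path)  # what "-lcudart" can resolve to
            elif name.startswith("libcudart.so."):
                versioned.append(path)  # usable at run time, not by "-lcudart"
    return linkable, versioned


if __name__ == "__main__":
    linkable, versioned = find_cudart(candidate_lib_dirs(os.environ))
    print("linkable:", linkable or "none found")
    print("versioned:", versioned or "none found")
```

In the first traceback the link command searches /home/arslan/miniconda3/lib64, so that directory is the first place to look; if nothing linkable is reported there, pointing the environment at a full CUDA toolkit installation is the likely fix.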