Closed jeffryWillam closed 2 years ago
Hi,
Installing with pip rather than conda should not be an issue. The problem might be mismatched versions, or ninja failing to find your CUDA library path.
Could you please provide me with the full error stack?
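Alongside the error stack, a short script along these lines can collect the version and path details that matter for this kind of build failure. It is only a sketch: the helper name `collect_build_env` is made up for illustration, it assumes `nvcc` and `ninja` are expected to be on `PATH`, and it reads `torch.__version__`, `torch.version.cuda` and `torch.utils.cpp_extension.CUDA_HOME` only when PyTorch can be imported.

```python
import os
import shutil


def collect_build_env():
    """Collect details that commonly matter when a CUDA extension fails to build."""
    info = {
        "nvcc_path": shutil.which("nvcc"),      # None if nvcc is not on PATH
        "ninja_path": shutil.which("ninja"),    # None if ninja is not on PATH
        "CUDA_HOME_env": os.environ.get("CUDA_HOME"),
        "torch_version": None,
        "torch_cuda_version": None,
        "torch_CUDA_HOME": None,
    }
    try:
        import torch
        from torch.utils import cpp_extension
    except ImportError:
        # PyTorch is not installed in this interpreter; report what we have.
        return info
    info["torch_version"] = torch.__version__
    info["torch_cuda_version"] = torch.version.cuda   # CUDA version torch was built with
    info["torch_CUDA_HOME"] = cpp_extension.CUDA_HOME  # toolkit path torch will compile against
    return info


if __name__ == "__main__":
    for key, value in collect_build_env().items():
        print(f"{key}: {value}")
```

If `torch_cuda_version` and the toolkit found under `torch_CUDA_HOME` disagree, or `nvcc_path` is empty, that points toward the mismatched-version or missing-path cases mentioned above.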
Thanks for your help. The traceback is as follows (process.poll() in subprocess.py returns 1 on torch 1.7 but 0 on torch 1.4) 😀:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1533, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "
/usr/include/c++/7/bits/basic_string.tcc: In instantiation of ‘static std::basic_string<_CharT, _Traits, _Alloc>::_Rep* std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’:
/usr/include/c++/7/bits/basic_string.tcc:578:28: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&, std::forward_iterator_tag) [with _FwdIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5042:20: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct_aux(_InIterator, _InIterator, const _Alloc&, std::__false_type) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.h:5063:24: required from ‘static _CharT* std::basic_string<_CharT, _Traits, _Alloc>::_S_construct(_InIterator, _InIterator, const _Alloc&) [with _InIterator = const char32_t*; _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’
/usr/include/c++/7/bits/basic_string.tcc:656:134: required from ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc&) [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>; std::basic_string<_CharT, _Traits, _Alloc>::size_type = long unsigned int]’
/usr/include/c++/7/bits/basic_string.h:6693:95: required from here
/usr/include/c++/7/bits/basic_string.tcc:1067:16: error: cannot call member function ‘void std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_M_set_sharable() [with _CharT = char32_t; _Traits = std::char_traits<char32_t>; _Alloc = std::allocator<char32_t>]’ without object
ninja: build stopped: subcommand failed.
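The Python side of this traceback only says that `ninja -v` exited with status 1; the real cause is the compiler error (the `_M_set_sharable` message). The sketch below illustrates that behaviour with a stand-in command rather than ninja itself: `subprocess.run(..., check=True)` raises `CalledProcessError` for any non-zero exit code, and the captured output is where the underlying error lives. The helper name `run_and_report` and the failing command are assumptions for illustration, not part of PyTorch.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, combined stdout/stderr text)."""
    try:
        done = subprocess.run(
            cmd,
            check=True,                 # raise CalledProcessError on non-zero exit
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr into stdout
            text=True,
        )
        return done.returncode, done.stdout
    except subprocess.CalledProcessError as err:
        # The exception carries the exit code and the captured output.
        return err.returncode, err.output


if __name__ == "__main__":
    # Stand-in for a failing build step: prints a message, then exits with 1.
    failing = [sys.executable, "-c", "import sys; print('compile error here'); sys.exit(1)"]
    code, output = run_and_report(failing)
    print("exit code:", code)
    print("output:", output)
```

An exit status of 1 under torch 1.7 and 0 under torch 1.4 is therefore consistent with the extension failing to compile in one environment and compiling cleanly in the other.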
As far as I can see (e.g. from here and here), this seems to be a problem with specific CUDA 10.1 versions.
Things I'd try, depending on how much you're willing to mess with your CUDA installation or whether you'd rather solve it with code:
1) Upgrade to a newer version of CUDA 10.1 (seems to be fixed in 10.1.168).
2) Upgrade to CUDA 10.2.
3) Work inside the docker we provide in the readme, it has all the relevant packages installed.
4) Replace the implementation of the sg2 model in ZSSGAN/model/ with the version you can find in the StyleCLIP repo. They use a modified, native-pytorch implementation which doesn't use any of the StyleGAN2 CUDA kernels. Things will run a bit (~15%) slower, but you won't be compiling any new operations and won't have this ninja issue.
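Before choosing between options 1) and 2), it may help to check which CUDA build is actually installed. The following is a minimal sketch, assuming `nvcc --version` prints a token of the form `V10.1.168` (as it does in the toolkit releases I am aware of). The names `parse_nvcc_version` and `has_reported_fix` are hypothetical, and treating every later build as fixed is an assumption based only on the 10.1.168 report above.

```python
import re
import shutil
import subprocess

FIXED_IN = (10, 1, 168)  # build reported in this thread as containing the fix


def parse_nvcc_version(text):
    """Extract (major, minor, build) from `nvcc --version` output, or None."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_reported_fix(version, fixed_in=FIXED_IN):
    """True if version is at or above the build reported as fixed."""
    return version is not None and version >= fixed_in


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        ).stdout
        version = parse_nvcc_version(out)
        print("installed:", version, "has reported fix:", has_reported_fix(version))
```

Note that this checks the toolkit `nvcc` resolves to on `PATH`, which may differ from the one PyTorch compiles against if several CUDA versions are installed.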
Thanks for your suggestions! I will try them one by one 😀. Once it is resolved, I will post an update here.
Problem solved after updating CUDA 10.1 to 10.1.168. It works 😀, thanks for your help!
Happy to help!
Closing as resolved. Feel free to open a new issue if you need additional help.
Hi, thanks for your excellent work! It is really instructive! However, I ran into a strange compilation problem while training the network. It reports "subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1" while building extension 'fused'. This only happens on PyTorch 1.7.1; nothing goes wrong with torch 1.4 (but the CLIP models seem to require torch 1.7). I have been stuck on this for many days and still cannot find a solution 😀. Any suggestions? BTW, torch 1.7 was installed using pip rather than conda; does this matter? Many thanks for the help 🤪.
The environment I use is as follows:
ubuntu 20.14
pytorch 1.7.1
torchvision 0.8.2
torchaudio 0.7.2
CUDA 10.1
ninja 1.8.2