aidonchuk closed this issue 4 years ago.
Same error and almost the same log here; compiled under CUDA 9.0.
I tried Python 3.6 and Python 3.7 with PyTorch 1.0.0, and both failed. I compiled under CUDA 9.0 and CUDA 8.0 (the errors under CUDA 8.0 and 9.0 are different). @shrubb, did you use CUDA 10.0?
My environment is Debian 8 (jessie), Python 3.7/3.6, CUDA 9.0/8.0, cuDNN 6/7, GCC 4.9.2.
Thanks for reporting!
@alexdonchuk Looks like you have CUDA 9 or 9.1. These versions contain an NVCC bug due to which GCC 6 is unsupported with PyTorch (see pytorch/pytorch#8832). Your best option seems to be to update to CUDA 9.2 or 10. Or use GCC 5, though I'm not 100% sure if this will help.
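The rule of thumb above can be written down as a tiny check. This is a minimal sketch, assuming the CUDA and GCC versions are already known as numbers; the helper name `nvcc_gcc_warning` is hypothetical (not part of box-convolutions or PyTorch), and it encodes only what is stated in this comment: CUDA 9.0/9.1 with GCC 6 hits the NVCC bug, and CUDA 9.2 or 10 is the suggested way out.

```python
from typing import Optional, Tuple

# CUDA releases whose NVCC has the bug mentioned above (per this thread only).
BUGGY_NVCC = {(9, 0), (9, 1)}


def nvcc_gcc_warning(cuda: Tuple[int, int], gcc_major: int) -> Optional[str]:
    """Return a warning for the CUDA 9.0/9.1 + GCC 6 combination, else None.

    Hypothetical helper: it knows nothing about other incompatibilities.
    """
    if cuda in BUGGY_NVCC and gcc_major == 6:
        return (f"CUDA {cuda[0]}.{cuda[1]} with GCC 6 is affected by an NVCC bug "
                "(pytorch/pytorch#8832); consider CUDA 9.2 or 10, or try GCC 5.")
    return None


if __name__ == "__main__":
    print(nvcc_gcc_warning((9, 1), 6))  # prints the warning
    print(nvcc_gcc_warning((9, 2), 7))  # None: the configuration reported to work
```

A `None` result does not mean the build will succeed; as later comments show, other combinations fail for other reasons.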
@freesouls It might be the same problem, or a GCC that is too old, or a kernel version too old for this GCC. See this, for example. Anyway, I'd suggest always updating to the latest possible software. Can I see your full log with CUDA 9?
Just in case: my configuration is Ubuntu 18.04.2, CUDA 9.2 and GCC 7.3.0.
Thx a lot. I'll try to upgrade.
Upgrading to CUDA 10 helps.
Upgrading to CUDA 9.2 with GCC 4.9.2 also passes the test. My full log under CUDA 9.0 is the same as @alexdonchuk's.
🎉 🎈
I use Python 3.5 with PyTorch 1.0.1, and I compiled under CUDA 9.2 using GCC 5.4.0. I get the same errors and logs. Why?
@dontLoveBugs Do you use Anaconda and/or have multiple CUDA versions installed? If yes, use CUDA_HOME to point the installer to the CUDA that PyTorch uses:
CUDA_HOME=/usr/local/cuda-9.2 python3 -m pip install .
Also, see #9.
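If you have several toolkits under `/usr/local`, picking the right `CUDA_HOME` can be scripted. A minimal sketch, assuming the toolkits live in directories named `cuda-<major>.<minor>`; the helpers `pick_cuda_home` and `pip_install_env` are hypothetical, not part of the installer.

```python
import os
import re
from typing import Dict, Iterable, Optional


def pick_cuda_home(torch_cuda: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory named cuda-<torch_cuda>, or None."""
    pattern = re.compile(r"cuda-" + re.escape(torch_cuda) + r"/?$")
    for path in candidates:
        # e.g. "/usr/local/cuda-9.2" matches torch_cuda == "9.2"
        if pattern.search(path):
            return path
    return None


def pip_install_env(cuda_home: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment (os.environ by default) with CUDA_HOME set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_HOME"] = cuda_home
    return env
```

In practice `torch_cuda` would come from `torch.version.cuda` (the CUDA version PyTorch was built with) and the candidates from `glob.glob("/usr/local/cuda-*")`; the resulting environment can be passed to `subprocess.run([sys.executable, "-m", "pip", "install", "."], env=env)`, which is equivalent to the command above.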
@shrubb Thx. With the configuration (Ubuntu 16.04, PyTorch 1.0.0, CUDA 9.2 and GCC 7.4), I compiled and built it successfully. However, when I tested the example code, I got a ModuleNotFoundError. I don't know why.
@dontLoveBugs It's too difficult to reason about this without the full commands and logs. All I can say is that you may be accidentally using a different Python or have messed-up library paths. Usually one safe way is to use a virtual Python environment.
@shrubb I took your advice and compiled it again, but I still got the ModuleNotFoundError. I checked site-packages and found only one file and one folder related to box_conv (box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so, box_convolution-0.0.0.dist-info), and no lib folder (I think it should be named "box_convolution"). Why?
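One way to check the "different Python" hypothesis is to ask the interpreter you are actually running where it would load each module from. A minimal sketch using the standard `importlib.util.find_spec`; the helper name `locate_modules` is hypothetical.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Optional


def locate_modules(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each top-level module name to the file this interpreter would load.

    None means the module was not found (namespace packages also report None,
    since they have no single origin file).
    """
    found: Dict[str, Optional[str]] = {}
    for name in names:
        spec = importlib.util.find_spec(name)  # does not import top-level modules
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, origin in locate_modules(
            ["torch", "box_convolution", "box_convolution_cpp_cuda"]).items():
        print(f"{name}: {origin}")
```

If `interpreter` is not the Python you ran `pip install` with, or one of the modules shows `None`, the build and the import are happening in different environments.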
@dontLoveBugs Wow. Indeed, I made a very stupid mistake: the setup script didn't copy Python files. Glad you pointed it out. Better late than never :)
Please do a git pull origin master and tell me if that worked.
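The symptom described above (the compiled extension is present but the Python package is not) can be checked directly from a site-packages listing. A minimal sketch, assuming the Linux file naming seen in the listing above (`box_convolution_cpp_cuda*.so`) and a `box_convolution` package directory; on Windows the extension suffix would be `.pyd`, which this sketch does not handle. The helper names are hypothetical.

```python
import fnmatch
import os
from typing import Dict, Iterable


def inspect_listing(names: Iterable[str]) -> Dict[str, object]:
    """Report which box_convolution pieces appear in a directory listing."""
    names = list(names)
    return {
        # the compiled C++/CUDA module
        "extension": sorted(n for n in names
                            if fnmatch.fnmatch(n, "box_convolution_cpp_cuda*.so")),
        # the pure-Python wrapper package that `import box_convolution` needs
        "python_package": "box_convolution" in names,
    }


def inspect_site_packages(path: str) -> Dict[str, object]:
    """Same check applied to a real site-packages directory."""
    return inspect_listing(os.listdir(path))
```

Pass it one of the directories from `site.getsitepackages()`. After the fix, both entries should be non-empty.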
Thanks, I understand.
Hey guys,
I'm also facing problems with the installation. Here's a part of the error I'm getting:
src/box_convolution_cuda_backward.cu:195:309: required from here
src/box_convolution_cuda_backward.cu:176:1420: internal compiler error: in tsubst_copy, at cp/pt.c:13189
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
error: command 'gcc' failed with exit status 1
----------------------------------------
Command "/home/user/anaconda3/envs/tf/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-gacbfn7y/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ti9f96kg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-gacbfn7y/
I am using Python 3.6, Anaconda, PyTorch 1.0.1.post2, CUDA 9.2 and GCC 5.4.
@Flock1 Reproduced this with GCC 5.5.0 too. Thanks for reporting. Nice, we've found a bug in GCC 5.
While I try to push a workaround, you can try another compiler. For example, if you have g++-7, run
CC=g++-7 python3 -m pip install .
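If you are not sure which compilers are installed, a small script can pick the first one available on `PATH` and print the matching command. A minimal sketch; the helper names and the candidate list are assumptions, and (as the follow-up below shows) setting `CC` alone was not enough for the CUDA compiler until a fix was pushed.

```python
import shlex
import shutil
from typing import Callable, Iterable, Optional


def pick_compiler(candidates: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first candidate compiler found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def install_command(compiler: str) -> str:
    """Shell command that installs from the current directory with CC set."""
    return f"CC={shlex.quote(compiler)} python3 -m pip install ."


if __name__ == "__main__":
    cc = pick_compiler(["g++-8", "g++-7"])  # assumed preference order
    print(install_command(cc) if cc else "no alternative compiler found")
```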
Same error:
error: command 'g++-7' failed with exit status 1
@Flock1 Can you post the full log at e.g. https://gist.github.com/ or https://pastebin.com/ ? It can be obtained with
CC=g++-7 python3 -m pip install . --log log.txt
@shrubb, will do that
@Flock1 That's very weird. I can't believe there is just "error" with absolutely no hint about the source of the error. What if you run it again with -v -v -v arguments after pip?
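With a verbose log, the interesting lines are usually buried deep. A minimal sketch that pulls out every line mentioning an error, with its line number, so the first real failure is easy to spot; the helper name `extract_error_lines` is hypothetical, and the whole-word match deliberately skips words like "errors".

```python
import re
from typing import List, Tuple

# Whole-word "error", case-insensitive: matches "error:" and "internal compiler error".
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)


def extract_error_lines(log_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every log line mentioning an error."""
    return [(i, line) for i, line in enumerate(log_text.splitlines(), start=1)
            if ERROR_RE.search(line)]


if __name__ == "__main__":
    with open("log.txt") as f:  # the file written by pip's --log option
        for number, line in extract_error_lines(f.read()):
            print(f"{number}: {line}")
```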
@shrubb, I'll send the result soon
@shrubb, this is what I get when I try to import:
from box_convolution import BoxConv2d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/__init__.py", line 1, in <module>
from .box_convolution_module import BoxConv2d
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_module.py", line 4, in <module>
from .box_convolution_function import BoxConvolutionFunction, reparametrize
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_function.py", line 3, in <module>
import box_convolution_cpp_cuda as cpp_cuda
ModuleNotFoundError: No module named 'box_convolution_cpp_cuda'
@shrubb, so I should try with GCC 7?
@shrubb, here's the log with g++ 7 https://gist.github.com/Flock1/752cd000570deb48d75c8544f4aa4257
@Flock1 Apparently, the CUDA compiler didn't pick up g++-7. I just pushed a fix for that. Could you please try the latest version with the following command?
python3 -m pip install git+https://github.com/shrubb/box-convolutions.git
@shrubb, will do. Give me some time.
@Flock1 You forgot to prepend CC=g++-7.
@shrubb, the package installed. Thanks.
Now, when I tried importing it, I got the following error:
ImportError: /home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration
The full error is:
>>> import box_convolution
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/__init__.py", line 1, in <module>
from .box_convolution_module import BoxConv2d
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_module.py", line 4, in <module>
from .box_convolution_function import BoxConvolutionFunction, reparametrize
File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_function.py", line 3, in <module>
import box_convolution_cpp_cuda as cpp_cuda
ImportError: /home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration
This was solved in #9; see the solution there. It's likely because you're using Anaconda, which messes up paths.
@shrubb, so is there some way I can use box-convolution through a conda environment?
Yes, please read #9: someone else had your setup and succeeded. I posted a solution there that should help.
@shrubb, the problem I'm facing is that my machine has CUDA 9.2, but torch installs CUDA 9.0 with it. Apparently the CUDA installed by torch isn't fully compatible with box_convolution, or at least that's what seems to be happening. I have attached the log. Let me know what you think.
https://gist.github.com/Flock1/0d12d133157eb326146736dfa5f1c3e0
@Flock1 PyTorch doesn't install CUDA for you -- you had two CUDAs before. One of them is probably from Anaconda.
What does this command output?
ldd `python3 -c "import torch, os; print(os.path.dirname(torch.__file__))"`/lib/libtorch.so | grep libcu
libcudart-f7fdd8d7.so.9.0 => /home/sarvagya/anaconda3/lib/python3.6/site-packages/torch/lib/libcudart-f7fdd8d7.so.9.0 (0x00007fcf1750b000)
Also, this is the output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
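Comparing the two outputs above is what reveals the mismatch: nvcc is 9.2, while PyTorch links its own bundled libcudart 9.0. A minimal sketch that automates the comparison by parsing exactly these two kinds of output; the helper names are hypothetical, and the regexes assume the formats shown here.

```python
import re
from typing import Optional, Tuple


def nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Parse 'Cuda compilation tools, release 9.2, V9.2.148' -> (9, 2)."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def libcudart_version(ldd_output: str) -> Optional[Tuple[int, int]]:
    """Parse the libcudart line of `ldd .../libtorch.so`, e.g. '...so.9.0' -> (9, 0)."""
    m = re.search(r"libcudart\S*\.so\.(\d+)\.(\d+)", ldd_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def cuda_mismatch(nvcc_output: str, ldd_output: str) -> bool:
    """True when the toolkit building the extension differs from PyTorch's CUDA runtime."""
    built, runtime = nvcc_release(nvcc_output), libcudart_version(ldd_output)
    return built is not None and runtime is not None and built != runtime
```

Feed it the text of `nvcc --version` and of the `ldd ... | grep libcu` command above; `True` means the extension is being compiled against a different CUDA than the one PyTorch was built with.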
Hmmm. So your PyTorch is still pre-built with not-fully-compatible CUDA 9.0. Then, your options may be:
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
I have downgraded to CUDA 9.0, which is the same as PyTorch's. I ran the command above to install and got the following: https://gist.github.com/Flock1/d12454e46d8db875a5c8adc7c7495d1e
Aaargh, these compatibilities are annoying... Sorry for them.
Then for now the downgrading option is out. I'll still try to modify the code to work around the GCC 5 bug.
Haha :P, I can understand bro. I'll try other options as well. Let me know when you're done with updates.
@shrubb, I installed CUDA 10 and then the package installed. So apparently box convolution won't work well with CUDA 9.
Hi, I'm trying to install this on Windows 10 with Anaconda and CUDA 10. I installed the Windows 10 SDK (10.0.17763.0) because PyTorch requires a C++ compiler, but the installation always ends with a cl.exe error with error code 2. Is there anybody with Windows experience who can help me? I've read threads about this but found nothing useful :S
@illestom Could you post the full output of the following command?
python3 -m pip install -v -v -v git+https://github.com/shrubb/box-convolutions.git
@shrubb It's a very long error message, so I copy-pasted the end of it; I can give the whole thing if needed:
........ data
src\box_convolution_interface.cpp(367): error C2146: syntax error: missing ';' before identifier 'or'
src\box_convolution_interface.cpp(367): error C2065: 'or': undeclared identifier
src\box_convolution_interface.cpp(367): error C2146: syntax error: missing ';' before identifier 'paramId'
src\box_convolution_interface.cpp(367): warning C4553: '==': result of expression not used; did you intend '='?
src\box_convolution_interface.cpp(368): error C2065: 'not': undeclared identifier
src\box_convolution_interface.cpp(368): error C2146: syntax error: missing ';' before identifier 'needXDeriv'
src\box_convolution_interface.cpp(378): error C2146: syntax error: missing ';' before identifier 'or'
src\box_convolution_interface.cpp(378): error C2065: 'or': undeclared identifier
src\box_convolution_interface.cpp(378): error C2146: syntax error: missing ';' before identifier 'paramId'
src\box_convolution_interface.cpp(378): warning C4553: '==': result of expression not used; did you intend '='?
src\box_convolution_interface.cpp(388): error C2146: syntax error: missing ';' before identifier 'or'
src\box_convolution_interface.cpp(388): error C2065: 'or': undeclared identifier
src\box_convolution_interface.cpp(388): error C2146: syntax error: missing ';' before identifier 'paramId'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
Command "C:\Users\Tamas\AppData\Local\conda\conda\envs\new\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Tamas\AppData\Local\Temp\pip-req-build-hs215zk\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Tamas\AppData\Local\Temp\pip-record-bmx7wd6b\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Tamas\AppData\Local\Temp\pip-req-build-hs215zk
@illestom Hm, yes, the important parts are missing there. I'd be happy if you provided the full output.
May I ask you to run
python3 -m pip install -v -v -v git+https://github.com/shrubb/box-convolutions.git --log log.txt
and then upload the log.txt file to either http://gist.github.com or http://pastebin.com?
@illestom Just pushed a fix, did it help?
@shrubb The previous problem is gone, but now there is some CUDA problem, I guess :S https://drive.google.com/file/d/1c1N_dnK3huI9MnX0G2v7T5-U1gT7J9v8/view?usp=sharing
Hi! I can't compile; please see the log: https://drive.google.com/open?id=1U_0axWSgQGsvvdMWv5FclS1hHHihqx9M
Command "/home/alex/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-n1eyvbz3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-p0dv1roq/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-n1eyvbz3/