shrubb / box-convolutions

PyTorch code for the "Deep Neural Networks with Box Convolutions" paper
Apache License 2.0
511 stars, 35 forks

Build problem! #2

Closed aidonchuk closed 4 years ago

aidonchuk commented 5 years ago

Hi! I can't compile it, please see the log: https://drive.google.com/open?id=1U_0axWSgQGsvvdMWv5FclS1hHHihqx9M

Command "/home/alex/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-n1eyvbz3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-p0dv1roq/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-req-build-n1eyvbz3/

freesouls commented 5 years ago

same error and almost the same log~ compiled under CUDA 9.0

freesouls commented 5 years ago

I tried Python 3.6 and Python 3.7 with PyTorch 1.0.0; both failed. I compiled under CUDA 9.0 and CUDA 8.0 (the errors for CUDA 8.0 and 9.0 are different). @shrubb, did you use CUDA 10.0?

My environment is Debian 8 (jessie), Python 3.7/3.6, CUDA 9.0/8.0, cuDNN 6/7, GCC 4.9.2.

shrubb commented 5 years ago

Thanks for reporting!

@alexdonchuk Looks like you have CUDA 9 or 9.1. These versions contain an NVCC bug due to which GCC 6 is unsupported with PyTorch (see pytorch/pytorch#8832). Your best option seems to be to update to CUDA 9.2 or 10. Or use GCC 5, though I'm not 100% sure if this will help.

@freesouls It might be the same problem, or too old a GCC, or too old a kernel version for this GCC. See this for example. Anyway, I'd suggest always updating to the latest possible software. Can I see your full log with CUDA 9?

Just in case: my configuration is Ubuntu 18.04.2, CUDA 9.2 and GCC 7.3.0.
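For anyone comparing their setup against the versions above, here is a minimal sketch that reports the CUDA toolkit release and the GCC version. The helper names (`parse_nvcc_release`, `tool_output`) are made up for illustration, and it assumes `nvcc` and `gcc` are on PATH; a missing tool is simply reported as not found.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '9.2') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def tool_output(command):
    """Run a command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    nvcc_text = tool_output(["nvcc", "--version"])
    gcc_text = tool_output(["gcc", "-dumpversion"])
    print("CUDA toolkit (nvcc):", parse_nvcc_release(nvcc_text) if nvcc_text else "not found")
    print("GCC:", gcc_text.strip() if gcc_text else "not found")
```

Note that this only reports the toolkit found first on PATH, which is not necessarily the CUDA that PyTorch was built with.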

aidonchuk commented 5 years ago

Thx a lot. I'll try to upgrade.

aidonchuk commented 5 years ago

Upgraded to CUDA 10. That helped.

freesouls commented 5 years ago

Upgrading to CUDA 9.2 with GCC 4.9.2 also passes the tests. My full log with CUDA 9.0 is the same as @alexdonchuk's.

shrubb commented 5 years ago

:tada: :balloon:

dontLoveBugs commented 5 years ago

I use Python 3.5 with PyTorch 1.0.1, and I compiled under CUDA 9.2 using GCC 5.4.0. I get the same errors and logs. Why?

shrubb commented 5 years ago

@dontLoveBugs Do you use Anaconda and/or have multiple CUDA versions installed? If yes, use CUDA_HOME to point the installer to your CUDA that's used by PyTorch:

CUDA_HOME=/usr/local/cuda-9.2 python3 -m pip install .

Also, see #9.
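If several toolkits are installed, a small script can suggest which directory to pass as CUDA_HOME. This is only a sketch: it assumes toolkits live under /usr/local/cuda-X.Y (as in the command above), the function names are hypothetical, and `torch.version.cuda` is used to read the CUDA version PyTorch was built with.

```python
import glob
import re


def cuda_version_from_path(path):
    """Extract '9.2' from a directory name like '/usr/local/cuda-9.2', or None."""
    match = re.search(r"cuda-(\d+\.\d+)$", path.rstrip("/"))
    return match.group(1) if match else None


def pick_cuda_home(candidates, wanted_version):
    """Return the first candidate whose major.minor version matches wanted_version."""
    if not wanted_version:
        return None
    # Keep only major.minor, since some builds report e.g. '9.0.176'.
    wanted = ".".join(wanted_version.split(".")[:2])
    for path in candidates:
        if cuda_version_from_path(path) == wanted:
            return path
    return None


if __name__ == "__main__":
    try:
        import torch
        wanted = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        wanted = None
    installed = sorted(glob.glob("/usr/local/cuda-*"))
    print("PyTorch built with CUDA:", wanted)
    print("Suggested CUDA_HOME:", pick_cuda_home(installed, wanted))
```

If it prints None, no directory under /usr/local matches PyTorch's CUDA version, and the toolkit may live elsewhere (for example inside an Anaconda installation).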

dontLoveBugs commented 5 years ago

@shrubb Thx. With this configuration (Ubuntu 16.04, PyTorch 1.0.0, CUDA 9.2 and GCC 7.4), it compiled and built successfully. However, when I tested the example code, I got a ModuleNotFoundError. I don't know why.

shrubb commented 5 years ago

@dontLoveBugs It's too difficult to reason without full commands and logs. All I can say is that you may be accidentally using a different Python or have library paths messed up. Usually one safe way is to use a virtual Python environment.
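To see which Python is actually being used, something like the following can be run with the same `python3` that was used for installation. It is a minimal sketch; `in_virtual_env` is a made-up helper that only detects venv/virtualenv environments, and a conda environment will generally report False because it is a separate full install.

```python
import sys


def in_virtual_env():
    """True when running inside a venv/virtualenv (prefix differs from the base install)."""
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtual_env())
    print("First module search paths:", sys.path[:3])
```

If the interpreter path differs between the install step and the run step, the package was installed for a different Python than the one importing it.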

dontLoveBugs commented 5 years ago

@shrubb I took your advice and compiled it again, but I still got the ModuleNotFoundError. I checked site-packages and found only one file and one folder related to box_conv (box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so and box_convolution-0.0.0.dist-info); the library folder (I think it should be named "box_convolution") is missing. Why?

shrubb commented 5 years ago

@dontLoveBugs Wow. Indeed, I made a very stupid mistake: the setup script didn't copy Python files. Glad you pointed it out. Better late than never :) Please do a git pull origin master and tell me if that worked.
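A quick way to confirm that both parts got installed is to ask Python where it would load them from. This is a sketch using the two module names that appear in this thread (`box_convolution` and `box_convolution_cpp_cuda`); `find_module_origin` is a hypothetical helper name.

```python
import importlib.util


def find_module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("box_convolution", "box_convolution_cpp_cuda"):
        origin = find_module_origin(name)
        print(name, "->", origin if origin else "NOT FOUND")
```

Run it from outside the repository checkout (and keep the script outside it too), since a local `box_convolution` folder on the module search path can shadow the installed package.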

dontLoveBugs commented 5 years ago

Thanks, I understand.

Flock1 commented 5 years ago

Hey guys,

I'm also facing problems with the installation. Here's a part of the error I'm getting:

src/box_convolution_cuda_backward.cu:195:309:   required from here
    src/box_convolution_cuda_backward.cu:176:1420: internal compiler error: in tsubst_copy, at cp/pt.c:13189
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/home/user/anaconda3/envs/tf/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-gacbfn7y/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-ti9f96kg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-gacbfn7y/

I am using python 3.6, anaconda, pytorch 1.0.1.post2, CUDA 9.2, GCC 5.4

shrubb commented 5 years ago

@Flock1 Reproduced this with GCC 5.5.0 too. Thanks for reporting. Nice, we've found a bug in GCC 5.

While I try to push a workaround, you can try another compiler. For example, if you have g++-7, run

CC=g++-7 python3 -m pip install .

Flock1 commented 5 years ago

Same error:

error: command 'g++-7' failed with exit status 1

shrubb commented 5 years ago

@Flock1 Can you post the full log at e.g. https://gist.github.com/ or https://pastebin.com/ ? Can be obtained with

CC=g++-7 python3 -m pip install . --log log.txt

Flock1 commented 5 years ago

@shrubb, will do that

Flock1 commented 5 years ago

https://gist.github.com/Flock1/d03a6d2814099379780cfd7ff9b07ab3

shrubb commented 5 years ago

@Flock1 That's very weird. I can't believe there is just "error" with absolutely no hint for the source of error. What if you run it again but with -v -v -v arguments after pip?
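For reference, the same invocation (compiler override, verbose flags, log file) can be assembled from Python. This is a sketch with hypothetical helper names; it relies only on pip's `-v` and `--log` options and the `CC` environment variable used above, and it prints the command rather than running it.

```python
import os
import subprocess
import sys


def build_pip_command(target=".", log_path="log.txt", verbosity=3):
    """Build `python -m pip install -v -v -v <target> --log <log_path>` as an argument list."""
    command = [sys.executable, "-m", "pip", "install"]
    command += ["-v"] * verbosity
    command += [target, "--log", log_path]
    return command


def build_env(compiler=None):
    """Copy the current environment, optionally overriding CC."""
    env = dict(os.environ)
    if compiler:
        env["CC"] = compiler
    return env


def run_install(compiler=None):
    """Run the install and return pip's exit code (not called automatically)."""
    return subprocess.run(build_pip_command(), env=build_env(compiler)).returncode


if __name__ == "__main__":
    print(" ".join(build_pip_command()))
```

Calling `run_install("g++-7")` from the repository root would then perform the install and leave the full output in log.txt.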

Flock1 commented 5 years ago

@shrubb, I'll send the result soon

Flock1 commented 5 years ago

@shrubb https://gist.github.com/Flock1/8c82d180e1e9fd9c01c1cefbb9742d21

Flock1 commented 5 years ago

@shrubb, this is what I get when I try to import:

from box_convolution import BoxConv2d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/__init__.py", line 1, in <module>
    from .box_convolution_module import BoxConv2d
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_module.py", line 4, in <module>
    from .box_convolution_function import BoxConvolutionFunction, reparametrize
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_function.py", line 3, in <module>
    import box_convolution_cpp_cuda as cpp_cuda
ModuleNotFoundError: No module named 'box_convolution_cpp_cuda'

shrubb commented 5 years ago

@Flock1 Clearly, it didn't work because you actually don't have g++-7 installed. See for example this to check what other compilers you have in your system, and this if you use Anaconda.
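As a quick check before setting CC, a sketch like this lists which compiler executables are actually on PATH. The candidate names are just examples, and `available_compilers` is a made-up helper.

```python
import os
import shutil


def available_compilers(names):
    """Map each compiler name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    requested = os.environ.get("CC", "gcc")
    for name, path in available_compilers([requested, "g++-7", "g++-5", "g++"]).items():
        print(name, "->", path or "not installed / not on PATH")
```

If the name given in CC maps to None, the build cannot use it, whatever the rest of the log says.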

Flock1 commented 5 years ago

@shrubb, so I should try with GCC 7?

Flock1 commented 5 years ago

@shrubb, here's the log with g++ 7 https://gist.github.com/Flock1/752cd000570deb48d75c8544f4aa4257

shrubb commented 5 years ago

@Flock1 Apparently, the CUDA compiler didn't pick up g++-7. Just pushed a fix for that. Could you please try the latest version, i.e. python3 -m pip install git+https://github.com/shrubb/box-convolutions.git ?

Flock1 commented 5 years ago

@shrubb, will do. Give me some time.

Flock1 commented 5 years ago

@shrubb , https://gist.github.com/Flock1/c852da5839091eb3f621a54d17c1de72

shrubb commented 5 years ago

@Flock1 you forgot to prepend CC=g++-7

Flock1 commented 5 years ago

@shrubb, the package is downloaded. Thanks.

Now, when I tried importing it, I got the following error: ImportError: /home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

The full error is:

>>> import box_convolution
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/__init__.py", line 1, in <module>
    from .box_convolution_module import BoxConv2d
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_module.py", line 4, in <module>
    from .box_convolution_function import BoxConvolutionFunction, reparametrize
  File "/home/sarvagya/Desktop/RBC/box-convolutions/box_convolution/box_convolution_function.py", line 3, in <module>
    import box_convolution_cpp_cuda as cpp_cuda
ImportError: /home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/box_convolution_cpp_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

shrubb commented 5 years ago

This was solved in #9, see solution there. This is likely because you're using Anaconda which messes paths up.

Flock1 commented 5 years ago

@shrubb, so is there some way I can use box-convolution through conda environment?

shrubb commented 5 years ago

Yes, please read #9, someone else also had your setup and they succeeded, I posted a solution there, should help.

Flock1 commented 5 years ago

@shrubb, so the problem I'm facing is that my machine has CUDA 9.2 but torch installs 9.0 with it. Apparently, the CUDA installed by torch isn't very compatible with box_convolution. Or probably that's what's happening. I have attached the log. Let me know what you think.

https://gist.github.com/Flock1/0d12d133157eb326146736dfa5f1c3e0

shrubb commented 5 years ago

@Flock1 PyTorch doesn't install CUDA for you -- you had two CUDAs before. One of them is probably from Anaconda.

What does this command output?

ldd `python3 -c "import torch, os; print(os.path.dirname(torch.__file__))"`/lib/libtorch.so | grep libcu

Flock1 commented 5 years ago

libcudart-f7fdd8d7.so.9.0 => /home/sarvagya/anaconda3/lib/python3.6/site-packages/torch/lib/libcudart-f7fdd8d7.so.9.0 (0x00007fcf1750b000)

Also, this is the output for nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
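The mismatch between the two outputs above can also be checked mechanically. A minimal sketch, assuming the `ldd` and `nvcc --version` outputs have been captured as text; the function names are hypothetical and the regexes only cover the output shapes shown in this thread.

```python
import re


def cudart_version_from_ldd(ldd_output):
    """Extract the CUDA runtime version (e.g. '9.0') from an ldd line naming libcudart."""
    match = re.search(r"libcudart[^\s/]*\.so\.(\d+\.\d+)", ldd_output)
    return match.group(1) if match else None


def nvcc_release(nvcc_output):
    """Extract the toolkit release (e.g. '9.2') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def versions_match(ldd_output, nvcc_output):
    """True only if both versions were found and they are equal."""
    runtime = cudart_version_from_ldd(ldd_output)
    toolkit = nvcc_release(nvcc_output)
    return runtime is not None and runtime == toolkit


if __name__ == "__main__":
    # Sample lines shortened from the outputs posted in this thread.
    ldd_line = "libcudart-f7fdd8d7.so.9.0 => /path/to/torch/lib/libcudart-f7fdd8d7.so.9.0"
    nvcc_line = "Cuda compilation tools, release 9.2, V9.2.148"
    print(cudart_version_from_ldd(ldd_line), nvcc_release(nvcc_line),
          versions_match(ldd_line, nvcc_line))  # prints: 9.0 9.2 False
```

Here PyTorch links against a CUDA 9.0 runtime while `nvcc` comes from a 9.2 toolkit, so the check reports a mismatch.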

shrubb commented 5 years ago

Hmmm. So your PyTorch is still pre-built with not-fully-compatible CUDA 9.0. Then, your options may be:

Flock1 commented 5 years ago

I have downgraded to CUDA 9.0, which is the same as PyTorch's. I ran the install command above and got the following: https://gist.github.com/Flock1/d12454e46d8db875a5c8adc7c7495d1e

shrubb commented 5 years ago

Aaargh, these compatibilities are annoying... Sorry for them.

Then for now the downgrading option is out. I'll still try to modify the code to work around the GCC 5 bug.

Flock1 commented 5 years ago

Haha :P, I can understand bro. I'll try other options as well. Let me know when you're done with updates.

Flock1 commented 5 years ago

@shrubb, I installed CUDA 10 and then it was downloaded. So apparently, box convolution won't work well with CUDA 9.

illestom commented 5 years ago

Hi, I'm trying to install this on Windows 10 with Anaconda and CUDA 10. I installed the Windows 10 SDK (10.0.17763.0) because PyTorch requires a C++ compiler, but the installation always ends with cl.exe failing with error code 2. Is there anybody with Windows experience who can help me? I read threads about this but found nothing useful :S

shrubb commented 5 years ago

@illestom Could you post the full output of python3 -m pip install -v -v -v git+https://github.com/shrubb/box-convolutions.git?

illestom commented 5 years ago

@shrubb It's a very long error message, so I copy-pasted the end of it. I can give the whole log if needed:

    ........ data
    src\box_convolution_interface.cpp(367): error C2146: syntax error: missing ';' before identifier 'or'
    src\box_convolution_interface.cpp(367): error C2065: 'or': undeclared identifier
    src\box_convolution_interface.cpp(367): error C2146: syntax error: missing ';' before identifier 'paramId'
    src\box_convolution_interface.cpp(367): warning C4553: '==': result of expression not used; did you intend '='?
    src\box_convolution_interface.cpp(368): error C2065: 'not': undeclared identifier
    src\box_convolution_interface.cpp(368): error C2146: syntax error: missing ';' before identifier 'needXDeriv'
    src\box_convolution_interface.cpp(378): error C2146: syntax error: missing ';' before identifier 'or'
    src\box_convolution_interface.cpp(378): error C2065: 'or': undeclared identifier
    src\box_convolution_interface.cpp(378): error C2146: syntax error: missing ';' before identifier 'paramId'
    src\box_convolution_interface.cpp(378): warning C4553: '==': result of expression not used; did you intend '='?
    src\box_convolution_interface.cpp(388): error C2146: syntax error: missing ';' before identifier 'or'
    src\box_convolution_interface.cpp(388): error C2065: 'or': undeclared identifier
    src\box_convolution_interface.cpp(388): error C2146: syntax error: missing ';' before identifier 'paramId'
    error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe' failed with exit status 2

    ----------------------------------------
Command "C:\Users\Tamas\AppData\Local\conda\conda\envs\new\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Tamas\AppData\Local\Temp\pip-req-build-hs215zk\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Tamas\AppData\Local\Temp\pip-record-bmx7wd6b\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\Tamas\AppData\Local\Temp\pip-req-build-hs215zk

shrubb commented 5 years ago

@illestom Hm, yes, the important parts are missing there. I'd be happy if you provided the full output.

May I ask you to run

python3 -m pip install -v -v -v git+https://github.com/shrubb/box-convolutions.git --log log.txt

and then upload the log.txt file to either http://gist.github.com or http://pastebin.com?

illestom commented 5 years ago

@shrubb https://drive.google.com/file/d/1DhfiBlqoocepKcbeLJRfzqN9Cqe16Kir/view?usp=sharing the log.txt

shrubb commented 5 years ago

@illestom Just pushed a fix, did it help?

illestom commented 5 years ago

@shrubb The previous problem is gone, but now there is some CUDA problem, I guess :S https://drive.google.com/file/d/1c1N_dnK3huI9MnX0G2v7T5-U1gT7J9v8/view?usp=sharing