meder411 / PyTorch-EMDLoss

PyTorch 1.0 implementation of the approximate Earth Mover's Distance

Compilation issues #1

Closed flxai closed 5 years ago

flxai commented 5 years ago

I tried to build this on Ubuntu 18.04. Unfortunately, I hit the following error when invoking python setup.py install:

building '_emd' extension
gcc -pthread -B /home/user/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipkg/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/home/user/anaconda3/include/python3.6m -c pkg/src/emd.cpp -o build/temp.linux-x86_64-3.6/pkg/src/emd.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_emd -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -Ipkg/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/home/user/anaconda3/include/python3.6m -c pkg/src/cuda/emd.cu -o build/temp.linux-x86_64-3.6/pkg/src/cuda/emd.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_emd -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
In file included from /usr/local/cuda/include/host_config.h:50:0,
                 from /usr/local/cuda/include/cuda_runtime.h:78,
                 from <command-line>:0:
/usr/local/cuda/include/crt/host_config.h:119:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!
 #error -- unsupported GNU version! gcc versions later than 6 are not supported!
  ^~~~~
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1

The directory /usr/local/cuda links to /usr/local/cuda-9.0. I installed CUDA v9.0 through the package cuda-toolkit-9-0 from the following repository:

deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /

I installed PyTorch v1.0.0 through Anaconda like so:

conda install pytorch torchvision -c pytorch

What am I doing wrong? Sorry if this is the wrong place to ask. I'd happily accept a nudge to some other kind of documentation.

meder411 commented 5 years ago

It looks like the C and C++ compiler version on your system isn't one that your CUDA installation supports. A quick Google search suggests that pointing CUDA at an older GCC might be a potential solution.
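To check for this kind of mismatch before building, something like the following rough sketch might help. The helper names gcc_major and cuda9_accepts_gcc are made up for illustration, and the limit of 6 is taken only from the "gcc versions later than 6 are not supported" message in the error above, so other CUDA releases will likely have a different limit.

```shell
#!/bin/sh
# Hypothetical helpers; not part of PyTorch-EMDLoss, PyTorch, or CUDA.

# Print the major version from a dotted version string such as "7.3.0".
gcc_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed if the given GCC major version passes the CUDA 9.0 header check,
# which (per the error message) rejects GCC versions later than 6.
cuda9_accepts_gcc() {
    [ "$1" -le 6 ]
}

# Report on the default gcc, if one is installed.
if command -v gcc >/dev/null 2>&1; then
    major=$(gcc_major "$(gcc -dumpversion)")
    if cuda9_accepts_gcc "$major"; then
        echo "gcc $major: accepted by the CUDA 9.0 header check"
    else
        echo "gcc $major: too new for CUDA 9.0, an older host compiler is needed"
    fi
fi
```

Note that this only looks at the default gcc on the PATH; nvcc may end up using a different host compiler if one has been linked or configured elsewhere.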

meder411 commented 5 years ago

Also, do a fresh git pull before you use it again. I fixed a wrong import path in the test script.

flxai commented 5 years ago

Thank you, it seems to work now. I set up the symlinks like so:

sudo ln -s /usr/bin/gcc-5 /usr/local/cuda-9.0/bin/gcc
sudo ln -s /usr/bin/g++-5 /usr/local/cuda-9.0/bin/g++
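For anyone repeating this, here is a slightly more defensive sketch of the same linking step. The function link_host_compiler is a hypothetical helper written for this example; it assumes the older compilers are already installed (here as /usr/bin/gcc-5 and /usr/bin/g++-5, as in the two ln -s commands above) and refuses to create a dangling link if they are not.

```shell
#!/bin/sh
# link_host_compiler is a hypothetical helper, not part of CUDA or this repository.
# Usage: link_host_compiler <compiler-path> <cuda-bin-dir> <link-name>
link_host_compiler() {
    src=$1
    dir=$2
    name=$3
    # Refuse to create a dangling link if the older compiler is not installed.
    if [ ! -x "$src" ]; then
        echo "missing compiler: $src" >&2
        return 1
    fi
    # -f replaces an existing link; -n avoids following an existing link to a directory.
    ln -sfn "$src" "$dir/$name"
}

# Example calls (need root for /usr/local, so they are left commented out):
# link_host_compiler /usr/bin/gcc-5 /usr/local/cuda-9.0/bin gcc
# link_host_compiler /usr/bin/g++-5 /usr/local/cuda-9.0/bin g++
```

Because of the -f flag the function can be re-run safely, whereas a plain ln -s fails if the link already exists.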

The build then finished successfully, with what I think are negligible warnings:

pkg/include/cuda/emd.cuh(39): warning: specified alignment (4) is different from alignment (8) specified on a previous declaration
          detected during instantiation of "void approx_match_kernel(int64_t, int64_t, int64_t, int64_t, const T *, const T *, T *, T *) [with T=float]"
(237): here

pkg/include/cuda/emd.cuh(262): warning: specified alignment (4) is different from alignment (8) specified on a previous declaration
          detected during instantiation of "void match_cost_kernel(int64_t, int64_t, int64_t, int64_t, const T *, const T *, const T *, T *) [with T=float]"
(318): here

pkg/include/cuda/emd.cuh(342): warning: specified alignment (4) is different from alignment (8) specified on a previous declaration
          detected during instantiation of "void match_cost_grad2_kernel(int64_t, int64_t, int64_t, int64_t, const T *, const T *, const T *, T *) [with T=float]"
(444): here

Running test_emd_loss.py gives the following output:

Time:  0.0057408809661865234
tensor([6.1624], device='cuda:0', dtype=torch.float64,
       grad_fn=<EMDFunctionBackward>)
tensor(6.1624, device='cuda:0', dtype=torch.float64, grad_fn=<SumBackward0>)
tensor([[[ 0.2854,  1.2197,  0.8785],
         [ 0.8848,  1.1097, -0.0566],
         [ 1.9640,  0.0908, -0.3259],
         [ 0.9830, -0.1142,  1.2023],
         [ 1.0209,  0.7352,  1.4240]]], device='cuda:0', dtype=torch.float64)
tensor([[[-5.0648e-02, -1.2653e-01,  4.4372e-02],
         [ 3.3235e+00,  6.3753e+00, -8.6020e+00],
         [ 7.8830e-01,  7.8198e-01,  4.4210e-01],
         [ 4.3565e-02,  6.3839e-02,  7.6730e-02],
         [ 4.7981e+00,  2.4774e+00,  5.1132e+00],
         [ 3.2840e-03,  1.0217e-01,  5.4416e-03],
         [ 4.7761e-05,  9.6071e-04,  5.0468e-05],
         [-1.4062e-05, -2.7342e-05, -1.1873e-05],
         [ 1.2999e-03,  5.5120e-03, -1.7161e-05],
         [ 5.4406e-02,  3.7113e-02, -6.4375e-03]]], device='cuda:0',
       dtype=torch.float64)

So it all seems OK. Thanks again! :)