uber-research / UPSNet

UPSNet: A Unified Panoptic Segmentation Network

Question about undefined symbol #6

Closed WenFuLee closed 5 years ago

WenFuLee commented 5 years ago

Below is the error message I got. I'm not sure how to fix it. Could you help me with this? Thanks.

====

UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 61, in <module>
    from upsnet.models import *
  File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
    from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
    from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
  File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
    from upsnet.operators.modules.deform_conv import DeformConv
  File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
    from upsnet.operators.functions.deform_conv import DeformConvFunction
  File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
    from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

YuwenXiong commented 5 years ago

The most likely reason is that the pytorch version you used to build the operators is different from the pytorch version you use to run experiments. Please double-check your python env/pytorch version and try rebuilding the operators (don't forget to delete the upsnet/operators/build folder first).
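The check and cleanup described in this reply could be scripted roughly as follows. This is a minimal sketch, not part of UPSNet: the helper names `describe_environment` and `remove_build_dir` are hypothetical, and the only thing taken from the thread is that the stale output lives in a `build` folder under `upsnet/operators`.

```python
import importlib
import shutil
import sys
from pathlib import Path


def describe_environment(package="torch"):
    """Return the interpreter path and the version of `package` (None if missing)."""
    try:
        module = importlib.import_module(package)
        version = getattr(module, "__version__", "unknown")
    except ImportError:
        version = None  # the package is not importable from this interpreter
    return {"python": sys.executable, package: version}


def remove_build_dir(operators_dir):
    """Delete <operators_dir>/build so the next build starts from scratch.

    Returns True if a build directory existed before the call.
    """
    build_dir = Path(operators_dir) / "build"
    existed = build_dir.is_dir()
    shutil.rmtree(build_dir, ignore_errors=True)
    return existed
```

Running `describe_environment()` once in the shell used for building and once in the shell used for training makes it easy to compare the two interpreters; `remove_build_dir("upsnet/operators")` would then clear the old objects before rebuilding.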

WenFuLee commented 5 years ago

Thanks for the reply.

Below are the versions of my Python and PyTorch: Python 3.6.8, PyTorch 0.4.1.

Also, I just followed your suggestions: (1) deleted the upsnet/operators/build folder, (2) ran "init.sh" to rebuild the operators, (3) ran the experiment.

But I still got the same issue. Is there anything I missed or misunderstood? Also, when building the operators, I got the warning below. Does it matter? "cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++"

Thanks.

YuwenXiong commented 5 years ago

Try running python build_deform_conv.py build_ext --inplace and python build_roialign.py build_ext --inplace manually, and make sure the python you use is the one with pytorch 0.4.1. Then start python under upsnet/operators/_ext/deform_conv (again, make sure it is the one with pytorch 0.4.1) and execute import torch and import deform_conv_cuda manually. If your environment is set up correctly, both imports should succeed.

The warning can simply be ignored.
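The manual import check suggested here can also be run as a small script. The sketch below is an illustration only; `try_imports` is a hypothetical helper. It imports the modules in the given order, so `torch` should be listed before the extension so that its shared libraries are loaded first.

```python
import importlib
import sys


def try_imports(module_names, search_dir=None):
    """Import modules in order; return (name, error) pairs, error is None on success."""
    if search_dir is not None and search_dir not in sys.path:
        # Make the directory that holds the compiled .so importable.
        sys.path.insert(0, search_dir)
    results = []
    for name in module_names:
        try:
            importlib.import_module(name)
            results.append((name, None))
        except ImportError as exc:
            # Undefined-symbol failures surface here as ImportError.
            results.append((name, str(exc)))
    return results
```

For the case in this thread the call would look like `try_imports(["torch", "deform_conv_cuda"], search_dir="upsnet/operators/_ext/deform_conv")`; any error string in the result names the missing symbol.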

WenFuLee commented 5 years ago

Below is the result of following your suggestions. Would you mind telling me what might be the reasons for this environment issue? Thanks.

====

~/UPSNet_ROOT/upsnet/operators/_ext/deform_conv$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
0.4.1
>>> import deform_conv_cuda
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /home/wen-fulee/UPSNet_ROOT/upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

YuwenXiong commented 5 years ago

Can you show me the output of the operator build?

WenFuLee commented 5 years ago

Do you mean this?

====

~/UPSNet_ROOT/upsnet/operators$ python build_deform_conv.py build_ext --inplace
running build_ext
building 'deform_conv_cuda' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_cuda.cpp -o build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o -DTORCH_EXTENSION_NAME=deform_conv_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_kernel.cu -o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -O2 -DTORCH_EXTENSION_NAME=deform_conv_cuda --compiler-options '-fPIC' -std=c++11
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so -> 

====

~/UPSNet_ROOT/upsnet/operators$ python build_roialign.py build_ext --inplace
running build_ext
building 'roi_align_cuda' extension
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_cuda.cpp -o build/temp.linux-x86_64-3.6/src/roi_align_cuda.o -DTORCH_EXTENSION_NAME=roi_align_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/roi_align_cuda.cpp: In function ‘int roi_align_forward_cuda(int, int, int, float, at::Tensor, at::Tensor, at::Tensor)’:
src/roi_align_cuda.cpp:58:7: warning: unused variable ‘batch_size’ [-Wunused-variable]
   int batch_size = features.size(0);
       ^~~~~~~~~~
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_kernel.cu -o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -O2 -DTORCH_EXTENSION_NAME=roi_align_cuda --compiler-options '-fPIC' -std=c++11
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/roi_align_cuda.o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so ->

YuwenXiong commented 5 years ago

I think you can find the solution here: https://github.com/pytorch/extension-cpp/issues/6#issuecomment-424948254. I'm surprised that -D_GLIBCXX_USE_CXX11_ABI=0 doesn't show up in your compile arguments, since it does on my side. Please check whether your gcc version is 5.1 or newer; if it is, I think that's the cause. You can manually add '-D_GLIBCXX_USE_CXX11_ABI=0' at L51 and L52 of https://github.com/uber-research/UPSNet/blob/master/upsnet/operators/build_deform_conv.py. The same applies to build_roialign.py.
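Both halves of this diagnosis can be checked mechanically. The sketch below is a rough heuristic, not a demangler, and its function names are hypothetical. It relies on two facts: a mangled symbol that mentions `__cxx11` refers to the newer `std::__cxx11::basic_string`, while `Ss` is the Itanium ABI abbreviation for the older `std::string`. A plain substring test for `Ss` can give false positives on unrelated names.

```python
ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def guess_string_abi(symbol):
    """Heuristically classify which std::string ABI a mangled symbol refers to."""
    if "__cxx11" in symbol:
        return "cxx11"      # std::__cxx11::basic_string (default since GCC 5.1)
    if "Ss" in symbol:
        return "pre-cxx11"  # "Ss" abbreviates the old std::string
    return "unknown"        # no std::string appears in the symbol


def commands_missing_flag(build_log):
    """Return the compile commands in a build log that lack ABI_FLAG."""
    compilers = ("gcc ", "g++ ", "nvcc ")
    missing = []
    for line in build_log.splitlines():
        # Only compile steps (-c) matter; link steps take no -D flags.
        is_compile = " -c " in line and any(c in line for c in compilers)
        if is_compile and ABI_FLAG not in line:
            missing.append(line)
    return missing
```

Applied to the symbol from the interactive import above, `guess_string_abi` returns `"cxx11"`, which is consistent with the flag being absent from the posted gcc and nvcc commands.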

WenFuLee commented 5 years ago

Do I add -D_GLIBCXX_USE_CXX11_ABI=0 or _GLIBCXX_USE_CXX11_ABI=0? In the post you shared, they seem to use _GLIBCXX_USE_CXX11_ABI=0 instead.

YuwenXiong commented 5 years ago

_GLIBCXX_USE_CXX11_ABI is the macro name; as a compiler argument it should be written -D_GLIBCXX_USE_CXX11_ABI=0, with the -D prefix.
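The relation between the macro name and the compiler argument can be written out as a small helper. This is a sketch under an assumption: the build scripts are taken to pass an `extra_compile_args` dict with `cxx` and `nvcc` keys, as `torch.utils.cpp_extension.CUDAExtension` accepts. The actual lines 51 and 52 of build_deform_conv.py are not shown in this thread, and `define_flag` and `with_abi_flag` are hypothetical names.

```python
ABI_MACRO = "_GLIBCXX_USE_CXX11_ABI"


def define_flag(macro, value):
    """Turn a macro name and value into a -D compiler argument."""
    return "-D{}={}".format(macro, value)


def with_abi_flag(extra_compile_args):
    """Return a copy of an extra_compile_args dict with the ABI flag added once."""
    flag = define_flag(ABI_MACRO, 0)
    updated = {}
    for tool, args in extra_compile_args.items():
        # Append the flag for every tool (e.g. "cxx" and "nvcc") unless present.
        updated[tool] = list(args) if flag in args else list(args) + [flag]
    return updated
```

For example, `with_abi_flag({"cxx": [], "nvcc": ["-O2"]})` adds the flag to both lists, so the host compiler and nvcc agree on the ABI.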

WenFuLee commented 5 years ago

Thanks. I have made a little progress, but I now get the error below. I googled it, and it might be related to my CUDA version: https://github.com/NVlabs/PWC-Net/issues/11. My current CUDA version is release 10.0, V10.0.130. Could this be the reason?

~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 61, in <module>
    from upsnet.models import *
  File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
    from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
    from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
  File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
    from upsnet.operators.modules.deform_conv import DeformConv
  File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
    from upsnet.operators.functions.deform_conv import DeformConvFunction
  File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
    from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

YuwenXiong commented 5 years ago

I have never seen this issue before. Changing the CUDA version would probably solve it. On my side, CUDA 9.1 with gcc 4.9.4 works.
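One plausible reading, not confirmed in this thread, is that the extension was compiled by a newer nvcc than the CUDA runtime that PyTorch loads. The sketch below compares the two versions. The function names are hypothetical, and treating only an exact major.minor match as safe is a deliberately conservative assumption. The first input is the text printed by `nvcc --version`; the second is the string in `torch.version.cuda`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(version_string):
    """Turn a string such as "9.0.176" into (9, 0); None if it cannot be parsed."""
    if not version_string:
        return None  # e.g. a CPU-only PyTorch build reports no CUDA version
    parts = version_string.split(".")
    if len(parts) < 2:
        return None
    return (int(parts[0]), int(parts[1]))


def toolkits_match(nvcc_output, torch_cuda_version):
    """True only if nvcc and PyTorch report the same CUDA major.minor version."""
    nvcc = parse_nvcc_release(nvcc_output)
    return nvcc is not None and nvcc == parse_cuda_version(torch_cuda_version)
```

With the version reported earlier, `parse_nvcc_release("release 10.0, V10.0.130")` gives `(10, 0)`, which would not match a PyTorch build that reports a 9.x runtime.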

dongzhang89 commented 5 years ago

Hi, have you solved this problem? I ran into the same one and have no idea how to fix it.

WenFuLee commented 5 years ago

After downgrading CUDA to 9.1, this was solved.

whw19950510 commented 5 years ago

Hi, I have the same issue. I followed the instructions cited here (adding the flags for both cxx and nvcc), but it still does not work. My torch and CUDA versions are all the same. Any further suggestions? Thanks.

dongzhang89 commented 5 years ago

Quoting whw19950510: "Hi, I have the same issue. I followed the instructions cited here (adding the flags for both cxx and nvcc), but it still does not work. My torch and CUDA versions are all the same. Any further suggestions? Thanks."

Please check your CUDA path in profile.
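A quick way to see which CUDA installation the shell profile exposes is to list the CUDA-related entries in the environment. This is a minimal sketch with a hypothetical helper name; it assumes the relevant variables are `PATH`, `LD_LIBRARY_PATH` and `CUDA_HOME`, which is common on Linux but may differ on a given machine.

```python
import os


def cuda_path_entries(environ=None):
    """List CUDA-related entries found in common environment variables."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in ("PATH", "LD_LIBRARY_PATH"):
        entries = environ.get(name, "").split(os.pathsep)
        # Keep only entries that mention cuda, e.g. /usr/local/cuda-9.1/bin.
        found[name] = [e for e in entries if "cuda" in e.lower()]
    found["CUDA_HOME"] = environ.get("CUDA_HOME")
    return found
```

If `PATH` and `LD_LIBRARY_PATH` point at different CUDA versions, nvcc and the runtime libraries come from different toolkits, which can produce the kind of undefined symbol seen above.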

lfdeep commented 5 years ago

Quoting WenFuLee: "After downgrading CUDA to 9.1, this was solved."

Hello, your CUDA is 9.1; which version of gcc did you use to run the network?

WenFuLee commented 5 years ago

My version is GCC 7.3.0.

gaussiangit commented 5 years ago

ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs

Also, -D_GLIBCXX_USE_CXX11_ABI=0 is present during compilation.

Torch version 1.0.1, GCC 7.3.0, CUDA 9.0

What could be the problem?