Closed WenFuLee closed 5 years ago
The most likely reason is that the PyTorch version used to build the operators differs from the PyTorch version used to run the experiments. Please double-check your Python environment and PyTorch version, then rebuild the operators (don't forget to delete the upsnet/operators/build folder first).
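One way to compare the build-time and run-time environments is to run the same small report with the interpreter used for building and with the one used for training. This is only a sketch: the function name describe_env is made up for illustration, and it reports None for the torch fields when PyTorch cannot be imported.

```python
import sys


def describe_env():
    """Collect the interpreter path and, if importable, the PyTorch build info."""
    info = {
        "python": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "torch_version": None,
        "torch_cuda": None,
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not visible to this interpreter
    info["torch_version"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version the binary was built with
    return info


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(key, "=", value)
```

If the two runs print different interpreter paths or torch versions, the operators were built against a different installation than the one running the experiment.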
Thanks for the reply.
Below are the versions of my Python and PyTorch: Python 3.6.8, PyTorch 0.4.1.
Also, I followed your suggestions: (1) deleted the upsnet/operators/build folder, (2) ran "init.sh" to rebuild the operators, (3) ran the experiment.
But I still got the same issue. Is there anything I missed or misunderstood? Also, when building the operators, I got the warning below. Does it matter? "cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++"
Thanks.
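For the first of the rebuild steps listed above, a minimal sketch of clearing the stale build folder from Python follows. The helper name clean_build is hypothetical; only the upsnet/operators/build path comes from this thread.

```python
import shutil
from pathlib import Path


def clean_build(operators_dir):
    """Delete <operators_dir>/build if it exists; return True if it was removed."""
    build_dir = Path(operators_dir) / "build"
    if build_dir.is_dir():
        shutil.rmtree(str(build_dir))
        return True
    return False


if __name__ == "__main__":
    # Run from UPSNet_ROOT, then rebuild the operators as described above.
    print("removed:", clean_build("upsnet/operators"))
```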
Try running python build_deform_conv.py build_ext --inplace and python build_roialign.py build_ext --inplace manually, making sure the python you use has PyTorch 0.4.1. Then start python under upsnet/operators/_ext/deform_conv (again making sure it has PyTorch 0.4.1) and execute import torch followed by import deform_conv_cuda manually. There should be no problem if your environment is set up correctly.
The warning can safely be ignored.
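The manual import check described above can be scripted roughly as follows. The helper try_imports is a made-up name; it imports the modules in the given order (torch first, so that its shared libraries are already loaded when the extension is imported) and records any ImportError text instead of stopping.

```python
import importlib


def try_imports(names):
    """Import each module in order; map name -> None on success or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Run this from upsnet/operators/_ext/deform_conv so the extension is importable.
    for name, error in try_imports(["torch", "deform_conv_cuda"]).items():
        print(name, "->", "ok" if error is None else error)
```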
Below is the result of following your suggestions. Would you mind telling me what might be the reasons for this environment issue? Thanks.
====
~/UPSNet_ROOT/upsnet/operators/_ext/deform_conv$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
0.4.1
>>> import deform_conv_cuda
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /home/wen-fulee/UPSNet_ROOT/upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Can you show me the output of the operator build?
Do you mean this?
====
~/UPSNet_ROOT/upsnet/operators$ python build_deform_conv.py build_ext --inplace
running build_ext
building 'deform_conv_cuda' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_cuda.cpp -o build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o -DTORCH_EXTENSION_NAME=deform_conv_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_kernel.cu -o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -O2 -DTORCH_EXTENSION_NAME=deform_conv_cuda --compiler-options '-fPIC' -std=c++11
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so ->
====
~/UPSNet_ROOT/upsnet/operators$ python build_roialign.py build_ext --inplace
running build_ext
building 'roi_align_cuda' extension
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_cuda.cpp -o build/temp.linux-x86_64-3.6/src/roi_align_cuda.o -DTORCH_EXTENSION_NAME=roi_align_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/roi_align_cuda.cpp: In function ‘int roi_align_forward_cuda(int, int, int, float, at::Tensor, at::Tensor, at::Tensor)’:
src/roi_align_cuda.cpp:58:7: warning: unused variable ‘batch_size’ [-Wunused-variable]
int batch_size = features.size(0);
^~~~~~~~~~
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_kernel.cu -o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -O2 -DTORCH_EXTENSION_NAME=roi_align_cuda --compiler-options '-fPIC' -std=c++11
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/roi_align_cuda.o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so ->
I think you can find the solution here: https://github.com/pytorch/extension-cpp/issues/6#issuecomment-424948254. I'm surprised that -D_GLIBCXX_USE_CXX11_ABI=0 doesn't show up in your compile arguments, since it does on my side. Please check whether your gcc version is newer than 5.1; if it is, I think that is the cause. You can manually add '-D_GLIBCXX_USE_CXX11_ABI=0' to https://github.com/uber-research/UPSNet/blob/master/upsnet/operators/build_deform_conv.py, L51 and L52. The same applies to build_roialign.py.
Do I add -D_GLIBCXX_USE_CXX11_ABI=0 or _GLIBCXX_USE_CXX11_ABI=0? In the post you shared, they seem to use _GLIBCXX_USE_CXX11_ABI=0 instead.
_GLIBCXX_USE_CXX11_ABI is the macro name; as a compiler argument it should be -D_GLIBCXX_USE_CXX11_ABI=0, with the -D prefix.
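As a rough illustration of where the flag goes: a later comment in this thread mentions adding it "after cxx && nvcc", which suggests the build scripts pass per-compiler flag lists. Assuming a dict with 'cxx' and 'nvcc' keys (the dict contents below are hypothetical, loosely modeled on the build log in this thread, and not copied from build_deform_conv.py), the change amounts to:

```python
ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def with_abi_flag(extra_compile_args):
    """Return a copy of the per-compiler flag dict with ABI_FLAG in every list."""
    patched = {}
    for compiler, flags in extra_compile_args.items():
        flags = list(flags)  # copy so the caller's lists stay untouched
        if ABI_FLAG not in flags:
            flags.append(ABI_FLAG)
        patched[compiler] = flags
    return patched


if __name__ == "__main__":
    # Hypothetical flag lists; the real ones live in the build scripts.
    example = {"cxx": ["-std=c++11"], "nvcc": ["-O2"]}
    print(with_abi_flag(example))
```

After rebuilding, the flag should be visible in the gcc and nvcc command lines printed by the build.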
Thanks. I may have made a little progress, but I still get the error below. I googled it, and it might be related to my CUDA version: https://github.com/NVlabs/PWC-Net/issues/11. My current CUDA version is release 10.0, V10.0.130. Could this be the reason?
~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration
I have never seen this issue before. Changing the CUDA version would probably solve it. On my side, CUDA 9.1 with gcc 4.9.4 works.
Hi, have you solved this problem? I ran into the same one, but I have no idea how to fix it.
After downgrading CUDA to 9.1, this was solved.
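A quick way to spot this kind of mismatch is to compare the toolkit release reported by nvcc --version with the CUDA version the PyTorch binary was built with (torch.version.cuda). The sketch below only parses the two strings; the function names are made up, and it assumes version strings with at least a major.minor part. A differing major version is a hint, not proof, that the compiled extension and PyTorch's CUDA runtime are incompatible.

```python
import re


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_cuda_version(version):
    """Turn a string such as '9.0.176' into (9, 0); None stays None."""
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def same_major(nvcc_text, torch_cuda):
    """True when both versions are known and share the same major number."""
    nvcc = parse_nvcc_release(nvcc_text)
    built = parse_cuda_version(torch_cuda)
    return nvcc is not None and built is not None and nvcc[0] == built[0]


if __name__ == "__main__":
    # The nvcc string is the one quoted in this thread; '9.0.176' is a made-up example.
    print(same_major("release 10.0, V10.0.130", "9.0.176"))
```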
Hi, I have the same issue. I followed the instructions cited here (adding the flag after cxx and nvcc), but it still does not work. My torch and CUDA versions are all the same. Any further suggestions? Thanks.
Please check your CUDA path in profile.
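To see which CUDA installation the shell profile actually exposes, one option is to list the CUDA-related entries of the relevant environment variables. This is a sketch under assumptions: the helper name is hypothetical, and CUDA_HOME, PATH and LD_LIBRARY_PATH are simply the variables commonly involved on Linux, not a list taken from the UPSNet scripts.

```python
import os


def cuda_path_entries(environ=None):
    """Return CUDA_HOME plus the PATH / LD_LIBRARY_PATH entries mentioning 'cuda'."""
    env = os.environ if environ is None else environ
    report = {"CUDA_HOME": env.get("CUDA_HOME")}
    for var in ("PATH", "LD_LIBRARY_PATH"):
        entries = env.get(var, "").split(os.pathsep)
        report[var] = [entry for entry in entries if "cuda" in entry.lower()]
    return report


if __name__ == "__main__":
    for name, value in cuda_path_entries().items():
        print(name, "=", value)
```

Entries pointing at two different toolkit versions (for example one in PATH and another in LD_LIBRARY_PATH) would be worth cleaning up before rebuilding.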
After downgrading CUDA to 9.1, this was solved.
Hello, your CUDA is 9.1; which version of gcc can run the network with it?
My version is GCC 7.3.0.
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs
Also, -D_GLIBCXX_USE_CXX11_ABI=0 is present during compilation.
Torch version 1.0.1, GCC 7.3.0, CUDA 9.0.
What could be the problem?
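The mangled symbol itself carries a hint about which std::string ABI the extension was compiled with. In the Itanium C++ mangling, the old-ABI std::string is abbreviated as Ss, while the new-ABI type appears under std::__cxx11. The sketch below is a plain substring heuristic (the function name is made up, and a real demangler such as c++filt is more reliable):

```python
def guess_string_abi(symbol):
    """Heuristically classify a mangled symbol by the std::string ABI it uses."""
    if "__cxx11" in symbol:
        return "cxx11"      # built with _GLIBCXX_USE_CXX11_ABI=1
    if "Ss" in symbol:
        return "pre-cxx11"  # built with _GLIBCXX_USE_CXX11_ABI=0
    return "unknown"        # symbol does not involve std::string


if __name__ == "__main__":
    # Both symbols are copied from error messages in this thread.
    print(guess_string_abi("_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"))
    print(guess_string_abi("_ZN3c105ErrorC1ENS_14SourceLocationERKSs"))
```

By this heuristic, the symbol in the message above uses the old ABI, which suggests the flag did take effect there and that the mismatch may lie elsewhere, for example in the PyTorch version the operators were written or built for.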
Below is the error message I got. Not so sure about how to fix it. Could you help me with this? Thanks.
====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
exp_config = edict(yaml.load(f))
Traceback (most recent call last):
File "upsnet/upsnet_end2end_train.py", line 61, in <module>
from upsnet.models import *
File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
from upsnet.operators.modules.deform_conv import DeformConv
File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
from upsnet.operators.functions.deform_conv import DeformConvFunction
File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
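To read what such an undefined symbol refers to without a demangler, the leading length-prefixed components of a _ZN... name can be decoded by hand. The sketch below handles only that simple nested-name prefix (the function name is made up; it is not a full Itanium demangler):

```python
def nested_name(symbol):
    """Decode the leading `_ZN<len><id>...` components of a mangled name."""
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1  # consume the decimal length prefix
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    return "::".join(parts)


if __name__ == "__main__":
    print(nested_name("_ZN2at19UndefinedTensorImpl10_singletonE"))
```

The at:: namespace belongs to PyTorch's ATen library, so an unresolved at:: symbol is consistent with the extension having been compiled against a different PyTorch installation than the one loaded at run time.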