yhenon / pytorch-retinanet

Pytorch implementation of RetinaNet object detection.
Apache License 2.0
2.14k stars 665 forks

ImportError __cudaPopCallConfiguration #74

Open mvcaro opened 5 years ago

mvcaro commented 5 years ago

When running visualize.py I get the ImportError below. I can't seem to get past this error.

Any ideas on how to solve it?

(trial_retinanet_4) admin-ata168@admin-ata168-Workstation:~/trial_retinanet_4/pytorch-retinanet$ python visualize.py --dataset coco --coco_path '/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/data/coco' --model '/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/weights/coco_resnet_50_map_0_335.pt' 
CUDA available: True
loading annotations into memory...
Done (t=1.00s)
creating index...
index created!
Traceback (most recent call last):
  File "visualize.py", line 98, in <module>
    main()
  File "visualize.py", line 47, in main
    retinanet = torch.load(parser.model)
  File "/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/serialization.py", line 358, in load
    return _load(f, map_location, pickle_module)
  File "/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/serialization.py", line 542, in _load
    result = unpickler.load()
  File "/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/model.py", line 9, in <module>
    from lib.nms.pth_nms import pth_nms
  File "/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/pth_nms.py", line 2, in <module>
    from ._ext import nms
  File "/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/_ext/nms/__init__.py", line 3, in <module>
    from ._nms import lib as _lib, ffi as _ffi
ImportError: /home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/_ext/nms/_nms.so: undefined symbol: __cudaPopCallConfiguration

CUDA installed: CUDA Version 9.2.148
OS: Linux 16.04
Conda environment:


# Name                    Version                   Build  Channel
backcall                  0.1.0                    py36_0  
blas                      1.0                         mkl  
ca-certificates           2019.1.23                     0    anaconda
certifi                   2019.3.9                 py36_0    anaconda
cffi                      1.12.3           py36h2e261b9_0  
chardet                   3.0.4                    pypi_0    pypi
cuda92                    1.0                           0    pytorch
cudatoolkit               9.0                  h13b8566_0  
cudnn                     7.3.1                 cuda9.0_0  
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.7                   pypi_0    pypi
decorator                 4.4.0                    py36_1  
freetype                  2.9.1                h8a8886c_1  
idna                      2.8                      pypi_0    pypi
imageio                   2.5.0                    pypi_0    pypi
intel-openmp              2019.3                      199  
ipython                   7.4.0            py36h39e3cac_0  
ipython_genutils          0.2.0                    py36_0  
jedi                      0.13.3                   py36_0  
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.0.1                    pypi_0    pypi
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.36               hbc83047_0  
libstdcxx-ng              8.2.0                hdf63c60_1  
libtiff                   4.0.10               h2733197_2  
matplotlib                3.0.3                    pypi_0    pypi
mkl                       2018.0.3                      1  
mkl_fft                   1.0.6            py36h7dd41cf_0  
mkl_random                1.0.1            py36h4414c95_1  
nccl                      1.3.5                 cuda9.0_0  
ncurses                   6.1                  he6710b0_1  
networkx                  2.3                      pypi_0    pypi
ninja                     1.9.0            py36hfd86e86_0  
numpy                     1.15.4           py36h1d66e8a_0  
numpy-base                1.15.4           py36h81de0dd_0  
olefile                   0.46                     py36_0  
opencv-python             4.1.0.25                 pypi_0    pypi
openssl                   1.1.1                h7b6447c_0    anaconda
pandas                    0.24.2                   pypi_0    pypi
parso                     0.4.0                      py_0  
pexpect                   4.7.0                    py36_0  
pickleshare               0.7.5                    py36_0  
pillow                    6.0.0            py36h34e0f95_0  
pip                       19.0.3                   py36_0  
prompt_toolkit            2.0.9                    py36_0  
ptyprocess                0.6.0                    py36_0  
pycocotools               2.0.0                    pypi_0    pypi
pycparser                 2.19                     py36_0  
pygments                  2.3.1                    py36_0  
pyparsing                 2.4.0                    pypi_0    pypi
python                    3.6.8                h0371630_0  
python-dateutil           2.8.0                    pypi_0    pypi
pytorch                   0.4.1            py36ha74772b_0  
pytz                      2019.1                   pypi_0    pypi
pywavelets                1.0.3                    pypi_0    pypi
readline                  7.0                  h7b6447c_5  
requests                  2.21.0                   pypi_0    pypi
scikit-image              0.15.0                   pypi_0    pypi
scipy                     1.2.1                    pypi_0    pypi
setuptools                41.0.0                   py36_0  
six                       1.12.0                   py36_0  
sqlite                    3.28.0               h7b6447c_0  
tk                        8.6.8                hbc83047_0    anaconda
torchvision               0.2.1                    py36_0  
traitlets                 4.3.2                    py36_0  
urllib3                   1.24.2                   pypi_0    pypi
wcwidth                   0.1.7                    py36_0  
wheel                     0.33.1                   py36_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

Running bash build.sh seems to work OK:


Compiling nms kernels by nvcc...
Including CUDA code.
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms
generating /tmp/tmps81ofzxa/_nms.c
setting the current directory to '/tmp/tmps81ofzxa'
running build_ext
building '_nms' extension
creating home
creating home/admin-ata168
creating home/admin-ata168/trial_retinanet_4
creating home/admin-ata168/trial_retinanet_4/pytorch-retinanet
creating home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib
creating home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms
creating home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src
gcc -pthread -B /home/admin-ata168/anaconda3/envs/trial_retinanet_4/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/include/python3.6m -c _nms.c -o ./_nms.o -std=c99 -std=c99
gcc -pthread -B /home/admin-ata168/anaconda3/envs/trial_retinanet_4/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/include/python3.6m -c /home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms.c -o ./home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms.o -std=c99 -std=c99
gcc -pthread -B /home/admin-ata168/anaconda3/envs/trial_retinanet_4/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/admin-ata168/anaconda3/envs/trial_retinanet_4/include/python3.6m -c /home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c -o ./home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.o -std=c99 -std=c99
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c: In function ‘gpu_nms’:
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c:29:35: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long* mask_flat = THCudaLongTensor_data(state, mask);
                                   ^
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c:37:40: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long * mask_cpu_flat = THLongTensor_data(mask_cpu);
                                        ^
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c:40:39: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
   unsigned long long* remv_cpu_flat = THLongTensor_data(remv_cpu);
                                       ^
/home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.c:23:7: warning: unused variable ‘boxes_dim’ [-Wunused-variable]
   int boxes_dim = THCudaTensor_size(state, boxes, 1);
       ^
gcc -pthread -shared -B /home/admin-ata168/anaconda3/envs/trial_retinanet_4/compiler_compat -L/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib -Wl,-rpath=/home/admin-ata168/anaconda3/envs/trial_retinanet_4/lib -Wl,--no-as-needed -Wl,--sysroot=/ ./_nms.o ./home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms.o ./home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/nms_cuda.o /home/admin-ata168/trial_retinanet_4/pytorch-retinanet/lib/nms/src/cuda/nms_kernel.cu.o -o ./_nms.so

abdur4373 commented 5 years ago

Hello @mvcaro, I was also stuck on the same issue. Initially I had CUDA 10.0 and torch 1.0.1.post2. I reverted to CUDA 9.0 and torch 0.4.1, which solved the issue. Just a reminder, as I wasted a lot of time on it: after reverting to these settings, the NMS extension needs to be built again (Step 4).
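
The fix above suggests a mismatch between the CUDA toolkit that compiled the NMS kernels and the CUDA runtime PyTorch was built against. In the original report, the system toolkit is 9.2.148 while the conda environment lists cudatoolkit 9.0. As far as I know, `__cudaPopCallConfiguration` only exists in CUDA 9.2 and later runtimes, so a kernel compiled by a 9.2 `nvcc` would not load against a 9.0 runtime. Below is a minimal sketch for comparing the two versions. The helper names (`parse_nvcc_release`, `major_minor`, `toolkits_match`, `check_environment`) are made up for illustration and are not part of this repository.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def major_minor(version):
    """Turn a version string such as '9.0.176' into (9, 0)."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def toolkits_match(torch_cuda, nvcc_output):
    """True when nvcc and the PyTorch build agree on CUDA major.minor."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and release == major_minor(torch_cuda)


def check_environment():
    """Hypothetical helper: compare the nvcc on PATH with torch.version.cuda."""
    import torch  # imported lazily so the parsing helpers work without PyTorch

    if torch.version.cuda is None:
        print("This PyTorch build has no CUDA support.")
        return False
    nvcc_output = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    ok = toolkits_match(torch.version.cuda, nvcc_output)
    print("PyTorch CUDA:", torch.version.cuda, "| match with nvcc:", ok)
    return ok
```

Calling `check_environment()` inside the conda environment before running build.sh should show whether the two agree. If they differ, either point the build at a matching toolkit or reinstall PyTorch for the installed one, then rebuild the extension.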

mvcaro commented 5 years ago

Thanks for the pointer @abdur4373

hxy1051653358 commented 5 years ago

@mvcaro Were you able to solve this problem? The method above did not work for me.
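
If the downgrade does not help, one thing that might narrow it down is checking whether the rebuilt `_nms.so` still references the symbol from the traceback. A stale `nms_kernel.cu.o` compiled by a newer `nvcc` could keep the reference alive. This is a rough sketch that assumes the binutils `nm` tool is on the PATH. The function names are hypothetical and not part of this repository.

```python
import subprocess


def undefined_symbols(nm_output):
    """Collect names marked 'U' (undefined) in `nm -D` style output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-2] == "U":
            # Drop a version suffix such as '@GLIBC_2.2.5' if nm printed one.
            names.add(fields[-1].split("@")[0])
    return names


def references_symbol(so_path, symbol):
    """Hypothetical helper: True if the shared object imports `symbol`."""
    nm_output = subprocess.run(
        ["nm", "-D", so_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return symbol in undefined_symbols(nm_output)
```

For example, `references_symbol("lib/nms/_ext/nms/_nms.so", "__cudaPopCallConfiguration")` returning True would mean the extension still expects a CUDA runtime that provides that symbol. Whether the import then succeeds depends on which libcudart gets loaded alongside PyTorch.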