ImportError: libtorch_cuda_cu.so: cannot open shared object file

v-nhandnt21 commented 3 years ago

Hi everyone, I have successfully installed MMDetection3D, but I could not run the demo code because of the error ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory. The details are below. I would appreciate it if anyone could help me out. Thanks.

[UPDATE] You can find the solution at the end of this post.

Describe the bug

When I ran demo/pcd_demo.py or mmdet3d/utils/collect_env.py, I got the error ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory.

Reproduction

How I installed MMDetection3D: pull the PyTorch docker image with tag 20.12-py3 from NVIDIA NGC (docker pull nvcr.io/nvidia/pytorch:20.12-py3), launch a docker container, then run:

apt update
apt install ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6
apt clean
rm -rf /var/lib/apt/lists/*
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
pip install mmdet
conda clean --all
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -r requirements/build.txt
pip install --no-cache-dir -v -e .
pip uninstall pycocotools --no-cache-dir
pip install mmpycocotools --no-cache-dir --force --no-deps

Create 2 directories named checkpoints and results inside mmdetection3d. Download a pretrained model and save it in checkpoints.

How I got the error: run either

python demo/pcd_demo.py demo/data/kitti/kitti_000008.bin configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py checkpoints/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth --out-dir results

or

python mmdet3d/utils/collect_env.py

Environment

I could not run python mmdet3d/utils/collect_env.py to collect the necessary environment information as suggested, so I list it manually:
OS: Ubuntu 20.04.2 LTS
GPU: RTX 3070. Driver: 460.32.03.
cuDNN 8.0.5; CUDA 11.1.1 including cuBLAS 11.3.0.
Python 3.8.5.
PyTorch 1.8.0a0+1606899.

Error traceback

When running demo/pcd_demo.py:

Traceback (most recent call last):
  File "demo/pcd_demo.py", line 3, in <module>
    from mmdet3d.apis import inference_detector, init_detector, show_result_meshlab
  File "/mydev/code/mmdetection3d/mmdet3d/apis/__init__.py", line 1, in <module>
    from .inference import (convert_SyncBN, inference_detector,
  File "/mydev/code/mmdetection3d/mmdet3d/apis/inference.py", line 10, in <module>
    from mmdet3d.core import (Box3DMode, DepthInstance3DBoxes,
  File "/mydev/code/mmdetection3d/mmdet3d/core/__init__.py", line 1, in <module>
    from .anchor import *  # noqa: F401, F403
  File "/mydev/code/mmdetection3d/mmdet3d/core/anchor/__init__.py", line 1, in <module>
    from mmdet.core.anchor import build_anchor_generator
  File "/opt/conda/lib/python3.8/site-packages/mmdet/core/__init__.py", line 2, in <module>
    from .bbox import *  # noqa: F401, F403
  File "/opt/conda/lib/python3.8/site-packages/mmdet/core/bbox/__init__.py", line 7, in <module>
    from .samplers import (BaseSampler, CombinedSampler,
  File "/opt/conda/lib/python3.8/site-packages/mmdet/core/bbox/samplers/__init__.py", line 9, in <module>
    from .score_hlr_sampler import ScoreHLRSampler
  File "/opt/conda/lib/python3.8/site-packages/mmdet/core/bbox/samplers/score_hlr_sampler.py", line 2, in <module>
    from mmcv.ops import nms_match
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

When running mmdet3d/utils/collect_env.py:

Traceback (most recent call last):
  File "mmdet3d/utils/collect_env.py", line 18, in <module>
    for name, val in collect_env().items():
  File "mmdet3d/utils/collect_env.py", line 10, in collect_env
    env_info = collect_base_env()
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/env.py", line 85, in collect_env
    from mmcv.ops import get_compiler_version, get_compiling_cuda_version
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
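One way to narrow this down is to look at which CUDA libraries the installed PyTorch actually ships. The sketch below lists the libtorch_cuda* files in PyTorch's lib directory. It rests on an assumption that is not confirmed anywhere in this thread: that the prebuilt mmcv-full wheel was linked against a PyTorch build that splits its CUDA library into libtorch_cuda_cu.so and libtorch_cuda_cpp.so, while the NGC build may ship a single libtorch_cuda.so. The helper names list_torch_cuda_libs and torch_lib_dir are hypothetical.

```python
import importlib.util
import os


def list_torch_cuda_libs(lib_dir):
    """Return the sorted names of libtorch_cuda* shared objects in lib_dir."""
    if not os.path.isdir(lib_dir):
        return []
    return sorted(
        name for name in os.listdir(lib_dir)
        if name.startswith("libtorch_cuda") and ".so" in name
    )


def torch_lib_dir():
    """Locate <site-packages>/torch/lib without importing torch; None if torch is absent."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], "lib")


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        libs = list_torch_cuda_libs(lib_dir)
        print("CUDA libraries shipped with torch:", libs)
        # Assumption: the prebuilt mmcv extension expects this exact file name.
        print("libtorch_cuda_cu.so present:", "libtorch_cuda_cu.so" in libs)
```

If libtorch_cuda_cu.so is missing from that list, a prebuilt extension that links against it cannot load, whereas an extension compiled locally links against whatever the installed PyTorch provides.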

[UPDATE] Solution

Following Tai-Wang's suggestion, I rebuilt all 3 libraries (mmcv, mmdet and mmdet3d) from source, which fixed the error above. In summary, my step-by-step installation is as follows: pull the PyTorch docker image with tag 20.12-py3 from NVIDIA NGC (docker pull nvcr.io/nvidia/pytorch:20.12-py3), launch a docker container, then run:

apt update
apt install ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6
apt clean
rm -rf /var/lib/apt/lists/*
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .
cd ..
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .
cd ..
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -v -e .

Wuziyi616 commented 3 years ago

I am trying to reproduce your error, so please be patient. One thing I want to confirm: can you successfully import torch, mmcv and mmdet? Can you also try running some simple mmdet demos to check whether the problem really stems from mmdet3d? Thanks!

v-nhandnt21 commented 3 years ago

Hi @Wuziyi616, thanks for your reply.

Yes, I could successfully run import torch, mmcv, mmdet.

However, when I followed the MMDetection demo here, I got a similar error when running from mmdet.apis import init_detector, inference_detector. Error traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mydev/code/mmdetection/mmdet/apis/__init__.py", line 1, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/mydev/code/mmdetection/mmdet/apis/inference.py", line 6, in <module>
    from mmcv.ops import RoIPool
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 1, in <module>
    from .bbox import bbox_overlaps
  File "/opt/conda/lib/python3.8/site-packages/mmcv/ops/bbox.py", line 3, in <module>
    ext_module = ext_loader.load_ext('_ext', ['bbox_overlaps'])
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 11, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
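The tracebacks in this thread suggest why a plain import mmcv succeeds while the demos fail: the compiled extension mmcv._ext is only loaded once mmcv.ops is imported (mmcv/ops/bbox.py calls ext_loader.load_ext('_ext', ...)). A minimal sketch that imports each layer separately and reports where the chain breaks, assuming the module names seen in the tracebacks; try_import is a hypothetical helper:

```python
import importlib


def try_import(module_name):
    """Import module_name and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, "ok"


if __name__ == "__main__":
    # mmcv.ops pulls in the compiled extension mmcv._ext, as the traceback shows.
    for name in ("torch", "mmcv", "mmcv.ops", "mmdet", "mmdet.apis"):
        ok, detail = try_import(name)
        status = "OK" if ok else "FAILED"
        print(f"{name:12s} {status}  {'' if ok else detail}")
```

If mmcv passes but mmcv.ops fails, the problem lies in the compiled mmcv extension rather than in mmdet or mmdet3d.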

Tai-Wang commented 3 years ago

It seems there is some problem with your CUDA environment. Please check whether torch.cuda.is_available() returns True. I also wonder whether you have successfully run other GPU projects in this environment. If not, please double-check that your CUDA and PyTorch versions are compatible. I have never tried the PyTorch docker image, but judging from the version name, I guess it may need a specific compilation environment?
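A rough way to check version compatibility is to compare the installed PyTorch build with the release that the mmcv wheel index was built for. The sketch below parses the index URL used in the original post and applies a deliberately strict rule (exact version match), which is an assumption made for illustration and not an official mmcv check; parse_wheel_index, cuda_tag and matches_release are hypothetical names.

```python
import re


def parse_wheel_index(url):
    """Extract the (cuda_tag, torch_version) pair from an mmcv wheel index URL."""
    match = re.search(r"/dist/(cu\d+|cpu)/torch([\d.]+)/", url)
    if match is None:
        raise ValueError(f"unrecognized index URL: {url}")
    return match.group(1), match.group(2)


def cuda_tag(cuda_version):
    """Turn a CUDA version such as '11.1' or '11.1.1' into a tag such as 'cu111'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def matches_release(torch_version, cuda_version, url):
    """True only if the installed build is exactly the release the index targets."""
    want_cuda, want_torch = parse_wheel_index(url)
    # Strip a local build suffix like '+cu111'; a pre-release such as '1.8.0a0' still differs.
    installed_torch = torch_version.split("+")[0]
    return installed_torch == want_torch and cuda_tag(cuda_version) == want_cuda


URL = "https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html"
print(matches_release("1.8.0a0+1606899", "11.1", URL))  # NGC build from this issue
print(matches_release("1.8.0+cu111", "11.1", URL))      # an official release wheel
```

In a live environment the first two arguments would come from torch.__version__ and torch.version.cuda. A mismatch does not prove that the prebuilt wheel will fail, but it is a hint that building mmcv from source is the safer route.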

v-nhandnt21 commented 3 years ago

Hi @Tai-Wang, I pulled the pre-built docker for PyTorch from NVIDIA NGC so I suppose the CUDA and PyTorch versions are compatible.

I ran the following code to check my GPU in this environment. It looks normal:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'GeForce RTX 3070'
>>> x = torch.rand(4, 4).cuda()
>>> y = torch.rand(4, 4).cuda()
>>> print(x)
tensor([[0.0237, 0.5399, 0.4616, 0.0852],
        [0.0519, 0.2945, 0.3474, 0.5457],
        [0.3978, 0.2158, 0.1694, 0.3284],
        [0.1239, 0.5423, 0.5858, 0.0306]], device='cuda:0')
>>> print(y)
tensor([[0.7057, 0.9050, 0.7614, 0.4206],
        [0.1033, 0.1385, 0.5892, 0.8567],
        [0.0723, 0.3362, 0.3061, 0.5092],
        [0.9561, 0.7784, 0.8756, 0.8788]], device='cuda:0')
>>> z = x * y
>>> print(z)
tensor([[0.0167, 0.4886, 0.3514, 0.0358],
        [0.0054, 0.0408, 0.2046, 0.4675],
        [0.0288, 0.0725, 0.0518, 0.1672],
        [0.1185, 0.4221, 0.5129, 0.0269]], device='cuda:0')

Apart from this test, I successfully ran the demo of OpenPCDet, which did load the data on GPU.

Tai-Wang commented 3 years ago

Quite interesting. You can successfully import mmcv, but the import fails inside mmdet3d (according to the error message). It seems there are still some problems with your installed mmcv. Maybe you can try uninstalling mmcv and mmdet and reinstalling them by building from source, like:

pip uninstall mmcv-full
pip uninstall mmdet
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .  # package mmcv-full will be installed after this step
cd ..
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install -v -e .

Then build mmdet3d again. This way, mmcv, mmdet and mmdet3d are all compiled in the same environment, which guarantees their compatibility. BTW, you can also refer to the troubleshooting page of mmcv, which may be useful for you.

v-nhandnt21 commented 3 years ago

Hi @Tai-Wang, as you suggested, I built all 3 libraries mmcv, mmdet and mmdet3d from source and it works. I can successfully run demo/pcd_demo.py and mmdet3d/utils/collect_env.py now.

I updated the original post to include this solution.

I am closing this issue for now. Thanks for your support, @Wuziyi616 and @Tai-Wang.

roachsinai commented 3 years ago

Hi @Tai-Wang, I ran

pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html

to install mmcv, then ran

from mmdet.apis import inference_detector, init_detector

to check mmdetection, and I got the same error:

ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

The code posted by @v-nhandnt21 to check the GPU also runs successfully on my PC.

So I tried to install mmcv from source, but I got the same error as posted in https://github.com/open-mmlab/mmcv/issues/1363

So what can I do now to use mmdet and mmpose? Thanks in advance!

Wuziyi616 commented 3 years ago

You should open an issue under mmpose about this question, because I don't see anything related to mmdet3d in your description here. It's unrelated to our repo.

roachsinai commented 3 years ago

OK, thanks for your reply!

tyaiga commented 3 years ago

I had the same error when using mmdet and mmdet3d. In my case, I solved the problem by installing mmcv from source.

tsrobcvai commented 1 year ago

I only reinstalled mmcv, which solved a similar problem for me.

open-mmlab / mmdetection3d

ImportError: libtorch_cuda_cu.so: cannot open shared object file #438