open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

Import Error when installing mmcv using pip and mim #3082

Open laurahgdrn opened 7 months ago

laurahgdrn commented 7 months ago


Environment

I am trying to install mmaction2 in my Google Colab environment by following the official Google Colab tutorial (https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb#scrollTo=No_zZAFpWC-a).

This is my Google Colab environment:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In the tutorial, PyTorch and torchvision are installed for CUDA 11.8:

%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

MMEngine is then installed:

%pip install -U openmim
!mim install mmengine

and mmcv is installed like this:

!mim install "mmcv==2.0.0"

However, this leads to compatibility issues with the CUDA version in Google Colab (12.1). I then tried installing the latest PyTorch build for CUDA 12.1 and the newest mmcv version (as described on the website):

!pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html

mmaction2 is installed like this:

# Install mmaction2
!rm -rf mmaction2
!git clone https://github.com/open-mmlab/mmaction2.git -b main
%cd mmaction2

!pip install -e .

# Install some optional requirements
!pip install -r requirements/optional.txt

Reproduces the problem - code sample

https://colab.research.google.com/github/open-mmlab/mmaction2/blob/master/demo/mmaction2_tutorial.ipynb#scrollTo=No_zZAFpWC-a

import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

import mmaction
print(mmaction.__version__)

from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())

from mmengine.utils.dl_utils import collect_env
print(collect_env())

Reproduces the problem - command or script

# Check Pytorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMAction2 installation
import mmaction
print(mmaction.__version__)

# Check MMCV installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())

# Check MMEngine installation
from mmengine.utils.dl_utils import collect_env
print(collect_env())
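
Because the script above stops at the first failing import, the later checks never run. A minimal sketch of a variant that records each failure and continues is shown below. `run_checks` is a hypothetical helper introduced only for illustration; it is not part of torch, mmcv, mmengine, or mmaction2.

```python
import importlib


def run_checks(module_names):
    """Try to import each module and report its version or the import error.

    Returns a dict mapping module name -> (ok, detail). On success, detail is
    the module's __version__ (or "unknown" if it has none); on failure, detail
    is the text of the ImportError.
    """
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # ImportError covers both a missing package and an
            # "undefined symbol" failure while loading a compiled extension.
            results[name] = (False, str(exc))
        else:
            results[name] = (True, getattr(module, "__version__", "unknown"))
    return results
```

In a notebook, calling `run_checks(["torch", "mmaction", "mmcv", "mmcv.ops", "mmengine"])` and printing the result would show which import fails without hiding the status of the others.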

Reproduces the problem - error message

2.2.1+cu121 True
1.2.0
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-55109e865b0d> in <cell line: 10>()
      8 
      9 # Check MMCV installation
---> 10 from mmcv.ops import get_compiling_cuda_version, get_compiler_version
     11 print(get_compiling_cuda_version())
     12 print(get_compiler_version())

3 frames
/usr/lib/python3.10/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

ImportError: /usr/local/lib/python3.10/dist-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEENS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE
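
The undefined symbol is a C++ name mangled according to the Itanium C++ ABI, so its leading part can be decoded by hand. The sketch below handles only the simple `_ZN<length><identifier>...E` form, which is all this symbol needs; `nested_name` is a hypothetical helper, not a replacement for a real demangler such as `c++filt`. For the symbol above it yields `at::_ops::zeros_like::call`, a function in the `at` namespace used by PyTorch's ATen library, which suggests that the prebuilt `_ext` module expects a symbol that the installed PyTorch libraries do not provide.

```python
def nested_name(mangled):
    """Extract the leading nested name from an Itanium-ABI mangled symbol.

    Only the plain `_ZN<len><id><len><id>...` prefix is handled; templates,
    substitutions and the argument list that follows are ignored.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not a nested Itanium-mangled name")
    pos = len(prefix)
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos].isdigit():
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return "::".join(parts)


symbol = (
    "_ZN2at4_ops10zeros_like4callERKNS_6TensorEN3c108optionalINS5_10ScalarTypeEEE"
    "NS6_INS5_6LayoutEEENS6_INS5_6DeviceEEENS6_IbEENS6_INS5_12MemoryFormatEEE"
)
print(nested_name(symbol))  # at::_ops::zeros_like::call
```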

Additional information

I have tried different versions of mmcv and mmcv-full, as well as different PyTorch versions, but nothing seems to work. For example:

From the official mmcv documentation (https://mmcv.readthedocs.io/en/v1.7.0/get_started/installation.html):

!pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html

Nightly CUDA builds (from https://github.com/pytorch/pytorch/issues/91122):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121

zhouzaida commented 7 months ago

The workaround is to compile and install mmcv from source. As a temporary solution, you can also try installing mmcv-lite, after which the Colab example code should work fine.

The reason for the error is as follows. The error message is _ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops10zero, which is usually caused by a mismatch between the CUDA version of the running environment and the CUDA version that the installed MMCV binary was compiled against. As the output above shows, Colab's CUDA version is 12.2, whereas the mmcv you installed with !pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html was built for cu121, which does not match Colab's CUDA version.
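
A version comparison of this kind can be sketched in a few lines. The helper names below are hypothetical, and the URL layout is inferred only from the single index URL quoted in this thread, so a generated URL is not guaranteed to exist on the server. The sketch compares the tags of the installed PyTorch build with the tags of the wheel index. It does not look at the system toolkit version reported by nvcc. Since the output above reports torch 2.2.1+cu121 while the wheel index targets torch2.1, the PyTorch tag is checked as well as the CUDA tag.

```python
def torch_tags(torch_version):
    """Split a version such as '2.2.1+cu121' into ('torch2.2', 'cu121').

    The second element is None when the version string has no '+<tag>' part.
    """
    release, _, local = torch_version.partition("+")
    major, minor = release.split(".")[:2]
    return f"torch{major}.{minor}", (local or None)


def wheel_index_url(torch_version):
    """Build an index URL following the pattern quoted in this thread (assumed)."""
    torch_tag, cuda_tag = torch_tags(torch_version)
    if cuda_tag is None:
        raise ValueError("torch version string carries no CUDA tag")
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"


def mismatches(torch_version, wheel_cuda_tag, wheel_torch_tag):
    """List the differences between the installed torch build and a wheel index."""
    torch_tag, cuda_tag = torch_tags(torch_version)
    problems = []
    if torch_tag != wheel_torch_tag:
        problems.append(f"PyTorch: installed {torch_tag}, wheel built for {wheel_torch_tag}")
    if cuda_tag != wheel_cuda_tag:
        problems.append(f"CUDA: installed {cuda_tag}, wheel built for {wheel_cuda_tag}")
    return problems


# Values taken from the issue: torch 2.2.1+cu121 and the cu121/torch2.1 index.
print(mismatches("2.2.1+cu121", "cu121", "torch2.1"))
```

In a live session, `torch.__version__` would be passed as `torch_version`. An empty list only means that the tags agree; it does not prove that a matching prebuilt wheel is available.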