open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

**KAGGLE** --- mmagic error - undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs #2904

Open MasterHM-ml opened 11 months ago

MasterHM-ml commented 11 months ago
          cd mmcv && git checkout 2.x

Originally posted by @uniyushu in https://github.com/open-mmlab/mmcv/issues/2660#issuecomment-1467669827


I'm using mmcv==2.0.1 and am still facing the same issue. I installed mmcv via mim; here is how I installed it on Kaggle:

!pip3 install -U openmim
!mim install 'mmcv>=2.0.0'
!mim install 'mmengine'

%cd /kaggle/working
!rm -rf mmagic
!git clone https://github.com/open-mmlab/mmagic.git
%cd mmagic
!pip3 install -e . -v

!python -c "import mmagic; print(mmagic.__version__)"

No error in installation.

However, I get the error when running `!python3 tools/train.py "configs/edsr/edsr_x2c64b16_1xb16-300k_UCMerced.py" --auto-scale-lr`. Here is the stack trace, trimmed to the last calls.

After printing the logs, it first shows some warnings:

/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
  warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']
  warnings.warn(f"file system plugins are not loaded: {e}")

and then the error:

...
...
...
 /opt/conda/lib/python3.10/site-packages/mmcv/utils/ext_loader.py:13 in       │
│ load_ext                                                                     │
│                                                                              │
│   10 if torch.__version__ != 'parrots':                                      │
│   11 │                                                                       │
│   12 │   def load_ext(name, funcs):                                          │
│ ❱ 13 │   │   ext = importlib.import_module('mmcv.' + name)                   │
│   14 │   │   for fun in funcs:                                               │
│   15 │   │   │   assert hasattr(ext, fun), f'{fun} miss in module {name}'    │
│   16 │   │   return ext                                                      │
│                                                                              │
│ /opt/conda/lib/python3.10/importlib/__init__.py:126 in import_module         │
│                                                                              │
│   123 │   │   │   if character != '.':                                       │
│   124 │   │   │   │   break                                                  │
│   125 │   │   │   level += 1                                                 │
│ ❱ 126 │   return _bootstrap._gcd_import(name[level:], package, level)        │
│   127                                                                        │
│   128                                                                        │
│   129 _RELOADING = {}    
ImportError: /opt/conda/lib/python3.10/site-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs

Any solution to the problem, or a clue for debugging it, would be highly appreciated. Thank you. The same code works fine in Colab.
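Since the same code works in Colab, one way to look for a clue is to print the same version information in both notebooks and compare the results. A minimal sketch, assuming the distribution names `torch`, `mmcv`, `mmengine` and `mmagic` from the install commands above; `collect_env_summary` is a hypothetical helper, not part of mmcv or mmengine:

```python
import importlib.metadata
import platform


def collect_env_summary():
    """Collect version info worth diffing between two machines (e.g. Kaggle vs Colab).

    Hypothetical helper; the package names below are assumptions taken from the
    install commands in this thread.
    """
    summary = {"python": platform.python_version()}
    # Installed distribution versions; None means the package is not installed.
    for dist in ("torch", "mmcv", "mmengine", "mmagic"):
        try:
            summary[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            summary[dist] = None
    try:
        import torch  # imported lazily so the helper still runs without PyTorch
    except ImportError:
        summary["torch_cuda"] = None
        summary["cuda_available"] = False
    else:
        # CUDA version PyTorch was built with (None for CPU-only builds)
        summary["torch_cuda"] = torch.version.cuda
        summary["cuda_available"] = torch.cuda.is_available()
    return summary


if __name__ == "__main__":
    for key, value in collect_env_summary().items():
        print(f"{key}: {value}")
```

Running it in both notebooks and comparing the `torch`, `torch_cuda` and `mmcv` lines shows whether the two environments differ in the places that matter for mmcv's compiled ops.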

zengyh1900 commented 11 months ago

Hi @MasterHM-ml, it seems that mmcv was not installed successfully. Can you reinstall mmcv and check whether the installation succeeds? You may refer to https://mmcv.readthedocs.io/en/latest/get_started/installation.html# to install mmcv.
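One way to check whether the installation succeeded is to repeat, in isolation, the exact import that fails in the traceback: `mmcv/utils/ext_loader.py` calls `importlib.import_module('mmcv.' + name)` to load the compiled `mmcv._ext` module. A sketch; `check_mmcv_ext` is a hypothetical helper, and reading an "undefined symbol" error as a build mismatch is an assumption consistent with this thread, not a confirmed diagnosis:

```python
import importlib
import importlib.util


def check_mmcv_ext():
    """Return (ok, message) for importing mmcv's compiled ops module.

    Hypothetical helper that mirrors the import in mmcv/utils/ext_loader.py.
    """
    if importlib.util.find_spec("mmcv") is None:
        return False, "mmcv is not installed"
    try:
        # Same call that raises in the traceback: import_module('mmcv.' + name)
        importlib.import_module("mmcv._ext")
    except ImportError as exc:
        # Assumption: an "undefined symbol" message here typically means the
        # compiled ops were built against a different PyTorch/CUDA than the
        # one currently installed; a missing module means no ops were built.
        return False, str(exc)
    return True, "mmcv._ext loaded"


if __name__ == "__main__":
    ok, message = check_mmcv_ext()
    print("OK" if ok else "FAILED", message)
```

If this fails with the same `undefined symbol` message right after a reinstall, the reinstalled build still does not match the local PyTorch.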

MasterHM-ml commented 11 months ago

Hello @zengyh1900, thanks for the reply. However, I installed mmcv following the official documentation. Here is a gist with the complete, detailed stack trace.

zengyh1900 commented 11 months ago

Hi @zhouzaida, I think the error comes from https://gist.github.com/MasterHM-ml/619dee045ce44c5184cd93cb833328b1#file-gistfile1-txt-L1120, where the code tries to import ops from mmcv. Could it be caused by installing an mmcv build meant for a different platform? Do you have any ideas?
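If the cause is an mmcv build that does not match the platform, the mmcv installation docs also describe installing with pip from OpenMMLab's wheel index, whose URL encodes both versions as `https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html`. The hypothetical helper below derives that URL from `torch.__version__` and `torch.version.cuda`. It is a sketch: whether a prebuilt wheel actually exists for a given PyTorch/CUDA pair is an assumption you should check against the index itself.

```python
def mmcv_find_links_url(torch_version, cuda_version):
    """Build the OpenMMLab wheel index URL for a PyTorch/CUDA pair.

    Hypothetical helper. Assumption: the index layout documented for mmcv 2.x,
    i.e. dist/<cu_tag>/torch<major>.<minor>/index.html.

    torch_version: e.g. "2.0.1+cu118" (the value of torch.__version__)
    cuda_version:  e.g. "11.8" (torch.version.cuda), or None for CPU-only builds
    """
    # Drop any local build suffix such as "+cu118" and keep only major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    # "11.8" -> "cu118"; CPU-only PyTorch builds map to "cpu"
    cu_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://download.openmmlab.com/mmcv/dist/{cu_tag}/torch{major}.{minor}/index.html"


if __name__ == "__main__":
    print(mmcv_find_links_url("2.0.1+cu118", "11.8"))
```

For example, on a runtime with PyTorch 2.0.1 built for CUDA 11.8 the helper points at the `cu118/torch2.0` index, which would then be passed as `pip install "mmcv==2.0.1" -f <url>`. If no matching wheel is published, pip may fall back to building mmcv from source instead.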

MasterHM-ml commented 11 months ago

Any update?

tomarvimal commented 10 months ago

I am also facing the same issue!

uniyushu commented 10 months ago

Have you tried the mmagic Docker image?


Or maybe it is caused by PyTorch 2.x; try 1.x instead, e.g. `conda install pytorch=1.10`.

VadimShabashov commented 1 month ago

For those who are still struggling to install and use mmcv: I tried the officially recommended approach (https://mmcv.readthedocs.io/zh-cn/latest/get_started/installation.html#install-with-mim-recommended) as well as the instructions from this comment (https://github.com/open-mmlab/mmdetection/issues/10401#issuecomment-1627394117). Neither worked for me. However, I noticed that there is no error when running in CPU-only mode on Kaggle, so I suspected a conflict with the latest CUDA (my environment had CUDA 12.1). After downgrading CUDA to 11.3 (by finding an old notebook with a pinned environment), everything started to work. Here is a notebook with the pinned environment (CUDA 11.3) where mmcv raises no errors: https://www.kaggle.com/code/vadimshabashov/mmdetection-startup-on-kaggle?scriptVersionId=180583679