open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Bug] ImportError #2617

Closed · Ishihara-Masabumi closed this issue 1 year ago

Ishihara-Masabumi commented 1 year ago

Branch

main branch (1.x version, such as v1.0.0, or dev-1.x branch)

Prerequisite

Environment

Python 3.8, mmaction2 1.1.0, mmcv 2.0.1

Describe the bug

When I run the training script below, the following error occurs.

Reproduces the problem - code sample

No response

Reproduces the problem - command or script

python tools/train.py configs/recognition/r2plus1d/r2plus1d_r34_8xb8-8x8x1-180e_kinetics400-rgb.py --seed=0 --deterministic

Reproduces the problem - error message

(openmmlab) dl@dl-machine:~/mmaction2/mmaction2$ python tools/train.py configs/recognition/r2plus1d/r2plus1d_r34_8xb8-8x8x1-180e_kinetics400-rgb.py \

--seed=0 --deterministic

08/01 15:50:57 - mmengine - INFO -

System environment:
    sys.platform: linux
    Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 0
    GPU 0: NVIDIA GeForce RTX 3090
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.6, V11.6.124
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 1.8.0+cu111
    PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.9.0+cu111
OpenCV: 4.8.0
MMEngine: 0.8.3

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 0
    diff_rank_seed: False
    deterministic: True
    Distributed launcher: none
    Distributed training: False
    GPU number: 1

08/01 15:50:57 - mmengine - INFO - Config: ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt' ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' auto_scale_lr = dict(base_batch_size=64, enable=False) data_root = 'data/kinetics400/videos_train' data_root_val = 'data/kinetics400/videos_val' dataset_type = 'VideoDataset' default_hooks = dict( checkpoint=dict( interval=1, max_keep_ckpts=3, save_best='auto', type='CheckpointHook'), logger=dict(ignore_last=False, interval=20, type='LoggerHook'), param_scheduler=dict(type='ParamSchedulerHook'), runtime_info=dict(type='RuntimeInfoHook'), sampler_seed=dict(type='DistSamplerSeedHook'), sync_buffers=dict(type='SyncBuffersHook'), timer=dict(type='IterTimerHook')) default_scope = 'mmaction' env_cfg = dict( cudnn_benchmark=False, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0)) file_client_args = dict(io_backend='disk') launcher = 'none' load_from = None log_level = 'INFO' log_processor = dict(by_epoch=True, type='LogProcessor', window_size=20) model = dict( backbone=dict( conv1_kernel=( 3, 7, 7, ), conv1_stride_t=1, conv_cfg=dict(type='Conv2plus1d'), depth=34, inflate=( 1, 1, 1, 1, ), norm_cfg=dict(eps=0.001, requires_grad=True, type='SyncBN'), norm_eval=False, pool1_stride_t=1, pretrained=None, pretrained2d=False, spatial_strides=( 1, 2, 2, 2, ), temporal_strides=( 1, 2, 2, 2, ), type='ResNet2Plus1d', zero_init_residual=False), cls_head=dict( average_clips='prob', dropout_ratio=0.5, in_channels=512, init_std=0.01, num_classes=400, spatial_type='avg', type='I3DHead'), data_preprocessor=dict( format_shape='NCTHW', mean=[ 123.675, 116.28, 103.53, ], std=[ 58.395, 57.12, 57.375, ], type='ActionDataPreprocessor'), type='Recognizer3D') optim_wrapper = dict( clip_grad=dict(max_norm=40, norm_type=2), optimizer=dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0001)) param_scheduler = [ dict(T_max=180, by_epoch=True, eta_min=0, type='CosineAnnealingLR'), ] randomness = dict(deterministic=True, diff_rank_seed=False, seed=0) resume = False test_cfg = dict(type='TestLoop') test_dataloader = dict( batch_size=1, dataset=dict( ann_file='data/kinetics400/kinetics400_val_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_val'), pipeline=[ dict(io_backend='disk', type='DecordInit'), dict( clip_len=8, frame_interval=8, num_clips=10, test_mode=True, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(crop_size=256, type='ThreeCrop'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ], test_mode=True, type='VideoDataset'), num_workers=8, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) test_evaluator = dict(type='AccMetric') test_pipeline = [ dict(io_backend='disk', type='DecordInit'), dict( clip_len=8, frame_interval=8, num_clips=10, test_mode=True, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(crop_size=256, type='ThreeCrop'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ] train_cfg = dict( max_epochs=180, type='EpochBasedTrainLoop', val_begin=1, val_interval=20) train_dataloader = dict( batch_size=8, dataset=dict( ann_file='data/kinetics400/kinetics400_train_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_train'), pipeline=[ dict(io_backend='disk', type='DecordInit'), dict( clip_len=8, frame_interval=8, 
num_clips=1, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(type='RandomResizedCrop'), dict(keep_ratio=False, scale=( 224, 224, ), type='Resize'), dict(flip_ratio=0.5, type='Flip'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ], type='VideoDataset'), num_workers=8, persistent_workers=True, sampler=dict(shuffle=True, type='DefaultSampler')) train_pipeline = [ dict(io_backend='disk', type='DecordInit'), dict(clip_len=8, frame_interval=8, num_clips=1, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(type='RandomResizedCrop'), dict(keep_ratio=False, scale=( 224, 224, ), type='Resize'), dict(flip_ratio=0.5, type='Flip'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ] val_cfg = dict(type='ValLoop') val_dataloader = dict( batch_size=8, dataset=dict( ann_file='data/kinetics400/kinetics400_val_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_val'), pipeline=[ dict(io_backend='disk', type='DecordInit'), dict( clip_len=8, frame_interval=8, num_clips=1, test_mode=True, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(crop_size=224, type='CenterCrop'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ], test_mode=True, type='VideoDataset'), num_workers=8, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) val_evaluator = dict(type='AccMetric') val_pipeline = [ dict(io_backend='disk', type='DecordInit'), dict( clip_len=8, frame_interval=8, num_clips=1, test_mode=True, type='SampleFrames'), dict(type='DecordDecode'), dict(scale=( -1, 256, ), type='Resize'), dict(crop_size=224, type='CenterCrop'), dict(input_format='NCTHW', type='FormatShape'), dict(type='PackActionInputs'), ] vis_backends = [ dict(type='LocalVisBackend'), ] visualizer = dict( type='ActionVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ]) work_dir = './work_dirs/r2plus1d_r34_8xb8-8x8x1-180e_kinetics400-rgb'

/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py:33: UserWarning: Fail to import MultiScaleDeformableAttention from mmcv.ops.multi_scale_deform_attn, You should install mmcv rather than mmcv-lite if you need this module.
  warnings.warn('Fail to import MultiScaleDeformableAttention from '
08/01 15:51:01 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
Traceback (most recent call last):
  File "tools/train.py", line 135, in <module>
    main()
  File "tools/train.py", line 128, in main
    runner = Runner.from_cfg(cfg)
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 445, in from_cfg
    runner = cls(
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 414, in __init__
    self.model = self.wrap_model(
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 864, in wrap_model
    model = revert_sync_batchnorm(model)
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/utils.py", line 174, in revert_sync_batchnorm
    from mmcv.ops import SyncBatchNorm
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/dl/miniconda3/envs/openmmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/dl/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl8GPUTrace13gpuTraceStateE

Additional information

No response

Dai-Wenxun commented 1 year ago

You should install mmcv rather than mmcv-lite if you need this module.

Ishihara-Masabumi commented 1 year ago

I have already installed mmcv (mmcv 2.0.1).
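
As a side note, having mmcv 2.0.1 installed does not by itself guarantee that its compiled extension (mmcv._ext) was built against the PyTorch in this environment. A minimal check, assuming the same openmmlab conda environment as above, is to print the relevant versions and import the exact op that fails in the traceback:

python -c "import torch, mmcv; print(torch.__version__, torch.version.cuda, mmcv.__version__)"
python -c "from mmcv.ops import SyncBatchNorm; print('mmcv._ext loaded successfully')"

If the second command raises the same undefined-symbol ImportError, the installed mmcv wheel was compiled against a different PyTorch build than the one in the environment.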

cir7 commented 1 year ago

It looks like the mmcv and PyTorch versions are not compatible. Please try installing mmcv with mim.
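
For reference, a typical reinstall via mim looks like the sketch below; the exact version pin is an assumption, and mim resolves a prebuilt mmcv wheel that matches the locally installed PyTorch/CUDA combination:

pip install -U openmim
pip uninstall -y mmcv
mim install "mmcv==2.0.1"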

ZhihuaGao commented 6 months ago

The problem is still there. How can it be resolved?

ZhihuaGao commented 6 months ago

Using mim install solved the problem. The strange thing is that I installed the same mmcv version with pip but got a different result.
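
A plausible explanation, offered as a sketch rather than a confirmed diagnosis: the missing symbol can be demangled with c++filt (part of binutils), and it resolves to a c10 symbol that exists in newer PyTorch releases but not in the PyTorch 1.8.0 installed here. The wheel fetched by a plain pip install was therefore built against a newer libtorch, while mim picks a wheel built for the local PyTorch/CUDA versions.

echo _ZN3c104impl8GPUTrace13gpuTraceStateE | c++filt
# prints: c10::impl::GPUTrace::gpuTraceState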