open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

[Bug] TypeError: evaluate() got an unexpected keyword argument 'type' #3065

Closed Yang-yimin closed 8 months ago

Yang-yimin commented 8 months ago

Prerequisite

Environment

conda list

Name Version Build Channel
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
addict 2.4.0 pypi_0 pypi
aliyun-python-sdk-core 2.15.0 pypi_0 pypi
aliyun-python-sdk-kms 2.16.2 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
brotli-python 1.0.9 py39h6a678d5_7 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bzip2 1.0.8 h5eee18b_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2024.3.11 h06a4308_0 defaults
certifi 2024.2.2 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cffi 1.16.0 pypi_0 pypi
charset-normalizer 2.0.4 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
cryptography 42.0.5 pypi_0 pypi
cuda 11.6.1 0 nvidia
cuda-cccl 11.6.55 hf6102b2_0 nvidia
cuda-command-line-tools 11.6.2 0 nvidia
cuda-compiler 11.6.2 0 nvidia
cuda-cudart 11.6.55 he381448_0 nvidia
cuda-cudart-dev 11.6.55 h42ad0f4_0 nvidia
cuda-cuobjdump 11.6.124 h2eeebcb_0 nvidia
cuda-cupti 11.6.124 h86345e5_0 nvidia
cuda-cuxxfilt 11.6.124 hecbf4f6_0 nvidia
cuda-driver-dev 11.6.55 0 nvidia
cuda-gdb 12.4.99 0 nvidia
cuda-libraries 11.6.1 0 nvidia
cuda-libraries-dev 11.6.1 0 nvidia
cuda-memcheck 11.8.86 0 nvidia
cuda-nsight 12.4.99 0 nvidia
cuda-nsight-compute 12.4.0 0 nvidia
cuda-nvcc 11.6.124 hbba6d2d_0 nvidia
cuda-nvdisasm 12.4.99 0 nvidia
cuda-nvml-dev 11.6.55 haa9ef22_0 nvidia
cuda-nvprof 12.4.99 0 nvidia
cuda-nvprune 11.6.124 he22ec0a_0 nvidia
cuda-nvrtc 11.6.124 h020bade_0 nvidia
cuda-nvrtc-dev 11.6.124 h249d397_0 nvidia
cuda-nvtx 11.6.124 h0630a44_0 nvidia
cuda-nvvp 12.4.99 0 nvidia
cuda-runtime 11.6.1 0 nvidia
cuda-samples 11.6.101 h8efea70_0 nvidia
cuda-sanitizer-api 12.4.99 0 nvidia
cuda-toolkit 11.6.1 0 nvidia
cuda-tools 11.6.1 0 nvidia
cuda-visual-tools 11.6.1 0 nvidia
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
docker-pycreds 0.4.0 pypi_0 pypi
e2cnn 0.2.3 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
fonttools 4.50.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
gds-tools 1.9.0.20 0 nvidia
gitdb 4.0.11 pypi_0 pypi
gitpython 3.1.42 pypi_0 pypi
gmp 6.2.1 h295c915_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
gnutls 3.6.15 he1e5248_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
idna 3.4 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
importlib-metadata 7.0.2 pypi_0 pypi
importlib-resources 6.3.2 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jmespath 0.10.0 pypi_0 pypi
jpeg 9e h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h7b6447c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lcms2 2.12 h3be6417_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libcublas 11.9.2.110 h5e84587_0 nvidia
libcublas-dev 11.9.2.110 h5c901ab_0 nvidia
libcufft 10.7.1.112 hf425ae0_0 nvidia
libcufft-dev 10.7.1.112 ha5ce4c0_0 nvidia
libcufile 1.9.0.20 0 nvidia
libcufile-dev 1.9.0.20 0 nvidia
libcurand 10.3.5.119 0 nvidia
libcurand-dev 10.3.5.119 0 nvidia
libcusolver 11.3.4.124 h33c3c4e_0 nvidia
libcusparse 11.7.2.124 h7538f96_0 nvidia
libcusparse-dev 11.7.2.124 hbbe9722_0 nvidia
libdeflate 1.17 h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi 3.4.4 h6a678d5_0 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libiconv 1.16 h7f8727e_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libidn2 2.3.4 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libnpp 11.6.3.124 hd2722f0_0 nvidia
libnpp-dev 11.6.3.124 h3c42840_0 nvidia
libnvjpeg 11.6.2.124 hd473ad6_0 nvidia
libnvjpeg-dev 11.6.2.124 hb5906b9_0 nvidia
libpng 1.6.39 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng 11.2.0 h1234567_1 defaults
libtasn1 4.19.0 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libtiff 4.5.1 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libunistring 0.9.10 h27cfd23_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libwebp-base 1.3.2 h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lz4-c 1.9.4 h6a678d5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
markdown 3.6 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl-service 2.4.0 py39h5eee18b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_fft 1.3.8 py39h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mkl_random 1.2.4 py39hdb19cb5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
mmcv-full 1.7.2 pypi_0 pypi
mmdet 2.28.2 pypi_0 pypi
mmengine 0.10.3 pypi_0 pypi
mmrotate 0.3.4 dev_0
model-index 0.1.11 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nettle 3.7.3 hbbd107a_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
nsight-compute 2024.1.0.13 0 nvidia
numpy 1.26.4 py39h5f9d8c6_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy-base 1.26.4 py39hb5e798b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
opencv-python 4.9.0.80 pypi_0 pypi
opendatalab 0.0.10 pypi_0 pypi
openh264 2.1.1 h4ff587b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
openjpeg 2.4.0 h3ad879b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
openmim 0.3.9 pypi_0 pypi
openssl 3.0.13 h7f8727e_0 defaults
openxlab 0.0.36 pypi_0 pypi
ordered-set 4.1.0 pypi_0 pypi
oss2 2.17.0 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.2.1 pypi_0 pypi
pillow 10.2.0 py39h5eee18b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip 23.3.1 py39h06a4308_0 defaults
platformdirs 4.2.0 pypi_0 pypi
prettytable 3.5.0 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf 4.25.3 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
pycocotools 2.0.7 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pycryptodome 3.20.0 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python 3.9.18 h955ad1f_0 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
pytorch 1.13.0 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
pytorch-cuda 11.6 h867d48c_1 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2023.4 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
requests 2.28.2 pypi_0 pypi
rich 13.4.2 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
sentry-sdk 1.43.0 pypi_0 pypi
setproctitle 1.3.3 pypi_0 pypi
setuptools 60.2.0 pypi_0 pypi
shapely 2.0.3 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 5.0.1 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 defaults
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
termcolor 2.4.0 pypi_0 pypi
terminaltables 3.1.10 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
tomli 2.0.1 pypi_0 pypi
torchaudio 0.13.0 py39_cu116 pytorch
torchvision 0.14.0 py39_cu116 pytorch
tqdm 4.65.2 pypi_0 pypi
typing_extensions 4.9.0 py39h06a4308_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tzdata 2024.1 pypi_0 pypi
urllib3 1.26.18 pypi_0 pypi
wandb 0.16.4 pypi_0 pypi
wcwidth 0.2.5 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wheel 0.41.2 py39h06a4308_0 defaults
xz 5.4.6 h5eee18b_0 defaults
yapf 0.40.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
zstd 1.5.5 hc292b87_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main

pip list

Package Version Editable project location
addict 2.4.0
aliyun-python-sdk-core 2.15.0
aliyun-python-sdk-kms 2.16.2
appdirs 1.4.4
Brotli 1.0.9
certifi 2024.2.2
cffi 1.16.0
charset-normalizer 2.0.4
click 8.1.7
colorama 0.4.6
contourpy 1.2.0
crcmod 1.7
cryptography 42.0.5
cycler 0.12.1
Cython 3.0.9
docker-pycreds 0.4.0
e2cnn 0.2.3
fonttools 4.50.0
gitdb 4.0.11
GitPython 3.1.42
idna 3.4
importlib_metadata 7.0.2
importlib_resources 6.3.2
jmespath 0.10.0
kiwisolver 1.4.5
Markdown 3.6
markdown-it-py 3.0.0
matplotlib 3.8.3
mdurl 0.1.2
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
mmcv-full 1.7.2
mmdet 2.28.2
mmengine 0.10.3
mmrotate 0.3.4 /exp/workspace/mmrotate
model-index 0.1.11
mpmath 1.3.0
numpy 1.26.4
opencv-python 4.9.0.80
opendatalab 0.0.10
openmim 0.3.9
openxlab 0.0.36
ordered-set 4.1.0
oss2 2.17.0
packaging 24.0
pandas 2.2.1
pillow 10.2.0
pip 23.3.1
platformdirs 4.2.0
prettytable 3.5.0
protobuf 4.25.3
psutil 5.9.8
pycocotools 2.0.7
pycparser 2.21
pycryptodome 3.20.0
Pygments 2.17.2
pyparsing 3.1.2
PySocks 1.7.1
python-dateutil 2.9.0.post0
pytz 2023.4
PyYAML 6.0.1
requests 2.28.2
rich 13.4.2
scipy 1.12.0
sentry-sdk 1.43.0
setproctitle 1.3.3
setuptools 60.2.0
shapely 2.0.3
six 1.16.0
smmap 5.0.1
sympy 1.12
tabulate 0.9.0
termcolor 2.4.0
terminaltables 3.1.10
tomli 2.0.1
torch 1.13.0
torchaudio 0.13.0
torchvision 0.14.0
tqdm 4.65.2
typing_extensions 4.9.0
tzdata 2024.1
urllib3 1.26.18
wandb 0.16.4
wcwidth 0.2.5
wheel 0.41.2
yapf 0.40.2
zipp 3.18.1

Reproduces the problem - code sample

https://github.com/HamPerdredes/SOOD/blob/main/configs/ssad_fcos/base_fcos_default.py line 130


custom_hooks = [
    dict(type="NumClassCheckHook"),
    dict(type="WeightSummary"),
    dict(type="MeanTeacher", momentum=0.9996, interval=1, start_steps=100),
]
evaluation = dict(type="SubModulesDistEvalHook", interval=100, metric='mAP',
                  save_best='mAP')
# some config as in the manuscript
lr_config = dict(step=[120000, 160000])
runner = dict(_delete_=True, type="IterBasedRunner", max_iters=180000)
checkpoint_config = dict(by_epoch=False, interval=100, max_keep_ckpts=2)

# Default: disable fp16 training
# fp16 = dict(loss_scale="dynamic")

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES=0,1  ./tools/dist_train.sh configs/ssad_fcos/sood_fcos_dota15_10per.py 2

Reproduces the problem - error message

2024-03-22 02:37:26,694 - mmrotate - INFO - Iter [50/180000]    lr: 9.967e-04, eta: 16:56:08, time: 0.339, data_time: 0.052, memory: 4892, loss_cls_sup: 0.6242, loss_bbox_sup: 1.5073, loss_centerness_sup: 0.6730, loss: 2.8045, grad_norm: 21.0144
2024-03-22 02:37:39,594 - mmrotate - INFO - Saving checkpoint at 100 iterations
2024-03-22 02:37:43,383 - mmrotate - INFO - Start EMA Update at step 100
2024-03-22 02:37:43,389 - mmrotate - INFO - Iter [100/180000]   lr: 1.163e-03, eta: 16:48:30, time: 0.334, data_time: 0.015, memory: 4892, loss_cls_sup: 0.3682, loss_bbox_sup: 0.9200, loss_centerness_sup: 0.6579, loss: 1.9462, grad_norm: 8.8041
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 106/105, 20.6 task/s, elapsed: 5s, ETA:     0s

Traceback (most recent call last):
  File "/exp/workspace/mmrotate/./tools/train.py", line 194, in <module>
    main()
  File "/exp/workspace/mmrotate/./tools/train.py", line 183, in main
    train_detector(
  File "/exp/workspace/mmrotate/mmrotate/apis/train.py", line 144, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 70, in train
    self.call_hook('after_train_iter')
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmcv/runner/hooks/evaluation.py", line 266, in after_train_iter
    self._do_evaluate(runner)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmdet/core/evaluation/eval_hooks.py", line 135, in _do_evaluate
    key_score = self.evaluate(runner, results)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/mmcv/runner/hooks/evaluation.py", line 367, in evaluate
    eval_res = self.dataloader.dataset.evaluate(
TypeError: evaluate() got an unexpected keyword argument 'type'
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1390777 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1390776) of binary: /root/anaconda3/envs/mmrtt/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/mmrtt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./tools/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-22_02:37:50
  host      : 06dfad8d735d
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1390776)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
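
For readers trying to interpret the traceback above: the final frame shows the hook calling `self.dataloader.dataset.evaluate(` and Python rejecting a `type` keyword. A plausible reading, which this issue does not confirm, is that the whole `evaluation` dict (including its `type` key) was passed to the hook constructor, so `type` ended up among the options forwarded to the dataset. The sketch below reproduces only that mechanism in plain Python. `ToyDataset` and `ToyEvalHook` are hypothetical stand-ins, not the real mmcv or mmdet classes.

```python
class ToyDataset:
    """Hypothetical dataset whose evaluate() accepts only known options."""

    def evaluate(self, results, metric="mAP", logger=None):
        return {metric: 0.0}


class ToyEvalHook:
    """Hypothetical hook that keeps unrecognised options as eval kwargs."""

    def __init__(self, dataset, interval=1, save_best=None, **eval_kwargs):
        self.dataset = dataset
        self.interval = interval
        self.save_best = save_best
        # Anything the constructor does not name explicitly lands here.
        self.eval_kwargs = eval_kwargs

    def evaluate(self, results):
        # Leftover options are forwarded verbatim to the dataset.
        return self.dataset.evaluate(results, logger=None, **self.eval_kwargs)


# Same keys as the `evaluation` entry in the config sample of this report.
evaluation = dict(type="SubModulesDistEvalHook", interval=100,
                  metric="mAP", save_best="mAP")

# Assumption: the training script unpacks the full dict into the hook.
hook = ToyEvalHook(ToyDataset(), **evaluation)

try:
    hook.evaluate(results=[])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the `type` key would need to be consumed before the hook is constructed, for example by a training script that builds the hook from a registry rather than by unpacking the dict.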

Additional information

Dear authors,
While trying to run the semi-supervised learning code from https://github.com/HamPerdredes/SOOD/ with mmrotate, I ran into the following problem: the model raises an error during validation after training for 100 iterations. I suspect it is caused by an mmcv version mismatch. Could you please help me confirm where the bug is and tell me how to fix it?
Thank you!
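
One thing worth checking before blaming the mmcv version: whether the training script in use separates the hook's `type` from the remaining options in `evaluation`. The following is a minimal sketch of that separation, assuming the script otherwise unpacks the whole dict into the hook constructor (not verified against the reporter's `mmrotate/apis/train.py`). `split_eval_cfg` is a hypothetical helper, and the default name `DistEvalHook` is only illustrative.

```python
def split_eval_cfg(evaluation, default_type="DistEvalHook"):
    """Return (hook_type, hook_kwargs) without mutating the input config.

    Hypothetical helper: the hook class name is taken out of the dict so
    it is never forwarded to the dataset's evaluate() as a keyword.
    """
    cfg = dict(evaluation)  # shallow copy, the original config stays intact
    hook_type = cfg.pop("type", default_type)
    return hook_type, cfg


# Same keys as the `evaluation` entry in the config sample of this report.
evaluation = dict(type="SubModulesDistEvalHook", interval=100,
                  metric="mAP", save_best="mAP")

hook_type, hook_kwargs = split_eval_cfg(evaluation)
print(hook_type)
print(sorted(hook_kwargs))
```

With the name split off, a script could look up the hook class by `hook_type` and pass only `hook_kwargs` to its constructor. How the SOOD repository itself resolves `SubModulesDistEvalHook` is not shown in this issue.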