open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

AttributeError: 'Live' object has no attribute 'set_step' #2562

Open Divergense opened 1 year ago

Divergense commented 1 year ago

Thanks for reporting the unexpected results and we appreciate it a lot.

Checklist

  1. [x] I have searched related issues but cannot get the expected help.
  2. [x] I have read the FAQ documentation but cannot get the expected help.
  3. [x] The unexpected results still exist in the latest version.

Describe the Issue

The MMCV hook DvcliveLoggerHook has a bug. The class, located at mmcv/runner/hooks/logger/dvclive.py, uses an outdated DVCLive API (the current DVCLive version is 1.3.2). Specifically, its log(self, runner) method calls self.dvclive.set_step and self.dvclive.log, neither of which exists in the current DVCLive version.
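Since the problem depends on which DVCLive release is installed, it may help to confirm the version in the environment first. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name for illustration, not part of MMCV or DVCLive.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if dvclive is not installed.
print(installed_version("dvclive"))
```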

Reproduction

  1. What command, code, or script did you run?

    train_detector(model, datasets, cfg, distributed=False, validate=True)

    Note: the code follows the mmdetection/demo/MMDet_Tutorial.ipynb guide (v2.27.0) exactly.

  2. Did you make any modifications on the code? Did you understand what you have modified?

    I tried to integrate MMDetection with DVC (to track experiment metrics) and added DvcliveLoggerHook like this:

    cfg.log_config.hooks = [
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
        dict(type='DvcliveLoggerHook', report='auto'),
    ]

Environment

  1. Please run python -c "from mmcv.utils import collect_env; print(collect_env())" to collect necessary environment information and paste it here:

     'CUDA available': False,
     'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0',
     'MMCV': '1.7.0',
     'MMCV CUDA Compiler': 'not available',
     'MMCV Compiler': 'GCC 7.5',
     'OpenCV': '4.6.0',
     'PyTorch': '1.13.0+cu116',
     'PyTorch compiling details': 'PyTorch built with:\n'
                                  '  - GCC 9.3\n'
                                  '  - C++ Version: 201402\n'
                                  '  - Intel(R) Math Kernel Library Version '
                                  '2020.0.0 Product Build 20191122 for Intel(R) 64 '
                                  'architecture applications\n'
                                  '  - Intel(R) MKL-DNN v2.6.0 (Git Hash '
                                  '52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n'
                                  '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                                  '  - LAPACK is enabled (usually provided by '
                                  'MKL)\n'
                                  '  - NNPACK is enabled\n'
                                  '  - CPU capability usage: AVX2\n'
                                  '  - Build settings: BLAS_INFO=mkl, '
                                  'BUILD_TYPE=Release, CUDA_VERSION=11.6, '
                                  'CUDNN_VERSION=8.3.2, '
                                  'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                                  'CXX_FLAGS= -fabi-version=11 -Wno-deprecated '
                                  '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL '
                                  '-fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM '
                                  '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                                  '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                                  '-DEDGE_PROFILER_USE_KINETO -O2 -fPIC '
                                  '-Wno-narrowing -Wall -Wextra '
                                  '-Werror=return-type -Werror=non-virtual-dtor '
                                  '-Wno-missing-field-initializers '
                                  '-Wno-type-limits -Wno-array-bounds '
                                  '-Wno-unknown-pragmas -Wunused-local-typedefs '
                                  '-Wno-unused-parameter -Wno-unused-function '
                                  '-Wno-unused-result -Wno-strict-overflow '
                                  '-Wno-strict-aliasing '
                                  '-Wno-error=deprecated-declarations '
                                  '-Wno-stringop-overflow -Wno-psabi '
                                  '-Wno-error=pedantic -Wno-error=redundant-decls '
                                  '-Wno-error=old-style-cast '
                                  '-fdiagnostics-color=always -faligned-new '
                                  '-Wno-unused-but-set-variable '
                                  '-Wno-maybe-uninitialized -fno-math-errno '
                                  '-fno-trapping-math -Werror=format '
                                  '-Werror=cast-function-type '
                                  '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                                  'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                                  'PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, '
                                  'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, '
                                  'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, '
                                  'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
                                  'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n',
     'Python': '3.8.16 (default, Dec  7 2022, 01:12:13) [GCC 7.5.0]',
     'TorchVision': '0.14.0+cu116',
     'sys.platform': 'linux'

Error traceback

Traceback (most recent call last):
  File "src/train.py", line 35, in <module>
    main(args)
  File "src/train.py", line 29, in main
    train_detector(model, datasets, cfg, distributed=False, validate=True)
  File "/content/dvc-mmdetection-example/mmdetection/mmdet/apis/train.py", line 246, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_iter')
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/hooks/logger/base.py", line 158, in after_train_iter
    self.log(runner)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/dist_utils.py", line 144, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/hooks/logger/dvclive.py", line 61, in log
    self.dvclive.set_step(self.get_iter(runner))
AttributeError: 'Live' object has no attribute 'set_step'

Bug fix

Since I don't know what behavior is actually expected (the DVCLive methods all produce slightly different results, and I did not take the time to find a DVCLive version that works with the hook), I can only make assumptions. The minimal working changes for me are the following:

Later I needed more information and added the following lines to the method:

self.dvclive.make_summary()
self.dvclive.make_report()

Final changes look like this:

@master_only
def log(self, runner) -> None:
    tags = self.get_loggable_tags(runner)
    if tags:
        self.dvclive.step = self.get_iter(runner)
        for k, v in tags.items():
            self.dvclive.log_metric(k, v)
        self.dvclive.make_summary()
        self.dvclive.make_report()

zhouzaida commented 1 year ago

Hi @Divergense, thank you for your feedback, and sorry for the late reply; we were on Chinese New Year holiday last week. It seems that dvclive introduced a backward-compatibility (BC) break. We can call different interfaces depending on the installed dvclive version. Are you interested in making a PR to fix this issue?
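One way to support both interfaces could look like the sketch below. The thread does not state in which DVCLive release the API changed, so this sketch uses feature detection (`hasattr`) rather than the version comparison suggested above. `log_tags`, `OldLive`, and `NewLive` are hypothetical names; the two classes are stand-ins that only mimic the method names mentioned in this issue so the example runs without DVCLive installed.

```python
from typing import Any, Dict


def log_tags(live: Any, step: int, tags: Dict[str, float]) -> None:
    """Log tags through whichever interface the `live` object exposes."""
    if hasattr(live, "log_metric"):
        # Newer interface, as used in the fix proposed in this issue.
        live.step = step
        for name, value in tags.items():
            live.log_metric(name, value)
    else:
        # Older interface, as currently called by DvcliveLoggerHook.
        live.set_step(step)
        for name, value in tags.items():
            live.log(name, value)


class OldLive:
    """Hypothetical stand-in with the old method names (set_step, log)."""

    def __init__(self) -> None:
        self.calls = []

    def set_step(self, step: int) -> None:
        self.calls.append(("set_step", step))

    def log(self, name: str, value: float) -> None:
        self.calls.append(("log", name, value))


class NewLive:
    """Hypothetical stand-in with the new names (step, log_metric)."""

    def __init__(self) -> None:
        self.step = 0
        self.calls = []

    def log_metric(self, name: str, value: float) -> None:
        self.calls.append(("log_metric", self.step, name, value))


old, new = OldLive(), NewLive()
log_tags(old, 3, {"loss": 0.5})
log_tags(new, 3, {"loss": 0.5})
```

Feature detection avoids hard-coding a version boundary, at the cost of relying on method names alone; a real fix in the hook would also need to decide whether to call make_summary and make_report.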

Divergense commented 1 year ago

Hi! Yes, I'm interested in making a PR, but I have never done one before, so I need to read the contributing guidelines first.

I also have a couple of questions (sorry if they are stupid questions):

zhouzaida commented 1 year ago