open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

KeyError: TensorboardLoggerHook is not in the hook registry (and any other hook). Hydra | Custom pipeline #10207

Open ihoholko opened 1 year ago

ihoholko commented 1 year ago

Describe the bug

I am developing a project based on MMDetection and using a customised configuration. Training itself works well, but when I add custom_hooks I get errors.

The project idea (config & train loop):

# @package _global_

# to execute this experiment run:
# python train.py experiment=example
default:
  - data: mmdet/aimosaic-seg.yaml

# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters

_base_: ${paths.mmdet_dir}/configs/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py

custom_hooks:

  - type: TextLoggerHook
  - type: TensorboardLoggerHook
  - type: MlflowLoggerHook
    exp_name: ${name}
    tags: ${tags}
    log_model: True
    interval: 1
    ignore_last: True
    reset_flag: True

# We also need to change the num_classes in head to match the dataset's annotation
model:
  roi_head:
    bbox_head:
      num_classes: 2
    mask_head:
      num_classes: 2

tags: ["ai-mosaic", "mask-rcnn"]

seed: 12345

Training code:


from typing import Dict, Tuple
import logging

from mmengine.config import Config, DictAction
from mmengine.logging import print_log
from mmengine.registry import RUNNERS
from mmengine.runner import Runner
from omegaconf import DictConfig, OmegaConf

from mmdet.utils import setup_cache_size_limit_of_dynamo

def train_mmdet(cfg: DictConfig) -> Dict[str, float]:

    # Reduce the number of repeated compilations and improve
    # training speed.
    setup_cache_size_limit_of_dynamo()

    # load config
    mm_conf: Config = Config.fromfile(cfg._base_)
    mm_conf.merge_from_dict( OmegaConf.to_container(cfg, resolve=True) )

    mm_conf.work_dir = cfg.paths.work_dir

    # build the runner from config
    if 'runner_type' not in mm_conf:
        # build the default runner
        runner = Runner.from_cfg(mm_conf)
    else:
        # build customized runner from the registry
        # if 'runner_type' is set in the cfg
        runner = RUNNERS.build(mm_conf)

    # start training
    runner.train()

I installed mmdet both as a submodule and as a package. In both scenarios I can't use hooks and get the following error:

    mmdet_dir='/home/oem/Projects/aimosaic-roi/3rd/mmdetection')
extras = dict(ignore_warnings=False, enforce_tags=True, print_config=True)
default = [dict(data='mmdet/aimosaic-seg.yaml')]
custom_hooks = [
    dict(type='TensorboardLoggerHook'),
    dict(
        type='MlflowLoggerHook',
        exp_name='X',
        tags=['ai-mosaic', 'mask-rcnn'],
        log_model=True,
        interval=1,
        ignore_last=True,
        reset_flag=True)
]
work_dir = '/home/oem/Projects/aimosaic-roi'

04/22 21:00:28 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
Error executing job with overrides: ['experiment=mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.yaml', 'train_dataloader.batch_size=20', 'name=X']
Traceback (most recent call last):
  File "/home/oem/Projects/aimosaic-roi/training/run.py", line 52, in main
    metric_dict = train(cfg)
  File "/home/oem/Projects/aimosaic-roi/training/mmdet/train_mmdet.py", line 39, in train_mmdet
    runner = Runner.from_cfg(mm_conf)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 439, in from_cfg
    runner = cls(
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 419, in __init__
    self.register_hooks(default_hooks, custom_hooks)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1924, in register_hooks
    self.register_custom_hooks(custom_hooks)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1905, in register_custom_hooks
    self.register_hook(hook)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 1806, in register_hook
    hook_obj = HOOKS.build(hook)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/registry.py", line 545, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
    raise KeyError(
KeyError: 'TensorboardLoggerHook is not in the hook registry. Please check whether the value of `TensorboardLoggerHook` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'
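
For readers unfamiliar with registry-based construction, the traceback above ends in `HOOKS.build(hook)` failing because no class is registered under the `type` string. The following is a minimal sketch of that lookup pattern, not MMEngine's actual implementation; `ToyRegistry` and the stand-in `CheckpointHook` class are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Type


class ToyRegistry:
    """Illustrative name-to-class registry; not MMEngine's implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type] = {}

    def register_module(self, cls: Type) -> Type:
        # Used as a decorator: store the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        args = dict(cfg)  # copy so the caller's config is not mutated
        type_name = args.pop("type")
        if type_name not in self._modules:
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._modules[type_name](**args)


HOOKS = ToyRegistry("hook")  # toy stand-in for a hook registry


@HOOKS.register_module
class CheckpointHook:  # hypothetical stand-in class, registered above
    def __init__(self, interval: int = 1) -> None:
        self.interval = interval


# A registered name builds normally.
hook = HOOKS.build({"type": "CheckpointHook", "interval": 2})

# A name that was never registered fails at build time.
try:
    HOOKS.build({"type": "TensorboardLoggerHook"})
except KeyError as err:
    message = str(err)
```

In this sketch the error depends only on whether the name was registered before `build` is called, which is why the config alone cannot make an unknown `type` work.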

Reproduction

  1. Did you make any modifications on the code or config? Did you understand what you have modified?

As described above: I rewrote the configuration and training entry point to support YAML configuration.

  2. What dataset did you use?

Custom.

Environment

python 3rd/mmdetection/mmdet/utils/collect_env.py

sys.platform: linux
Python: 3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr/local/cuda-11.2
NVCC: Cuda compilation tools, release 11.2, V11.2.67
GCC: gcc (Ubuntu 10.4.0-4ubuntu1~22.04) 10.4.0
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

OpenCV: 4.7.0
MMEngine: 0.7.2
MMDetection: 3.0.0+a3c87ab
PYTHONPATH=.:3rd/mmdetection:3rd/ultralytics:3rd/mmsegmentation:3rd/sahi

ihoholko commented 1 year ago

Okay, I fixed this for W&B and TensorBoard by changing the structure of the config: as I understand it, loggers are not custom_hooks but visualizer backends. I still have some trouble with MLflowVisBackend, though. The config now looks like this:

├── visualizer
│   └── type: mmdet.DetLocalVisualizer
│       name: visualizer
│       vis_backends:
│       - type: LocalVisBackend
│       - type: TensorboardVisBackend
│       - type: MLflowVisBackend
│         save_dir: null
│         exp_name: Mosaic-roi
│         run_name: mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco
│         tags:
│         - ai-mosaic
│         - mask-rcnn
│         params: null
│         tracking_uri: http://mlflow:
│         artifact_suffix:
│         - .json
│         - .log
│         - .py
│         - yaml

which becomes the following in the dumped config:

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
        dict(
            type='MLflowVisBackend',
            save_dir=None,
            exp_name='Mosaic-roi',
            run_name='mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco',
            tags=['ai-mosaic', 'mask-rcnn'],
            params=None,
            tracking_uri='http://',
            artifact_suffix=['.json', '.log', '.py', 'yaml'])
    ],
    name='visualizer')
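
The restructuring described above (logger entries moved out of `custom_hooks` and into `visualizer.vis_backends`) can be sketched as a plain dictionary transformation. This is only an illustration under stated assumptions: `move_loggers_to_visualizer` is a hypothetical helper, the name mapping in `HOOK_TO_BACKEND` is assumed rather than taken from the library, hook arguments are dropped because the backends take different parameters, and `MyCustomHook` is a made-up name.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Assumed mapping from old-style logger-hook names to visualizer backend names.
HOOK_TO_BACKEND = {
    "TensorboardLoggerHook": "TensorboardVisBackend",
    "WandbLoggerHook": "WandbVisBackend",
}


def move_loggers_to_visualizer(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg with known logger hooks listed as vis_backends."""
    out = deepcopy(cfg)  # leave the caller's config untouched
    remaining: List[Dict[str, Any]] = []
    backends: List[Dict[str, Any]] = [dict(type="LocalVisBackend")]
    for hook in out.get("custom_hooks", []):
        backend = HOOK_TO_BACKEND.get(hook.get("type"))
        if backend is None:
            remaining.append(hook)  # not a known logger hook: keep as a hook
        else:
            # Hook arguments are intentionally dropped; backends differ.
            backends.append(dict(type=backend))
    out["custom_hooks"] = remaining
    out["visualizer"] = dict(
        type="DetLocalVisualizer", vis_backends=backends, name="visualizer"
    )
    return out


old_cfg = {
    "custom_hooks": [
        {"type": "TensorboardLoggerHook"},
        {"type": "MyCustomHook"},  # hypothetical non-logger hook
    ]
}
new_cfg = move_loggers_to_visualizer(old_cfg)
```

Entries the mapping does not know about stay in `custom_hooks`, so they still need a registered class to build.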

But I still have a problem with MLflow in this case:

Error executing job with overrides: ['experiment=mmdet/mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.yaml', 'name=Mosaic-roi']
Traceback (most recent call last):
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 119, in build_from_cfg
    obj = obj_cls.get_instance(**args)  # type: ignore
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/visualization/visualizer.py", line 1165, in get_instance
    instance = super().get_instance(name, **kwargs)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/utils/manager.py", line 110, in get_instance
    instance = cls(name=name, **kwargs)  # type: ignore
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmdet/visualization/local_visualizer.py", line 88, in __init__
    super().__init__(
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/visualization/visualizer.py", line 202, in __init__
    self._vis_backends[name] = VISBACKENDS.build(vis_backend)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/registry.py", line 545, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 100, in build_from_cfg
    raise KeyError(
KeyError: 'MLflowVisBackend is not in the vis_backend registry. Please check whether the value of `MLflowVisBackend` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/oem/Projects/aimosaic-roi/training/run.py", line 53, in main
    metric_dict = train(cfg)
  File "/home/oem/Projects/aimosaic-roi/training/mmdet/train_mmdet.py", line 46, in train_mmdet
    runner = Runner.from_cfg(mm_conf)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 439, in from_cfg
    runner = cls(
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 393, in __init__
    self.visualizer = self.build_visualizer(visualizer)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/runner/runner.py", line 780, in build_visualizer
    return VISUALIZERS.build(visualizer)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/registry.py", line 545, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/oem/Projects/aimosaic-roi/venv2/lib/python3.9/site-packages/mmengine/registry/build_functions.py", line 135, in build_from_cfg
    raise type(e)(
KeyError: "class `DetLocalVisualizer` in mmdet/visualization/local_visualizer.py: 'MLflowVisBackend is not in the vis_backend registry. Please check whether the value of `MLflowVisBackend` is correct or it was registered as expected. More details can be found at https://mmengine.readthedocs.io/en/latest/advanced_tutorials/config.html#import-the-custom-module'"

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
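
One way to narrow this down is to ask the registry directly whether the name exists in the installed MMEngine. The snippet below is a minimal sketch, assuming `VISBACKENDS.get` returns `None` for a name it does not know; `backend_is_registered` is a hypothetical helper, and it returns `None` when MMEngine or its visualization dependencies cannot be imported.

```python
from typing import Optional


def backend_is_registered(name: str) -> Optional[bool]:
    """Report whether `name` is in MMEngine's vis_backend registry.

    Returns None if MMEngine (or its visualization module) is not importable.
    """
    try:
        from mmengine.registry import VISBACKENDS
        import mmengine.visualization  # noqa: F401  (registers built-in backends)
    except ImportError:
        return None
    # Assumption: an unknown name yields None rather than raising.
    return VISBACKENDS.get(name) is not None


result = backend_is_registered("MLflowVisBackend")
print("MLflowVisBackend registered:", result)
```

If this prints `False` in the same virtual environment that runs training, the backend is not provided by that installation, and the config is not the cause.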