openvinotoolkit / training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
https://openvinotoolkit.github.io/training_extensions/
Apache License 2.0

NNCF Optimization fails for HRNet #2124

Closed: nikita-savelyevv closed this issue 1 year ago

nikita-savelyevv commented 1 year ago

Describe the bug

Running NNCF quantization (the optimize step) for HRNet fails with the error: AttributeError: 'NNCFNetwork' object has no attribute 'set_step_params'. The full log is below.

This reproduces on the current develop branch (commit 0f7461b375569dd8434f43898385c93318dda550), whereas quantization finishes correctly on the releases/1.2.1 branch.

Steps to Reproduce

  1. Train HRNet-s with:
    python /home/nsavel/workspace/training_extensions/otx/cli/tools/train.py /home/nsavel/workspace/training_extensions/otx/algorithms/segmentation/configs/ocr_lite_hrnet_s_mod2/template.yaml --train-data-roots /home/nsavel/workspace/training_extensions/tests/assets/common_semantic_segmentation_dataset/train --val-data-roots /home/nsavel/workspace/training_extensions/tests/assets/common_semantic_segmentation_dataset/val -o /home/nsavel/workspace/models/hrnet/s
  2. Run quantization:
    python /home/nsavel/workspace/training_extensions/otx/cli/tools/optimize.py /home/nsavel/workspace/training_extensions/otx/algorithms/segmentation/configs/ocr_lite_hrnet_s_mod2/template.yaml --train-data-roots /home/nsavel/workspace/training_extensions/tests/assets/common_semantic_segmentation_dataset/train --val-data-roots /home/nsavel/workspace/training_extensions/tests/assets/common_semantic_segmentation_dataset/val --load-weights /home/nsavel/workspace/models/hrnet/s_24/models/weights.pth -o /home/nsavel/workspace/models/hrnet/s_24/qat

Environment:

nikita-savelyevv commented 1 year ago

Traceback (most recent call last):
  File "/home/nsavel/workspace/training_extensions/otx/cli/tools/optimize.py", line 167, in <module>
    main()
  File "/home/nsavel/workspace/training_extensions/otx/cli/tools/optimize.py", line 125, in main
    task.optimize(
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/tasks/nncf_task.py", line 288, in optimize
    results = self._optimize(dataset, optimization_parameters)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/segmentation/adapters/mmseg/nncf/task.py", line 72, in _optimize
    results = self._train_model(dataset)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/segmentation/adapters/mmseg/task.py", line 373, in _train_model
    adapt_batch_size(train_func, cfg, datasets, isinstance(self, NNCFBaseTask))  # nncf needs eval hooks
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/utils/automatic_bs.py", line 87, in adapt_batch_size
    available_bs = adapt_torch_model_bs(
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/torch/utils/bs_search_algo.py", line 49, in adapt_batch_size
    train_func(current_bs)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/utils/automatic_bs.py", line 79, in train_func_single_iter
    train_func(
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/nncf/runners.py", line 118, in run
    model = acc_aware_training_loop.run(
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/nncf/common/accuracy_aware_training/training_loop.py", line 86, in run
    return self._run_early_exit_training_loop(model)
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/nncf/common/accuracy_aware_training/training_loop.py", line 107, in _run_early_exit_training_loop
    self.runner.train_epoch(model, self.compression_controller)
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/nncf/common/accuracy_aware_training/runner.py", line 234, in train_epoch
    self.current_loss = self._train_epoch_fn(compression_controller,
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/nncf/runners.py", line 137, in train_fn
    self.train(self._train_data_loader)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/runner.py", line 80, in train
    self.call_hook("before_train_iter")
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/hooks/lr_updater_hook.py", line 106, in before_train_iter
    self._init_states(runner)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/hooks/lr_updater_hook.py", line 142, in _init_states
    super()._init_states(runner)
  File "/home/nsavel/workspace/training_extensions/otx/algorithms/common/adapters/mmcv/hooks/lr_updater_hook.py", line 101, in _init_states
    runner.model.module.set_step_params(runner.iter, self.epoch_len)
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/nncf/torch/nncf_network.py", line 377, in __getattr__
    return get_nn_module_attr(self, name)
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/nncf/torch/nncf_network.py", line 369, in get_nn_module_attr
    return super().__getattr__(name)
  File "/home/nsavel/venvs/otx_clean/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'NNCFNetwork' object has no attribute 'set_step_params'

goodsong81 commented 1 year ago

Thank you for the report. The merge-back of releases/1.2.1 and the following PR will resolve the issue: https://github.com/openvinotoolkit/training_extensions/pull/2112
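
For readers who want to see the failure mode in isolation: the last frames of the traceback above show a hook calling set_step_params on a wrapper object whose __getattr__ does not resolve that name. The sketch below reproduces that pattern with plain Python classes and shows one defensive way to guard such a call. It is only an illustration: InnerModel, Wrapper, and set_step_params_if_supported are hypothetical names, the classes are stand-ins rather than the real NNCFNetwork or OTX code, and this is not claimed to be the change made in the PR linked above.

```python
class InnerModel:
    """Hypothetical stand-in for a model that defines the method the hook calls."""

    def __init__(self):
        self.step_params = None

    def set_step_params(self, init_iter, epoch_size):
        # Record the values so the effect of the call can be inspected.
        self.step_params = (init_iter, epoch_size)


class Wrapper:
    """Hypothetical stand-in for a wrapper that does not expose the inner model's methods."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails; raises the same
        # kind of error that ends the traceback above.
        raise AttributeError(
            "'{}' object has no attribute '{}'".format(type(self).__name__, name)
        )


def set_step_params_if_supported(module, init_iter, epoch_size):
    """Call module.set_step_params if it exists; return True when it was called."""
    # getattr with a default swallows the AttributeError raised by __getattr__.
    setter = getattr(module, "set_step_params", None)
    if callable(setter):
        setter(init_iter, epoch_size)
        return True
    return False
```

With these stand-ins, calling set_step_params directly on a Wrapper raises AttributeError, while set_step_params_if_supported returns False for the wrapper and True for the bare InnerModel. Whether skipping the call is acceptable in a real training hook depends on what the method configures, so treat the guard as a diagnostic aid rather than a fix.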