microsoft / SoftTeacher

Semi-Supervised Learning, Object Detection, ICCV2021
MIT License

Support 'save_best' argument for Evaluation hook #173

Open Tim-Hung opened 2 years ago

Tim-Hung commented 2 years ago

Thanks for the great work. I have a question about the evaluation hook.

I want to save the best checkpoint of my model during training, so I added an argument to the config file.

The original evaluation hook in the config file is:

evaluation = dict(type="SubModulesDistEvalHook", interval=50)

I added the save_best argument to the evaluation hook as below:

evaluation = dict(type="SubModulesDistEvalHook", save_best='auto', interval=50)

However, I get the following error:

2022-03-08 16:33:40,867 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2022-03-08 16:34:08,135 - mmdet.ssod - INFO - Saving checkpoint at 50 iterations
2022-03-08 16:34:09,793 - mmdet.ssod - INFO - Iter [50/150]     lr: 9.890e-04, eta: 0:01:02, time: 0.628, data_time: 0.032, memory: 6611, ema_momentum: 0.9800, sup_loss_rpn_cls: 0.4482, sup_loss_rpn_bbox: 0.1047, sup_loss_cls: 1.3773, sup_acc: 84.2721, sup_loss_bbox: 0.0614, unsup_loss_rpn_cls: 0.9392, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 2.1935, unsup_acc: 86.6482, unsup_loss_bbox: 0.0000, loss: 5.1243
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 49.0 task/s, elapsed: 102s, ETA:     0s
2022-03-08 16:35:54,296 - mmdet.ssod - INFO - Evaluating bbox...
Loading and preparing results...
2022-03-08 16:35:54,298 - mmdet.ssod - ERROR - The testing results of the whole dataset is empty.
Traceback (most recent call last):
  File "tools/train.py", line 201, in <module>
    main()
  File "tools/train.py", line 189, in main
    train_detector(
  File "/home/SoftTeacher/ssod/apis/train.py", line 206, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/.local/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/.local/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
    self.call_hook('after_train_iter')
  File "/home/.local/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/SoftTeacher/ssod/utils/hooks/submodules_evaluation.py", line 37, in after_train_iter
    self._do_evaluate(runner)
  File "/home/SoftTeacher/ssod/utils/hooks/submodules_evaluation.py", line 82, in _do_evaluate
    key_score = self.evaluate(runner, results, prefix=submodule)
  File "/home/SoftTeacher/ssod/utils/hooks/submodules_evaluation.py", line 118, in evaluate
    self._init_rule(self.rule, list(eval_res.keys())[0])
IndexError: list index out of range
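
The last frame indexes the first key of `eval_res`. Given the preceding error message that the testing results are empty, a plausible reading is that `eval_res` is an empty dict at that point, so taking element 0 of its key list fails. The snippet below is a minimal, self-contained sketch of that failure and of one possible guard. `first_metric_key` is a hypothetical helper, not part of mmcv, mmdet, or SoftTeacher, and `bbox_mAP` is used only as an example metric name.

```python
from typing import Optional


def first_metric_key(eval_res: dict) -> Optional[str]:
    """Return the first metric name, or None when nothing was evaluated.

    Hypothetical helper for illustration only.
    """
    if not eval_res:
        return None
    return next(iter(eval_res))


# An empty result dict reproduces the IndexError seen in the traceback.
empty_res = {}
try:
    list(empty_res.keys())[0]
    raised = False
except IndexError:
    raised = True

# With the guard, the caller can skip best-checkpoint selection instead.
guarded_key = first_metric_key(empty_res)

# Example metric names, assumed for illustration.
normal_key = first_metric_key({"bbox_mAP": 0.25, "bbox_mAP_50": 0.4})
```

A guard like this only avoids the crash. It does not explain why the evaluation results were empty in the first place, which would still need to be investigated.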

Does SubModulesDistEvalHook support save_best mode?

Would you mind pointing me to where I should fix this? Thanks.

phelps-matthew commented 2 years ago

Unfortunately I do not think mmdet supports this - I also wanted this feature and looked for it.

See https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py#L10

Tim-Hung commented 2 years ago

Hi @phelps-matthew , thanks for your response.

I think this feature is supported in EvalHook. Please see https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/evaluation.py#L33

In the Soft Teacher project, the author wrote a new SubModulesDistEvalHook that inherits from the DistEvalHook classes in two files: mmdet.core.evaluation.eval_hooks.py (DistEvalHook) and mmcv.runner.hooks.evaluation.py (DistEvalHook).

So I think it should support save_best mode, and I am probably missing something needed to use this feature.
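
For context, here is a simplified sketch of how a save_best='auto' selection can work: take the first metric in the evaluation results as the key indicator, infer from its name whether larger or smaller is better, and track the best score seen so far. This is an illustration written for this thread, not the mmcv implementation. `BestTracker`, `infer_rule`, `GREATER_KEYS`, and `LESS_KEYS` are hypothetical names, and the key lists are illustrative only.

```python
import math

# Illustrative substrings; hypothetical, not copied from mmcv.
GREATER_KEYS = ("mAP", "acc", "AR@")
LESS_KEYS = ("loss",)


def infer_rule(key_indicator: str) -> str:
    """Guess whether a larger or a smaller metric value is better."""
    if any(k in key_indicator for k in GREATER_KEYS):
        return "greater"
    if any(k in key_indicator for k in LESS_KEYS):
        return "less"
    raise ValueError(f"cannot infer rule for {key_indicator!r}")


class BestTracker:
    """Tracks the best score for one metric across evaluations."""

    def __init__(self, save_best: str = "auto"):
        self.key_indicator = save_best
        self.rule = None
        self.best_score = None

    def update(self, eval_res: dict) -> bool:
        """Return True when eval_res contains a new best score."""
        if not eval_res:
            # Nothing was evaluated, so there is nothing to compare.
            return False
        if self.key_indicator == "auto":
            # 'auto' resolves to the first metric that was reported.
            self.key_indicator = next(iter(eval_res))
        if self.rule is None:
            self.rule = infer_rule(self.key_indicator)
            self.best_score = -math.inf if self.rule == "greater" else math.inf
        score = eval_res[self.key_indicator]
        if self.rule == "greater":
            improved = score > self.best_score
        else:
            improved = score < self.best_score
        if improved:
            self.best_score = score
        return improved


tracker = BestTracker("auto")
results = ({}, {"bbox_mAP": 0.20}, {"bbox_mAP": 0.18}, {"bbox_mAP": 0.31})
history = [tracker.update(r) for r in results]
```

In this sketch an empty result is skipped rather than indexed, so the 'auto' key is only resolved once a non-empty evaluation result arrives.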