open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.
https://mmsegmentation.readthedocs.io/en/main/
Apache License 2.0

KeyError: 'data_time' #1502

Open cmcamdy opened 2 years ago

cmcamdy commented 2 years ago

When I choose IterBasedRunner in my schedule_cfg, this bug regularly occurs. I have tried to follow https://github.com/open-mmlab/mmcv/pull/1252 to fix it, but that does not work. I have also checked mmseg/apis/train.py, where EvalHook is already registered with LOW priority. Here is the traceback:

Traceback (most recent call last):
  File "tools/train.py", line 177, in <module>
    main()
  File "tools/train.py", line 173, in main
    meta=meta)
  File "/home/chengsiyuan/code/omai/mae_segmentation/mmcv_custom/train_api.py", line 132, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 67, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/logger/base.py", line 153, in after_train_iter
    self.log(runner)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/logger/text.py", line 234, in log
    self._log_info(log_dict, runner)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/logger/text.py", line 153, in _log_info
    log_str += f'time: {log_dict["time"]:.3f}, ' \
KeyError: 'data_time'

And my hook levels are:

before_run:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(ABOVE_NORMAL) DistOptimizerHook                  
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(NORMAL      ) EvalHook                           
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(NORMAL      ) EvalHook                           
(LOW         ) IterTimerHook                      
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) DistOptimizerHook                  
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(NORMAL      ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 

Is there another way that might work?
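
For context, the failure mode appears consistent with the listing above (this is my reading of the mmcv 1.x sources, so treat it as an assumption): in after_train_iter, EvalHook at NORMAL fires before IterTimerHook at LOW; evaluation clears runner.log_buffer to keep train and eval logs separate, which wipes the 'data_time' recorded in before_train_iter; IterTimerHook then records only 'time', so TextLoggerHook passes its 'time' guard and crashes on the missing 'data_time'. A minimal stand-alone sketch of that ordering (plain Python, not mmcv code; the timings are invented):

# A shared dict plays the role of runner.log_buffer.
buffer = {}

def timer_before_iter():        # IterTimerHook.before_iter (LOW)
    buffer['data_time'] = 0.012

def eval_after_iter():          # EvalHook.after_train_iter (NORMAL) -- runs first
    buffer.clear()              # evaluation clears the buffer

def timer_after_iter():         # IterTimerHook.after_iter (LOW)
    buffer['time'] = 0.345

def logger_after_iter():        # TextLoggerHook.after_train_iter (VERY_LOW)
    if 'time' in buffer:        # same guard as TextLoggerHook._log_info
        print(f"time: {buffer['time']:.3f}, data_time: {buffer['data_time']:.3f}")

timer_before_iter()
eval_after_iter()               # wipes 'data_time'
timer_after_iter()              # restores only 'time'
logger_after_iter()             # -> KeyError: 'data_time'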

MeowZheng commented 2 years ago

From your hook levels, the priority of EvalHook is NORMAL. Please double-check the code you use to launch training.
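
For reference, the stock mmseg/apis/train.py (0.x) passes the priority explicitly when registering the eval hook, roughly like the excerpt below (a sketch from memory, so details may differ across versions; distributed, val_dataloader and eval_cfg are the surrounding local variables). A custom train_api.py that drops the priority argument falls back to NORMAL:

from mmseg.core import DistEvalHook, EvalHook

eval_hook = DistEvalHook if distributed else EvalHook
# With LOW, the eval hook fires after IterTimerHook in after_train_iter,
# so TextLoggerHook never sees a half-cleared log buffer.
runner.register_hook(
    eval_hook(val_dataloader, **eval_cfg), priority='LOW')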

hasayake007 commented 2 years ago

Have you solved it?

cmcamdy commented 2 years ago

Have you solved it?

I have not solved this problem, because I am not sure where the priority should be modified. But I worked around it by modifying the schedule config file so that the situation does not arise, e.g.:

checkpoint_config = dict(by_epoch=False, interval=4000)
evaluation = dict(interval=4001, metric='mIoU')
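
Presumably this works because evaluation then never fires on the same iteration as a checkpoint or a logging step, so the log buffer is not cleared right before TextLoggerHook reads it. A fuller sketch of the same idea for an iter-based schedule (values illustrative; the runner and log_config lines are assumptions, not from the original config):

runner = dict(type='IterBasedRunner', max_iters=80000)
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
# Staggered intervals: checkpoint at 4000, eval at 4001, so EvalHook
# never coincides with CheckpointHook or a logging iteration.
checkpoint_config = dict(by_epoch=False, interval=4000)
evaluation = dict(interval=4001, metric='mIoU')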

MeowZheng commented 2 years ago

The priority of the eval hook should be LOW, like:

before_run:
(VERY_HIGH   ) PolyLrUpdaterHook
(NORMAL      ) CheckpointHook
(LOW         ) DistEvalHook
(VERY_LOW    ) TextLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) PolyLrUpdaterHook
(LOW         ) IterTimerHook
(LOW         ) DistEvalHook
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) PolyLrUpdaterHook                
(LOW         ) IterTimerHook                      
(LOW         ) DistEvalHook                       
 --------------------
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(LOW         ) DistEvalHook                       
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(LOW         ) DistEvalHook                       
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook

DoranLyong commented 1 year ago

Hi, I want to share a solution to this issue.

In mmseg/apis/train.py, add priority='NORMAL',

so that

runner.register_hook(eval_hook(val_dataloader, save_best='mIoU', **eval_cfg), priority='NORMAL')
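
To confirm the resulting order after patching, later mmcv 1.x versions can print the per-stage hook table (the same listing pasted earlier in this thread); this assumes your mmcv ships BaseRunner.get_hook_info:

print(runner.get_hook_info())  # hooks per stage, with their priorities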
