open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

Problems of freezing the detector params of DFF #662

Closed yan811 closed 2 years ago

yan811 commented 2 years ago

After I set "frozen_modules = ['detector']", the following error occurs:

Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/ma-user/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/optim/sgd.py", line 69, in __init__
    super(SGD, self).__init__(params, defaults)
  File "/home/ma-user/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/optim/optimizer.py", line 55, in __init__
    self.add_param_group(param_group)
  File "/home/ma-user/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/optim/optimizer.py", line 278, in add_param_group
    raise ValueError("some parameters appear in more than one parameter group")
ValueError: some parameters appear in more than one parameter group

How can I solve it?

JingweiZhang12 commented 2 years ago

Maybe multiple parameter groups contain overlapping parameters. Please provide your config and describe your modification.
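To check for that kind of overlap, a small helper along these lines can report which parameters are listed in more than one group. This is only a sketch, not part of MMTracking or PyTorch: `find_shared_params` is a hypothetical name, and it assumes each group is a dict with a 'params' list, which is the format torch.optim optimizers accept. It compares parameters by object identity, since the same tensor object appearing in two groups is the situation that raises this ValueError.

```python
from collections import defaultdict


def find_shared_params(param_groups):
    """Return {param_id: [group indices]} for parameters present in 2+ groups.

    Hypothetical helper: `param_groups` is assumed to be a list of dicts,
    each with a 'params' list, the format torch.optim optimizers accept.
    """
    seen = defaultdict(set)
    for group_idx, group in enumerate(param_groups):
        for param in group['params']:
            # Compare by identity: the same object in two groups is an overlap.
            seen[id(param)].add(group_idx)
    return {pid: sorted(groups) for pid, groups in seen.items() if len(groups) > 1}


if __name__ == '__main__':
    # Plain objects stand in for tensors so the sketch runs without PyTorch.
    backbone_w, head_w, flow_w = object(), object(), object()
    groups = [
        {'params': [backbone_w, head_w]},
        {'params': [head_w, flow_w]},  # head_w is listed in both groups
    ]
    shared = find_shared_params(groups)
    print(len(shared), list(shared.values()))
```

With a real model, the same function could be called on the list of parameter groups just before it is handed to the optimizer. An empty result would suggest the overlap comes from somewhere else.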

yan811 commented 2 years ago

MMDistributedDataParallel is used during training. In /mmtracking/mmtrack/apis/train.py:

    model = MMDistributedDataParallel(
        model.cuda(),
        device_ids=[torch.cuda.current_device()],
        broadcast_buffers=False,
        find_unused_parameters=find_unused_parameters)

I can successfully train the DFF model when no parameters are frozen. Here is my config modification, in mmtracking/configs/vid/dff/dff_faster_rcnn_r50_dc5_1x_imagenetvid.py:

    model = dict(
        type='DFF',
        detector=dict(
            train_cfg=dict(...),
            test_cfg=dict(...),
            init_cfg=dict(
                type='Pretrained',
                checkpoint= ...  # my detector (RetinaNet) pretrained pth
            )),
        frozen_modules=['detector'],  # set the frozen modules here
        motion=dict(
            type='FlowNetSimple',
            img_scale_factor=0.5,
            init_cfg=dict(
                type='Pretrained',
                checkpoint= ....  # FlowNet pretrained by OpenMMLab
            )),
        train_cfg=None,
        test_cfg=dict(key_frame_interval=10))

JingweiZhang12 commented 2 years ago

Could you provide the full error log?

yan811 commented 2 years ago

I have solved it, thanks.