open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

How to fine-tune? #796

Open ZJZhao123 opened 1 year ago

ZJZhao123 commented 1 year ago

How do I fine-tune a model from the model zoo?

dyhBUPT commented 1 year ago

Load the pre-trained checkpoint, then train it on your dataset (generally with a small learning rate).
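As a rough sketch of that recipe in an MMCV-style config (as used by mmtracking 0.x), the checkpoint can be passed through the top-level `load_from` key while the learning rate is scaled down. The checkpoint path, the base learning rate, the 0.1 scale factor and the epoch count below are placeholders chosen for illustration, not values from this thread; a real config would also inherit from the model-zoo config through `_base_`.

```python
# Hypothetical fine-tuning config fragment; all paths and numbers are placeholders.
base_lr = 0.001        # assumed learning rate of the original training schedule
finetune_scale = 0.1   # assumed reduction factor for fine-tuning

# Load matching weights from a pre-trained checkpoint before training starts.
load_from = 'checkpoints/pretrained_model.pth'  # placeholder path

optimizer = dict(
    type='SGD',
    lr=base_lr * finetune_scale,  # smaller lr than when training from scratch
    momentum=0.9,
    weight_decay=5e-4)

total_epochs = 10  # assumed: a shorter schedule than the original training
```

Unlike `resume_from`, `load_from` only restores weights, so the optimizer state and epoch counter start fresh.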

ZJZhao123 commented 1 year ago

Thank you for your answer. I want to use the ByteTrack model and fine-tune it in QDTrack. Because ByteTrack training only trains a detector, my config is as follows (I also modified the code in mmtrack). The code runs, but the results with the ByteTrack model are about the same as without it. (I tried putting the init_cfg with the ByteTrack checkpoint in the detector dict and in the model dict; neither worked.)

```python
_base_ = [
    '../../_base_/models/yolox_x_8x8.py',
    '../../_base_/default_runtime.py'
]

img_scale = (800, 1440)
samples_per_gpu = 1

model = dict(
    type='zzjQDTrack',
    detector=dict(
        input_size=img_scale,
        random_size_range=(18, 32),
        bbox_head=dict(num_classes=1),
        test_cfg=dict(score_thr=0.01, nms=dict(type='nms', iou_threshold=0.7)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=  # noqa: E251
            #'/home/zzj/mmtracking/checkpoint/latest.pth'
            #'/home/zzj/mmtracking/checkpoint/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth'
            #'/home/zzj/mmtracking/checkpoint/bytetrack_yolox_x_crowdhuman_mot17-private-half_20211218_205500-1985c9f0.pth'
            'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth'
            # noqa: E501
        )),
    init_cfg=dict(
        type='Pretrained',
        checkpoint=  # noqa: E251
        #'/home/zzj/mmtracking/checkpoint/latest.pth'
        #'/home/zzj/mmtracking/checkpoint/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth'
        '/home/zzj/mmtracking/resultfile/latest.pth'
        #'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth'
        # noqa: E501
    ),
    track_head=dict(
        type='QuasiDenseTrackHead',
        # init_cfg=dict(
        #     type='Pretrained',
        #     checkpoint=  # noqa: E251
        #     #'/home/zzj/mmtracking/checkpoint/latest.pth'
        #     #'/home/zzj/mmtracking/checkpoint/yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth'
        #     #'/home/zzj/mmtracking/checkpoint/bytetrack_yolox_x_crowdhuman_mot17-private-half_20211218_205500-1985c9f0.pth'
        #     #'https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_x_8x8_300e_coco/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth'
        #     # noqa: E501
        #     '/home/zzj/mmtracking/checkpoint/qdtrack_faster-rcnn_r50_fpn_4e_crowdhuman_mot17_20220315_163453-68899b0a.pth'
        # ),
        roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=320,
            #featmap_strides=[4, 8, 16, 32]
            featmap_strides=[4, 8, 16]),
        embed_head=dict(
            type='QuasiDenseEmbedHead',
            num_convs=4,
            num_fcs=1,
            embed_channels=320,
            norm_cfg=dict(type='GN', num_groups=32),
            loss_track=dict(
                type='MultiPosCrossEntropyLoss',
                loss_weight=0.0025
                # loss_weight=0.25
            ),
            loss_track_aux=dict(
                type='L2Loss',
                neg_pos_ub=3,
                pos_margin=0,
                neg_margin=0.1,
                hard_mining=True,
                loss_weight=0.001
                # loss_weight=1.0
            )),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0),
        train_cfg=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='CombinedSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=3,
                add_gt_as_proposals=True,
                pos_sampler=dict(type='InstanceBalancedPosSampler'),
                neg_sampler=dict(type='RandomSampler')))),
    motion=dict(type='KalmanFilter'),
    tracker=dict(
        type='zzjByteTracker',
        nms_backdrop_iou_thr=0.3,
        nms_class_iou_thr=0.7,
        obj_score_thr=0.5,
        obj_score_thrs=dict(high=0.6, low=0.1),
        init_track_thr=0.7,
        weight_iou_with_det_scores=True,
        match_iou_thrs=dict(high=0.1, low=0.5, tentative=0.3),
        num_frames_retain=30))

optimizer = dict(
    type='SGD',
    lr=0.0000625 / 2 * samples_per_gpu,
    momentum=0.9,
    weight_decay=5e-4,
    nesterov=True,
    paramwise_cfg=dict(norm_decay_mult=0.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=None)

# some hyper parameters
total_epochs = 80
num_last_epochs = 10
resume_from = None
interval = 5

# learning policy
lr_config = dict(
    policy='YOLOX',
    warmup='exp',
    by_epoch=False,
    warmup_by_epoch=True,
    warmup_ratio=1,
    warmup_iters=1,
    num_last_epochs=num_last_epochs,
    min_lr_ratio=0.05)

custom_hooks = [
    dict(
        type='YOLOXModeSwitchHook',
        num_last_epochs=num_last_epochs,
        priority=48),
    dict(
        type='SyncNormHook',
        num_last_epochs=num_last_epochs,
        interval=interval,
        priority=48),
    dict(
        type='ExpMomentumEMAHook',
        resume_from=resume_from,
        momentum=0.0001,
        priority=49)
]

checkpoint_config = dict(interval=1)
evaluation = dict(metric=['bbox', 'track'], interval=1)
search_metrics = ['MOTA', 'IDF1', 'FN', 'FP', 'IDs', 'MT', 'ML']

# you need to set mode='dynamic' if you are using pytorch<=1.5.0
fp16 = dict(loss_scale=dict(init_scale=512.))

# optimizer && learning policy
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

evaluation = dict(metric=['bbox', 'track'], interval=1)
```

dyhBUPT commented 1 year ago

Hi, so you mean you changed the detector of QDTrack from Faster R-CNN to YOLOX and want to use the pre-trained weights of ByteTrack?

Hmm, I'm not sure, but I think it's normal to get similar results with or without the ByteTrack weights, because they use the same training data.

ZJZhao123 commented 1 year ago

Thank you for your answer. When ByteTrack is trained, only the YOLOX detector is trained, so I think the ByteTrack model is in fact just a better YOLOX model. That is why I'm confused about getting the same result after changing the detector of QDTrack from Faster R-CNN to YOLOX and using the pre-trained weights of ByteTrack. I also previously used the QDTrack model with the tracker changed to ByteTrack, and the results were worse than with the ByteTrack model. I think this is because YOLOX itself is better than Faster R-CNN, so I want to change the detector of QDTrack from Faster R-CNN to YOLOX as an experiment. I think ByteTrack works better than QDTrack because it uses a better detector; if QDTrack's detector is replaced with YOLOX, the results should be better than ByteTrack's. Do you have any suggestions? Is the model-loading code in my previous comment wrong? I would appreciate any advice.

ZJZhao123 commented 1 year ago

How should the ByteTrack model be loaded into QDTrack? I suspect my code has errors. How can I use the ByteTrack model as the detector weights and, at the same time, the QDTrack model as the weights for the tracking feature-matching part?

ZJZhao123 commented 1 year ago

[image attachment]

dyhBUPT commented 1 year ago

So you mean you trained "Faster R-CNN+QDTrack" and "YOLOX+QDTrack", and they have similar tracking performance? That is, your "YOLOX+QDTrack" produces normal tracking results? If so, I don't think there are any big errors.

I'm also confused that "YOLOX doesn't perform better than Faster R-CNN" in your experiments. I'm sorry, but I can't give a definite answer. Maybe it is caused by some unsuitable parameters?

ZJZhao123 commented 1 year ago

Thank you for your comment. I'm sorry I didn't make it clear.

For the first problem: when I used the ByteTrack pre-trained checkpoint + QDTrack and the original YOLOX pre-trained checkpoint + QDTrack, the initial training results were equivalent. In theory, though, the ByteTrack model should be a YOLOX model that detects pedestrians better after training. I used it this way because I think ByteTrack training only trains the YOLOX detector, so its model can directly replace the YOLOX detector weights.

For the second problem: I used the QDTrack pre-trained checkpoint (Faster R-CNN detector) + the ByteTrack tracker, and the result was worse than the QDTrack pre-trained checkpoint (Faster R-CNN detector) + the QDTrack tracker. So I think the ByteTrack tracker is not as good as the QDTrack tracker, and ByteTrack outperforms the QDTrack model only because it uses YOLOX. This is why I want to use YOLOX together with the QDTrack tracker. For YOLOX, I don't want to use the pre-trained checkpoint from mmdetection, but rather the checkpoint trained by ByteTrack. I don't know how to load it...

dyhBUPT commented 1 year ago

Hi, thanks for your clarification.

I guess that your "bytetrack model" is not loaded. You can debug to see if it is correctly loaded.
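One way to run that check, sketched here in plain Python, is to compare the parameter names the detector expects with the names stored in the checkpoint. The helper `compare_state_dict_keys` and the toy key names are made up for illustration; the `detector.` prefix on the checkpoint side is an assumption about how a tracker checkpoint might be laid out.

```python
def compare_state_dict_keys(model_keys, ckpt_keys):
    """Report how well checkpoint keys line up with model parameter names."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        'matched': sorted(model_keys & ckpt_keys),
        'missing_in_ckpt': sorted(model_keys - ckpt_keys),
        'unexpected_in_ckpt': sorted(ckpt_keys - model_keys),
    }


# Toy example: the detector expects un-prefixed names, while the checkpoint
# (hypothetically) stores them under a 'detector.' prefix.
model_keys = ['backbone.stem.conv.weight', 'bbox_head.cls.weight']
ckpt_keys = ['detector.backbone.stem.conv.weight',
             'detector.bbox_head.cls.weight']

report = compare_state_dict_keys(model_keys, ckpt_keys)
print(len(report['matched']), 'matched;',
      len(report['missing_in_ckpt']), 'missing')
```

With a real model, `model_keys` would come from something like `model.detector.state_dict().keys()` and `ckpt_keys` from `torch.load(path, map_location='cpu')['state_dict'].keys()`, assuming the checkpoint keeps its weights under a `state_dict` entry. Zero matched keys would suggest the weights were silently skipped.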

If not, a simple way to fix it is to convert "bytetrack model" to the format of "YOLOX model" and load it in the "model -> detector -> init_cfg", not "model -> init_cfg". For convenience, you can get this model from StrongSORT: https://download.openmmlab.com/mmtracking/mot/strongsort/mot_dataset/yolox_x_crowdhuman_mot17-private-half_20220812_192036-b6c9ce9a.pth
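The conversion mentioned above can be sketched as a key-renaming step. This assumes the tracker checkpoint stores the detector weights under a `detector.` prefix, which should be verified against the actual file; `strip_prefix` and the toy entries are illustrative only.

```python
def strip_prefix(state_dict, prefix='detector.'):
    """Keep only entries under `prefix` and drop the prefix from their names."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a tracker checkpoint; real values would be tensors.
tracker_state = {
    'detector.backbone.stem.conv.weight': 1,
    'detector.bbox_head.cls.weight': 2,
    'motion.some_buffer': 3,  # hypothetical non-detector entry, dropped
}

detector_state = strip_prefix(tracker_state)
print(sorted(detector_state))
```

On a real file, the same function would be applied to the `state_dict` entry of the object returned by `torch.load`, and the result written back with `torch.save(dict(state_dict=detector_state), out_path)`. To my knowledge, MMCV's `Pretrained` initializer also accepts a `prefix` argument (for example `prefix='detector.'`) that selects a sub-module at load time, which may avoid the conversion entirely.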

I hope this will help. Best wishes.