open-mmlab / mmtracking

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.
https://mmtracking.readthedocs.io/en/latest/
Apache License 2.0

selsa troialign frame sampling range #338

Closed Chop1 closed 2 years ago

Chop1 commented 2 years ago

For validation and testing, you use the method 'test_with_adaptive_stride', which is consistent with the paper. In the paper, testing is done by sampling over the whole video:

The uniform sampling strategy with T = 14 is used as the default setting in the following experiments

Looking at the Temporal RoI Align config (https://github.com/open-mmlab/mmtracking/blob/master/configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py), I don't understand why you set frame_range=[-7, 7]. To my understanding, this samples 7 frames before and 7 frames after the target frame rather than frames from the whole video.

    val=dict(
        ref_img_sampler=dict(
            _delete_=True,
            num_ref_imgs=14,
            frame_range=[-7, 7],
            method='test_with_adaptive_stride')),

GT9505 commented 2 years ago

The frame_range is a little confusing when method='test_with_adaptive_stride'.

When method='test_with_fix_stride', it means that we sample 7 frames before and 7 frames after the target frame.

When method='test_with_adaptive_stride', however, frame_range is only used to compute the number of reference images (i.e., num_ref_imgs). A total of num_ref_imgs frames are then sampled from the whole video.

Please see the code of ref_img_sampling for more details.
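
To make the difference between the two methods concrete, here is a minimal sketch in plain Python. It is not the actual ref_img_sampling implementation from mmtracking. The function names `sample_fixed_stride` and `sample_adaptive_stride` are hypothetical, and details such as excluding the target frame and clamping at the video boundaries are simplifying assumptions made for illustration.

```python
def sample_fixed_stride(target, num_frames, frame_range=(-7, 7), stride=1):
    """Illustrative local sampling: reference frames in a window around `target`.

    Assumption: the target frame itself is skipped and indices that fall
    outside the video are clamped to the first or last frame.
    """
    left, right = frame_range
    offsets = [o for o in range(left, right + 1) if o != 0]
    return [min(max(target + o * stride, 0), num_frames - 1) for o in offsets]


def sample_adaptive_stride(num_frames, num_ref_imgs=14):
    """Illustrative global sampling: `num_ref_imgs` frames spread uniformly
    over the whole video, independent of the target frame.
    """
    if num_ref_imgs < 2 or num_frames < 2:
        # Degenerate case: nothing to spread out, so repeat the first frame.
        return [0] * num_ref_imgs
    # The stride adapts to the video length so the samples span frame 0
    # through the last frame.
    stride = (num_frames - 1) / (num_ref_imgs - 1)
    return [round(i * stride) for i in range(num_ref_imgs)]


if __name__ == "__main__":
    # Hypothetical 100-frame video with target frame 50.
    print("fixed stride:   ", sample_fixed_stride(50, 100))
    print("adaptive stride:", sample_adaptive_stride(100))
```

In this sketch both functions return 14 indices for the default arguments, but the fixed-stride indices stay within 7 frames of the target, while the adaptive-stride indices run from the first to the last frame of the video, which corresponds to the uniform sampling with T = 14 quoted from the paper.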