timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
https://arxiv.org/abs/2101.02702
Apache License 2.0

Camera shift #12

Open boza-wd opened 3 years ago

boza-wd commented 3 years ago

Hi,

thanks for your amazing work!

I have noticed that the results on flashmob.mp4 are really good and robust until there is a camera shift (frames 222 and 223). At that point, the re-identification (reid) seems to get lost.

[Images: frames 000222 and 000223]

I have used the demo:

python src/track.py with \
    reid \
    dataset_name=DEMO \
    data_root_dir=data/flashmob \
    output_dir=data/flashmob

Could it be that some of the parameters in cfgs/track.yaml should be updated:

...
    public_detections: False
    # score threshold for detections
    detection_obj_score_thresh: 0.9
    # score threshold for keeping the track alive
    track_obj_score_thresh: 0.8
    # NMS threshold for detection
    detection_nms_thresh: 0.9
    # NMS threshold while tracking
    track_nms_thresh: 0.9
    # motion model settings
    # How many timesteps inactive tracks are kept and considered for reid
    inactive_patience: -1
    # How similar do image and old track need to be to be considered the same person
    reid_sim_threshold: 0.2
    reid_sim_only: false
    reid_score_thresh: 0.8
    reid_greedy_matching: false

Cheers!

boza-wd commented 3 years ago

Also, is there a way to display the segmentation masks within the demo? Thanks!

timmeinhardt commented 3 years ago

Is flashmob.mp4 your own file, or is it from one of the presented datasets?

If the video sequence contains drastic camera changes or very low frame rates, this can indeed pose a challenging tracking situation, which might lead to a loss of tracks. You could try setting track_obj_score_thresh to a lower value or experiment with the re-identification parameters. But this is something you have to tune by trial and error until it works best for your particular sequence.
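To give a feel for why these two knobs matter, here is a toy sketch in Python. It is not TrackFormer's actual tracking code: the function simulate_track, the state names, and the per-frame scores are all made up for illustration, and it only borrows the parameter names track_obj_score_thresh and inactive_patience from cfgs/track.yaml. It also assumes that a track becomes active again as soon as its score recovers, whereas a real tracker would additionally require a re-identification match.

```python
def simulate_track(scores, track_obj_score_thresh, inactive_patience):
    """Toy model of one track's life cycle (hypothetical, not the repo's logic).

    scores: per-frame object scores for a single track.
    Returns one state per frame: "active", "inactive", or "removed".
    """
    states = []
    inactive_for = 0  # consecutive frames spent below the threshold
    removed = False
    for score in scores:
        if removed:
            # Once dropped, the track never comes back in this toy model.
            states.append("removed")
        elif score >= track_obj_score_thresh:
            inactive_for = 0
            states.append("active")
        else:
            inactive_for += 1
            if inactive_for > inactive_patience:
                removed = True
                states.append("removed")
            else:
                states.append("inactive")
    return states


# Made-up scores: confidence dips for two frames, e.g. during a camera shift.
scores = [0.95, 0.93, 0.60, 0.55, 0.90]

print(simulate_track(scores, track_obj_score_thresh=0.8, inactive_patience=1))
print(simulate_track(scores, track_obj_score_thresh=0.5, inactive_patience=1))
print(simulate_track(scores, track_obj_score_thresh=0.8, inactive_patience=5))
```

In this toy setting, the first call loses the track during the dip, while either a lower score threshold or a longer patience lets it survive. Which combination works on a real sequence still has to be found by experiment.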

Segmentation masks are only predicted if you load a model that was trained for that task. See our pretrained MOTS20 models.