microsoft / FairMOT

This project provides the official implementation of our recent work on real-time multi-object tracking in videos. Previous works perform object detection and tracking with two separate models, which makes them slow. In contrast, we propose a one-stage solution that performs detection and tracking with a single network by elegantly solving the alignment problem. The resulting approach achieves leading results in both accuracy and speed: (1) it ranks first among all trackers on the MOT challenges; (2) it is significantly faster than previous state-of-the-art methods. In addition, it scales gracefully to a large number of objects.
MIT License

FairMOT

This is the official implementation for:

A Simple Baseline for Multi-Object Tracking,
Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu,
arXiv technical report (arXiv 2004.01888)

Abstract

Object detection and association, the core components of multi-object tracking, have made remarkable progress in recent years. However, little attention has been paid to accomplishing the two tasks in a single network to improve inference speed. The initial attempts along this path ended up with degraded results, mainly because the association branch was not learned appropriately. In this work, we study the essential reasons behind this failure and present a simple baseline that addresses them. It remarkably outperforms the state of the art on the MOT challenge datasets at 30 FPS. We hope this baseline will inspire and help evaluate new ideas in this field.

Tracking performance

Results on MOT challenge test set

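Dataset   MOTA (%)  IDF1 (%)  IDS   MT (%)  ML (%)  FPS
2DMOT15   59.0      62.2      582   45.6    11.5    30.5
MOT16     68.7      70.4      953   39.5    19.0    25.9
MOT17     67.5      69.8      2868  37.7    20.8    25.9
MOT20     58.7      63.7      6013  66.3    8.5     13.2

MOTA: multi-object tracking accuracy; IDF1: identity F1 score; IDS: identity switches; MT/ML: mostly tracked / mostly lost trajectories; FPS: frames per second.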

All results were obtained on the MOT challenge evaluation server under the “private detector” protocol. We rank first among all trackers on 2DMOT15, MOT17, and the recently released (2020.02.29) MOT20. Note that our IDF1 score outperforms other one-shot MOT trackers by more than 10 points. The tracking speed of the entire system reaches up to 30 FPS.
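The MOTA, IDF1, and IDS columns follow the standard CLEAR MOT and identity-based (ID) metric definitions. As an illustration only, the sketch below computes MOTA and IDF1 from aggregate error counts. The class and function names are hypothetical, the counts in the example are made up (they are not FairMOT results), and the official numbers in the table come from the MOT challenge evaluation server, not from this code.

```python
from dataclasses import dataclass


@dataclass
class SequenceCounts:
    """Hypothetical error counts, summed over all frames of a sequence."""
    num_gt: int  # total ground-truth boxes
    fp: int      # false positives
    fn: int      # false negatives (missed targets)
    id_sw: int   # identity switches (the IDS column)
    idtp: int    # identity true positives
    idfp: int    # identity false positives
    idfn: int    # identity false negatives


def mota(c: SequenceCounts) -> float:
    """CLEAR MOT accuracy, 1 - (FN + FP + IDSW) / GT, as a percentage."""
    return 100.0 * (1.0 - (c.fn + c.fp + c.id_sw) / c.num_gt)


def idf1(c: SequenceCounts) -> float:
    """ID F1 score, 2*IDTP / (2*IDTP + IDFP + IDFN), as a percentage."""
    return 100.0 * 2 * c.idtp / (2 * c.idtp + c.idfp + c.idfn)
```

For example, `SequenceCounts(num_gt=1000, fp=100, fn=200, id_sw=10, idtp=700, idfp=200, idfn=300)` gives a MOTA of 69.0 and an IDF1 of about 73.7. Identity switches are only one term in the MOTA numerator, which is why IDF1 is the more direct measure of how well identities are preserved.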

Video demos on MOT challenge test set

Installation

Data preparation

We use the same training data as JDE. Please refer to their DATA ZOO to download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.

2DMOT15 and MOT20 can be downloaded from the official MOT Challenge website. After downloading, organize the data in the following structure:

MOT15
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
MOT20
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
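Before running the label scripts, a small helper can create the empty label folder and report which image folders are still missing. This is a sketch that is not part of the repository; the function name is hypothetical.

```python
from pathlib import Path

# Sub-directories expected under each dataset root (MOT15, MOT20), per the tree above.
# "labels_with_ids/train" starts empty and is filled by the gen_labels scripts.
EXPECTED_DIRS = [
    "images/train",
    "images/test",
    "labels_with_ids/train",
]


def prepare_layout(dataset_root: Path) -> list[str]:
    """Create the empty label folder if needed; return expected folders that are missing."""
    (dataset_root / "labels_with_ids" / "train").mkdir(parents=True, exist_ok=True)
    return [d for d in EXPECTED_DIRS if not (dataset_root / d).is_dir()]
```

For instance, calling `prepare_layout(Path("/path/to/data") / "MOT15")`, with a placeholder data root, returns an empty list once the downloaded images are in place.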

Then set seq_root and label_root in src/gen_labels_15.py and src/gen_labels_20.py to your local paths, and run:

cd src
python gen_labels_15.py
python gen_labels_20.py

to generate the labels for 2DMOT15 and MOT20. The seqinfo.ini files for 2DMOT15 can be downloaded here: [Google], [Baidu, code: 8o0w].
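The seqinfo.ini files matter because the generated label coordinates are normalized by the image size of each sequence. The sketch below is a minimal illustration, assuming a JDE-style label line of the form `class identity center_x center_y width height`, with coordinates divided by the image width and height. The helper names are hypothetical, and src/gen_labels_15.py remains the authoritative reference for the exact format.

```python
import configparser


def read_seq_size(seqinfo_text: str) -> tuple[int, int]:
    """Read imWidth and imHeight from the [Sequence] section of a seqinfo.ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(seqinfo_text)
    seq = parser["Sequence"]
    return int(seq["imWidth"]), int(seq["imHeight"])


def to_label_line(track_id: int, left: float, top: float, w: float, h: float,
                  img_w: int, img_h: int) -> str:
    """Format one box as 'class id cx cy w h', normalized to the image size.

    Assumption: class 0 (pedestrian), then the track identity, then the box
    center, width, and height divided by the image width/height.
    """
    cx = (left + w / 2) / img_w  # box center x, normalized
    cy = (top + h / 2) / img_h   # box center y, normalized
    return f"0 {track_id:d} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}\n"
```

For a 1000x500 sequence, a box with top-left corner (100, 100), width 200, and height 100 for track 1 becomes `0 1 0.200000 0.300000 0.200000 0.200000`.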

Pretrained models and baseline model

DLA-34 COCO pretrained model: DLA-34 official. HRNetV2 ImageNet pretrained model: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, you should put the pretrained models in the following structure:

${FAIRMOT_ROOT}
   └——————models
           └——————ctdet_coco_dla_2x.pth
           └——————hrnetv2_w32_imagenet_pretrained.pth
           └——————hrnetv2_w18_imagenet_pretrained.pth

Our baseline FairMOT model can be downloaded here: DLA-34: [Google] [Baidu, code: 88yn]. HRNetV2_W18: [Google] [Baidu, code: 7jb1]. After downloading, you should put the baseline model in the following structure:

${FAIRMOT_ROOT}
   └——————models
           └——————all_dla34.pth
           └——————all_hrnet_v2_w18.pth
           └——————...
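To catch a misplaced download before training or running the demo, the expected checkpoint files can be checked programmatically. This sketch is not part of the repository; the function name is hypothetical and the file lists only cover the names given in the directory trees above.

```python
from pathlib import Path

# File names taken from the directory trees above; adjust to the models you downloaded.
PRETRAINED = [
    "ctdet_coco_dla_2x.pth",
    "hrnetv2_w32_imagenet_pretrained.pth",
    "hrnetv2_w18_imagenet_pretrained.pth",
]
BASELINES = ["all_dla34.pth", "all_hrnet_v2_w18.pth"]


def missing_models(fairmot_root: Path, names: list[str]) -> list[str]:
    """Return the expected checkpoint files that are not present under models/."""
    models_dir = fairmot_root / "models"
    return [n for n in names if not (models_dir / n).is_file()]
```

If you only plan to use one backbone, check just the files for that backbone, since the other entries will naturally be reported as missing.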

Training

Tracking

Demo

To run the tracker on a raw video and get the demo video in MP4 format, run src/demo.py:

cd src
python demo.py mot --load_model ../models/all_dla34.pth --conf_thres 0.4

Set --input-video and --output-root to run the demo on your own videos.

If you have difficulty building DCNv2 and thus cannot use the DLA-34 baseline model, you can run the demo with the HRNetV2_w18 baseline model:

cd src
python demo.py mot --load_model ../models/all_hrnet_v2_w18.pth --arch hrnet_18 --reid_dim 128 --conf_thres 0.4
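To process several videos in one go, a thin Python wrapper around the commands above can help. This sketch only uses flags that appear in this README; the helper names and the choice of one output folder per video are illustrative, and it assumes it is run from the src directory.

```python
import subprocess
import sys
from pathlib import Path


def demo_command(video: Path, output_root: Path, model: Path,
                 conf_thres: float = 0.4, hrnet18: bool = False) -> list[str]:
    """Build the demo.py command line for one input video.

    Set hrnet18=True for the HRNetV2_W18 baseline, which also needs
    --arch hrnet_18 --reid_dim 128.
    """
    cmd = [sys.executable, "demo.py", "mot",
           "--load_model", str(model),
           "--conf_thres", str(conf_thres),
           "--input-video", str(video),
           "--output-root", str(output_root)]
    if hrnet18:
        cmd += ["--arch", "hrnet_18", "--reid_dim", "128"]
    return cmd


def run_demos(videos: list[Path], model: Path, out_dir: Path) -> None:
    """Run demo.py once per video, giving each run its own output directory."""
    for video in videos:
        subprocess.run(demo_command(video, out_dir / video.stem, model), check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log, and `check=True` stops the batch at the first failing video.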

--conf_thres can be set between 0.3 and 0.7, depending on your videos.
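Conceptually, a confidence threshold such as --conf_thres decides which detections are passed on to association: a lower value keeps more low-score boxes (higher recall, more false positives), and a higher value keeps fewer. The sketch below illustrates only that filtering step with a hypothetical Detection type; it is not FairMOT's tracker code, and whether the comparison is strict is an assumption.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical detection: box as (x1, y1, x2, y2) plus a confidence score."""
    box: tuple[float, float, float, float]
    score: float


def filter_by_confidence(dets: list[Detection], conf_thres: float) -> list[Detection]:
    """Keep detections whose score is strictly above the threshold (assumed comparison)."""
    return [d for d in dets if d.score > conf_thres]
```

With scores 0.2, 0.35, 0.5, and 0.8, a threshold of 0.3 keeps three detections, while 0.7 keeps only one, which is why crowded or low-quality videos may need a lower value.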

Citation

@article{zhang2020simple,
  title={A Simple Baseline for Multi-Object Tracking},
  author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
  journal={arXiv preprint arXiv:2004.01888},
  year={2020}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.