twangnh / Distilling-Object-Detectors-Shuffledet

Implementations of CVPR 2019 paper Distilling Object Detectors with Fine-grained Feature Imitation

🔥Updating🔥

TODO

Preparation

python2 tensorpack=0.8.6 tensorflow=1.8.0

1 Clone the repository

First of all, clone the code

git clone https://github.com/twangnh/Distilling-Object-Detectors-Shuffledet

2 Data preparation

Note that we split the KITTI training set into train and val sets and evaluate our methods and models on the val set, since labels for the test set are not available. KITTI 2D object detection images are sampled from video, so randomly splitting the training data into train and val sets could lead to exceptionally high performance due to correlation between video frames. We therefore follow MSCNN [1] (Zhaowei Cai et al.), which splits the KITTI training set into train and val sets while ensuring that images in the two sets do not come from nearby video frames.
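The idea behind a sequence-aware split can be sketched as follows. This is only an illustration under stated assumptions, not the MSCNN split itself: split_by_sequence and the image_to_sequence mapping (image id to source video sequence) are hypothetical names, and the toy ids below are made up.

```python
import random
from collections import defaultdict


def split_by_sequence(image_to_sequence, val_fraction=0.5, seed=0):
    """Split image ids into train/val so that no video sequence spans both sets."""
    # Group image ids by the (hypothetical) video sequence they were sampled from.
    groups = defaultdict(list)
    for image_id, sequence_id in image_to_sequence.items():
        groups[sequence_id].append(image_id)

    # Shuffle whole sequences, not individual frames.
    sequence_ids = sorted(groups)
    random.Random(seed).shuffle(sequence_ids)

    total = len(image_to_sequence)
    train, val = [], []
    for sequence_id in sequence_ids:
        # Fill val until it reaches the requested share, then send the rest to train.
        target = val if len(val) < val_fraction * total else train
        target.extend(groups[sequence_id])
    return sorted(train), sorted(val)


# Toy example: six frames from three made-up sequences.
mapping = {"000000": "seq_a", "000001": "seq_a", "000002": "seq_b",
           "000003": "seq_b", "000004": "seq_c", "000005": "seq_c"}
train_ids, val_ids = split_by_sequence(mapping, val_fraction=0.5)
```

Because whole sequences are assigned to one side, adjacent frames never end up on both sides of the split, which is the property the paragraph above relies on.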

3 Download the ImageNet-pretrained model and the trained 1x teacher model

Train

We have migrated to multi-GPU training with cross-GPU batch normalization. Results are currently reported for a batch size of 32 on 4 GPUs; other settings could be tried.
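For readers unfamiliar with cross-GPU batch normalization, the statistics it needs can be sketched in plain Python. This is a conceptual illustration only, not the code used in this repository: cross_gpu_moments is a hypothetical helper, and each inner list stands for the activations of one channel on one GPU.

```python
def cross_gpu_moments(per_gpu_values):
    """Return the mean and (biased) variance of one channel over all GPUs' samples."""
    # Each GPU reduces its own shard to a count, a sum and a sum of squares ...
    stats = [(len(v), sum(v), sum(x * x for x in v)) for v in per_gpu_values]
    # ... so only these three numbers per GPU need to be exchanged.
    count = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / float(count)
    variance = total_sq / float(count) - mean * mean
    return mean, variance


# Two hypothetical GPUs holding two activations each.
mean, variance = cross_gpu_moments([[1.0, 2.0], [3.0, 4.0]])
```

With a batch size of 32 on 4 GPUs, each GPU sees only 8 images, so normalizing with statistics pooled over all 32 is less noisy than normalizing per GPU.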

Models will be saved in train_dir.

Evaluation

By default, the evaluation code runs while training is in progress and tests every checkpoint saved. After training has started (e.g., the 0.5x student training), you can run

python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir /path_to/eval_dir --image_set val --gpu 0 --checkpoint_path /path_to/train_dir --student 0.5

Then, TensorBoard records can be loaded as (change the port if needed)

tensorboard --logdir=/path_to/eval_dir --port 4118

and viewed by opening the site

http://localhost:4118
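To see which checkpoints in train_dir the evaluation would encounter, they can be enumerated by step. The sketch below assumes the usual TensorFlow naming, in which a checkpoint model.ckpt-<step> is accompanied by a model.ckpt-<step>.index file; list_checkpoints is a hypothetical helper and is not part of eval_model.py.

```python
import os
import re


def list_checkpoints(train_dir):
    """Return (step, checkpoint_prefix) pairs found in train_dir, oldest first."""
    # Assumed naming: one "model.ckpt-<step>.index" file per saved checkpoint.
    pattern = re.compile(r"^(model\.ckpt-(\d+))\.index$")
    found = []
    for name in os.listdir(train_dir):
        match = pattern.match(name)
        if match:
            step = int(match.group(2))
            prefix = os.path.join(train_dir, match.group(1))
            found.append((step, prefix))
    return sorted(found)
```

The returned prefix has the same form as the value passed to --checkpoint_path when testing a single trained model.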

Models   Flops/G  Params/M  car               pedestrian        cyclist           mAP   ckpt
                            Easy  Mod   Hard  Easy  Mod   Hard  Easy  Mod   Hard
1x       5.1      1.6       85.7  74.3  65.8  63.2  55.6  50.6  69.7  51.0  49.1  62.8  GoogleDrive
0.5x     1.5      0.53      81.6  71.7  61.2  59.4  52.3  45.5  59.7  43.5  42.0  57.4  GoogleDrive
0.5x-I   1.5      0.53      84.9  72.9  64.1  60.7  53.3  47.2  69.0  46.2  44.9  60.4  GoogleDrive
gain                        +3.3  +1.2  +2.9  +1.3  +1.0  +1.7  +9.3  +2.7  +2.9  +3.0
0.25x    0.67     0.21      67.2  56.6  47.5  54.7  48.4  42.1  49.1  33.3  32.9  48.0  GoogleDrive
0.25x-I  0.67     0.21      76.6  62.3  54.6  56.8  48.2  42.6  56.6  37.3  36.5  52.4  GoogleDrive
gain                        +9.4  +5.7  +7.1  +2.1  -0.2  +0.5  +7.5  +4.0  +3.6  +4.4

The models with the highest mAP are reported for both the baseline and the distilled models.

Note that the numbers differ from those in the paper because they come from independent runs of the algorithm, and because we have migrated from single-GPU training to multi-GPU training with a larger batch size.
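The mAP column appears to be the unweighted mean of the nine AP values in each row (three classes times three difficulty levels). This is an observation from the reported numbers rather than something stated above, and mean_ap is a hypothetical helper used only for this check.

```python
def mean_ap(aps):
    """Average a flat list of AP values and round to one decimal place."""
    return round(sum(aps) / len(aps), 1)


# AP values copied from the results table: car, pedestrian, cyclist (Easy, Mod, Hard).
rows = {
    "1x":      [85.7, 74.3, 65.8, 63.2, 55.6, 50.6, 69.7, 51.0, 49.1],
    "0.5x":    [81.6, 71.7, 61.2, 59.4, 52.3, 45.5, 59.7, 43.5, 42.0],
    "0.5x-I":  [84.9, 72.9, 64.1, 60.7, 53.3, 47.2, 69.0, 46.2, 44.9],
    "0.25x":   [67.2, 56.6, 47.5, 54.7, 48.4, 42.1, 49.1, 33.3, 32.9],
    "0.25x-I": [76.6, 62.3, 54.6, 56.8, 48.2, 42.6, 56.6, 37.3, 36.5],
}
# mAP values copied from the same table.
reported = {"1x": 62.8, "0.5x": 57.4, "0.5x-I": 60.4, "0.25x": 48.0, "0.25x-I": 52.4}

computed = {name: mean_ap(aps) for name, aps in rows.items()}
```

Under this reading, the gain rows for mAP are simply the differences between the imitation and baseline means, e.g. 60.4 - 57.4 = 3.0 for the 0.5x student.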

Test with trained model

python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir xxx --image_set val --gpu 0 --checkpoint_path /path_to/model0.5x60.4/model.ckpt-33000 --run_once True --student 0.5
python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1_supervisor --eval_dir xxx --image_set val --gpu 0 --checkpoint_path ./kitti-1x-supervisor/model.ckpt-725000 --run_once True

Parameter counts

Note on model size: the checkpoint saved by TensorFlow contains gradients and other information, so it is larger than the model itself, and we have not yet frozen the model. To check the model size, for example for the baseline 0.25x model without imitation, run

python param_count.py --model_path /home/wangtao/prj/shuffledet-multi-gpu-ckpt/model0.25x_nosup_48.0/model.ckpt-40000
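What such a count involves can be sketched as below. This is not the contents of param_count.py: count_model_params and count_checkpoint_params are hypothetical helpers, and the name fragments used to skip non-model variables ("Momentum", "Adam", "global_step") are assumptions about how the optimizer names its extra variables. The sketch also assumes tf.train.list_variables is available in the installed TensorFlow release; it returns (name, shape) pairs for a checkpoint.

```python
def count_model_params(variables, skip_substrings=("Momentum", "Adam", "global_step")):
    """Sum element counts of (name, shape) pairs, skipping likely non-model variables."""
    total = 0
    for name, shape in variables:
        # Assumed convention: optimizer slots and counters carry these name fragments.
        if any(fragment in name for fragment in skip_substrings):
            continue
        size = 1
        for dim in shape:
            size *= dim
        total += size
    return total


def count_checkpoint_params(checkpoint_path):
    """Count parameters stored in a TensorFlow checkpoint (requires tensorflow)."""
    import tensorflow as tf  # imported lazily so the helper above works without it
    return count_model_params(tf.train.list_variables(checkpoint_path))


# Toy example with made-up variable names: only the first entry is counted.
example = [("conv1/kernels", [3, 3, 3, 24]),
           ("conv1/kernels/Momentum", [3, 3, 3, 24]),
           ("global_step", [])]
example_total = count_model_params(example)
```

Dividing the resulting count by one million gives a figure comparable to the Params/M column.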

Flops counts

Still to come...

Troubleshooting

Citation

@inproceedings{wang2019distilling,
  title={Distilling Object Detectors With Fine-Grained Feature Imitation},
  author={Wang, Tao and Yuan, Li and Zhang, Xiaopeng and Feng, Jiashi},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4933--4942},
  year={2019}
}

Reference

[1] Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, and Nuno Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. In ECCV, 2016.

License

The code and the models are MIT licensed, as found in the LICENSE file.