twangnh/Distilling-Object-Detectors-Shuffledet

🔥Updating🔥

TODO

Still to come:
- [ ] Print the sub-losses separately. Add visualization of detection output.
- [ ] Combine the proposed method with model pruning/quantization methods.

Preparation

python2 tensorpack=0.8.6 tensorflow=1.8.0

1 Clone the repository

First of all, clone the code

git clone https://github.com/twangnh/Distilling-Object-Detectors-Shuffledet

2 Data preparation

Note that we split the KITTI training set into train/val sets and evaluate our methods and models on the val set, since test set labels are not available. KITTI 2D object detection images are sampled from video, so randomly splitting the training data into train and val sets could lead to exceptionally high performance due to correlation between video frames. We follow MSCNN [1] (Zhaowei Cai et al.), which splits the KITTI training set into train/val sets while ensuring that the image frames of the two sets do not come from close video frames.
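The idea behind a correlation-aware split can be sketched as follows. This is not the MSCNN split shipped with this repository; it only illustrates keeping all frames of one video sequence on the same side of the split. The `frame_to_sequence` mapping and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_sequence(frame_to_sequence, val_fraction=0.5, seed=0):
    """Split frame ids into train/val so that no video sequence is shared.

    frame_to_sequence: dict mapping a frame id to the id of the video
    sequence it was sampled from (hypothetical input, not a repo file).
    """
    # Group frames by their source sequence.
    by_sequence = defaultdict(list)
    for frame_id, sequence_id in frame_to_sequence.items():
        by_sequence[sequence_id].append(frame_id)

    # Shuffle whole sequences, not individual frames.
    sequence_ids = sorted(by_sequence)
    random.Random(seed).shuffle(sequence_ids)

    total = len(frame_to_sequence)
    train, val = [], []
    for sequence_id in sequence_ids:
        # Fill val until it reaches the requested share, then fill train.
        target = val if len(val) < val_fraction * total else train
        target.extend(by_sequence[sequence_id])
    return sorted(train), sorted(val)
```

Because sequences are assigned as a whole, neighbouring frames of one video never end up on both sides, which is the property the paragraph above asks for.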

download images at http://www.cvlibs.net/download.php?file=data_object_image_2.zip and extract into ./data/KITTI/training/image_2/
download labels at http://www.cvlibs.net/download.php?file=data_object_label_2.zip and extract into ./data/KITTI/training/label_2/
The train/val split image index files are provided in ./data/KITTI/ImageSets
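Before training, it can help to confirm that every frame listed in an index file has both an image and a label on disk. The sketch below is not part of the repository; the function name is hypothetical, and the `<id>.png` / `<id>.txt` naming is an assumption based on the usual KITTI layout.

```python
import os


def find_missing_files(kitti_root, index_file):
    """Return image/label paths listed in index_file but absent on disk."""
    image_dir = os.path.join(kitti_root, "training", "image_2")
    label_dir = os.path.join(kitti_root, "training", "label_2")

    # One frame id per line; blank lines are ignored.
    with open(index_file) as f:
        frame_ids = [line.strip() for line in f if line.strip()]

    missing = []
    for frame_id in frame_ids:
        # Assumed KITTI naming: <id>.png for images, <id>.txt for labels.
        for path in (os.path.join(image_dir, frame_id + ".png"),
                     os.path.join(label_dir, frame_id + ".txt")):
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

Call it with `./data/KITTI` as `kitti_root` and the path of one of the index files under ./data/KITTI/ImageSets; an empty list means the layout matches what the index expects.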

3 Download ImageNet pretrained models and the trained 1x teacher model

download the ImageNet-pretrained 0.5x modified-ShuffleNet backbone model from the 0.5x GoogleDrive link and put it into ./pretrained_model/shuffle_backbone0.5x/
download the ImageNet-pretrained 0.25x modified-ShuffleNet backbone model from the 0.25x GoogleDrive link and put it into ./pretrained_model/shuffle_backbone0.25x/
download the trained 1x model from the GoogleDrive link and put it into ./kitti-1x-supervisor/

Train

We have migrated to multi-GPU training with cross-GPU batch normalization. The reported results currently use a batch size of 32 on 4 GPUs; other settings can be tried.

Train with the 0.5x student

python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.5 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.5x/model-960000

Train with the 0.25x student

python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.25 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.25x/model-665000

You can turn off imitation by passing --without_imitation True; training then uses ground-truth supervision only, for example:

python train_multi_gpu.py --dataset KITTI --net ShuffleDet_conv1_stride1 --student 0.5 --train_dir xxx --image_set train --pretrained_model_path ./pretrained_model/shuffle_backbone0.5x/model-960000 --without_imitation True

Models are saved in train_dir.
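When launching several runs (both student widths, with and without imitation), the commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it only uses the flags and checkpoint paths shown in this README, and the output directory names are placeholders.

```python
def build_train_command(student, train_dir, pretrained_model_path,
                        without_imitation=False):
    """Assemble the train_multi_gpu.py command line as a list of arguments."""
    command = [
        "python", "train_multi_gpu.py",
        "--dataset", "KITTI",
        "--net", "ShuffleDet_conv1_stride1",
        "--student", str(student),
        "--train_dir", train_dir,
        "--image_set", "train",
        "--pretrained_model_path", pretrained_model_path,
    ]
    if without_imitation:
        # Ground-truth supervision only (baseline without the teacher).
        command += ["--without_imitation", "True"]
    return command


# Backbone checkpoints as named in the commands above.
BACKBONES = {
    0.5: "./pretrained_model/shuffle_backbone0.5x/model-960000",
    0.25: "./pretrained_model/shuffle_backbone0.25x/model-665000",
}

for width, checkpoint in sorted(BACKBONES.items()):
    # "./train_<width>x" is a placeholder output directory.
    print(" ".join(build_train_command(width, "./train_%sx" % width, checkpoint)))
```

Passing the returned list to `subprocess.call` would start the run; printing it first makes it easy to check the arguments.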

Evaluation

By default, the evaluation code runs alongside training and tests every saved checkpoint. After training has started (e.g., the 0.5x student training), you can run

python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir /path_to/eval_dir --image_set val --gpu 0 --checkpoint_path /path_to/train_dir --student 0.5

Then the TensorBoard records can be loaded with (change the port if needed)

tensorboard --logdir=/path_to/eval_dir --port 4118

and viewed by opening the site

http://localhost:4118

| Models | Flops /G | Params /M | Car Easy | Car Mod | Car Hard | Pedestrian Easy | Pedestrian Mod | Pedestrian Hard | Cyclist Easy | Cyclist Mod | Cyclist Hard | mAP | ckpt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1x | 5.1 | 1.6 | 85.7 | 74.3 | 65.8 | 63.2 | 55.6 | 50.6 | 69.7 | 51.0 | 49.1 | 62.8 | GoogleDrive |
| 0.5x | 1.5 | 0.53 | 81.6 | 71.7 | 61.2 | 59.4 | 52.3 | 45.5 | 59.7 | 43.5 | 42.0 | 57.4 | GoogleDrive |
| 0.5x-I | 1.5 | 0.53 | 84.9 | 72.9 | 64.1 | 60.7 | 53.3 | 47.2 | 69.0 | 46.2 | 44.9 | 60.4 | GoogleDrive |
| 0.5x-I gain | | | +3.3 | +1.2 | +2.9 | +1.3 | +1.0 | +1.7 | +9.3 | +2.7 | +2.9 | +3.0 | |
| 0.25x | 0.67 | 0.21 | 67.2 | 56.6 | 47.5 | 54.7 | 48.4 | 42.1 | 49.1 | 33.3 | 32.9 | 48.0 | GoogleDrive |
| 0.25x-I | 0.67 | 0.21 | 76.6 | 62.3 | 54.6 | 56.8 | 48.2 | 42.6 | 56.6 | 37.3 | 36.5 | 52.4 | GoogleDrive |
| 0.25x-I gain | | | +9.4 | +5.7 | +7.1 | +2.1 | -0.2 | +0.5 | +7.5 | +4.0 | +3.6 | +4.4 | |

The models with the highest mAP are reported for both the baseline and the distilled models.

Note that the numbers differ from those in the paper because they come from independent runs of the algorithm, and because we have migrated from single-GPU training to multi-GPU training with a larger batch size.
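The mAP column appears to be the plain mean of the nine AP values in each row (three classes times three difficulty levels). This is inferred from the reported numbers rather than stated by the authors. A small check on the 0.5x rows, with the values copied from the table:

```python
def mean_ap(ap_values):
    """Average the per-class, per-difficulty AP values of one table row."""
    return sum(ap_values) / float(len(ap_values))


# Order: car, pedestrian, cyclist, each as Easy / Mod / Hard.
baseline_05x = [81.6, 71.7, 61.2, 59.4, 52.3, 45.5, 59.7, 43.5, 42.0]
imitation_05x = [84.9, 72.9, 64.1, 60.7, 53.3, 47.2, 69.0, 46.2, 44.9]

# Per-column gain of the distilled student over its baseline.
gains = [round(i - b, 1) for i, b in zip(imitation_05x, baseline_05x)]

print(round(mean_ap(baseline_05x), 1))   # 57.4
print(round(mean_ap(imitation_05x), 1))  # 60.4
print(gains)
```

The per-column gains reproduce the "gain" row for the 0.5x student, and the two means match the reported mAP values.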

Test with trained model

For example, to test the 0.5x distilled model, download the trained model at the corresponding GoogleDrive link, then run

python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1 --eval_dir xxx --image_set val --gpu 0 --checkpoint_path /path_to/model0.5x60.4/model.ckpt-33000 --run_once True --student 0.5

To test the 1x supervisor model, run

python eval_model.py --dataset KITTI --net ShuffleDet_conv1_stride1_supervisor --eval_dir xxx --image_set val --gpu 0 --checkpoint_path ./kitti-1x-supervisor/model.ckpt-725000 --run_once True

Parameter counts

Note on model size: a checkpoint saved by TensorFlow contains gradients and other information, so its size is larger than the model itself, and we have not yet frozen the model. To check the model size of, for example, the baseline 0.25x model without imitation, run

python param_count.py --model_path /home/wangtao/prj/shuffledet-multi-gpu-ckpt/model0.25x_nosup_48.0/model.ckpt-40000
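The counting idea can be sketched independently of param_count.py, whose actual implementation is not shown here. The function below sums element counts over (name, shape) pairs and skips entries that look like optimizer state. The marker strings are assumptions and depend on the optimizer used; in TensorFlow, such pairs can be obtained with `tf.train.list_variables(checkpoint_path)`.

```python
from functools import reduce
import operator

# Assumed name fragments of optimizer slots and bookkeeping variables;
# adjust them to the optimizer actually used for training.
NON_MODEL_MARKERS = ("Momentum", "Adam", "global_step")


def count_model_parameters(variables, skip_markers=NON_MODEL_MARKERS):
    """Sum the element counts of variables that belong to the model itself.

    variables: iterable of (name, shape) pairs describing a checkpoint.
    """
    total = 0
    for name, shape in variables:
        if any(marker in name for marker in skip_markers):
            continue  # optimizer state or bookkeeping, not a model weight
        # The product of an empty shape is 1, i.e. a scalar variable.
        total += reduce(operator.mul, shape, 1)
    return total
```

Dividing the result by 1e6 gives a figure comparable to the Params /M column of the results table.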

Flops counts

Still to come...

Troubleshooting

If you get a permission denied error when evaluating the model, please try
```
chmod +x ./dataset_tool/kitti-eval/cpp/evaluate_object
```
If that does not work, compile the evaluate_object executable from source, i.e., run make under ./dataset_tool/kitti-eval

Citation

@inproceedings{wang2019distilling,
  title={Distilling Object Detectors With Fine-Grained Feature Imitation},
  author={Wang, Tao and Yuan, Li and Zhang, Xiaopeng and Feng, Jiashi},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4933--4942},
  year={2019}
}

Reference

[1] Zhaowei Cai, Quanfu Fan, Rogerio S Feris, and Nuno Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. ECCV 2016

License

The code and the models are MIT licensed, as found in the LICENSE file.