This repository holds the codes and models for the paper
End-to-end Video-level Representation Learning for Action Recognition, Jiagang Zhu, Wei Zou, Zheng Zhu, ICPR 2018, Beijing, China.
We follow the guidance provided by TSN to prepare the data. Please refer to the TSN repository for guidance. Here we only provide the additional training details of DTPP.
Use git to clone this repository and its submodules
git clone --recursive https://github.com/zhujiagang/DTPP.git
cd lib/
cp -r caffe-tpp-net/ caffe-tpp-net-python/
Please compile caffe-tpp-net/
with cmake and openmpi according to TSN for training models
and compile caffe-tpp-net-python/
with python interface for evaluating models with python script.
We have built the initialization model weights for both rgb and flow input. The flow initialization models implements the cross-modality training technique in the paper. To download the model weights, run
bash get_init_models.sh
bash get_kinetics_pretraining_models.sh
Once all necessities ready, we can start training DTPP. For example, if we want to train on HMDB51. For example, the following command runs training on HMDB51 with rgb input, with its weights initialized by ImageNet pretraining.
bash hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh
And the following command runs training on HMDB51 with rgb input, with its weights initialized by Kinetics pretraining.
bash kinetics_hmdb_split_1/train_kinetics_rgb_tpp_p124_split_1.sh
The learned model weights will be saved in snapshot/
.
The reader can refer to the
eval_tpp_net_ucf.py
eval_tpp_net_hmdb.py
and modify the path in the files to test the trained models.
For the fusion of two streams and MIFS, iDT, please refer to the
eval_scores_rgb_flow.py
Our trained models have been released in [BaiduYun], with extraction password: fuuq
.
Please cite the following paper if you feel this repository useful.
@inproceedings{DTPP2018ICPR,
author = {Jiagang Zhu and
Wei Zou and
Zheng Zhu},
title = {End-to-end Video-level Representation Learning for Action Recognition},
booktitle = {ICPR},
year = {2018},
}
@inproceedings{TSN2016ECCV,
author = {Limin Wang and
Yuanjun Xiong and
Zhe Wang and
Yu Qiao and
Dahua Lin and
Xiaoou Tang and
Luc {Val Gool}},
title = {Temporal Segment Networks: Towards Good Practices for Deep Action Recognition},
booktitle = {ECCV},
year = {2016},
}
For any question, please contact
Jiagang Zhu: zhujiagang2015@ia.ac.cn