Learning to Discriminate Information for Online Action Detection, CVPR 2020
Hyunjun Eun, Jinyoung Moon, Jongyoul Park, Chanho Jung, Changick Kim
[arXiv]
This is the official implementation of Information Discrimination Units (IDU). For online action detection, we investigate the question of "how RNNs can learn to explicitly discriminate relevant information from irrelevant information for detecting actions in the present". To this end, we propose a novel recurrent unit that extends GRU with a mechanism utilizing current information and an early embedding module. We perform extensive experiments on two benchmark datasets, where our Information Discrimination Networks (IDN) achieve state-of-the-art performance of 86.1% mcAP and 60.3% mAP on TVSeries and THUMOS-14, respectively.
14 May, 2020: Initial update
We provide three patch files in the 'patch' folder. Replace the original TensorFlow files with these files. Each folder name inside the 'patch' folder indicates the directory where its files should be placed.
The training code is not included in this repository. We cannot release the full training code because the research is part of a funded project.
Download our extracted features here. Files should be located in 'data/'.
On both the TVSeries and THUMOS-14 datasets, we extract video frames at 24 fps and set the number of frames in each chunk, N, to 6. We use 16 chunks (i.e., T=15), which together span 4 seconds, as the input of IDN. We use a two-stream network as the feature extractor. In the two-stream network, one stream encodes appearance information by taking the center frame of a chunk as input, while the other stream encodes motion information by processing an optical flow stack computed from the input chunk. Among several two-stream networks, we employ the TSN models pretrained on the ActivityNet-v1.3 and Kinetics datasets.
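The chunking arithmetic above can be checked with a short sketch. It is illustrative only and is not part of this repository. The helper name `chunk_frame_ranges` is hypothetical, and the choice of center frame for an even-length chunk (index `start + N // 2`) is an assumption, since the exact convention is not stated here.

```python
# Values stated in the text above.
FPS = 24               # frames extracted per second
FRAMES_PER_CHUNK = 6   # N
NUM_CHUNKS = 16        # 16 chunks, i.e. T = 15


def chunk_frame_ranges(num_chunks=NUM_CHUNKS, frames_per_chunk=FRAMES_PER_CHUNK):
    """Return (start, end_exclusive, center) frame indices for each chunk.

    Hypothetical helper: the center index for an even-length chunk is an
    assumption (start + frames_per_chunk // 2), not taken from the repository.
    """
    ranges = []
    for k in range(num_chunks):
        start = k * frames_per_chunk
        end = start + frames_per_chunk
        center = start + frames_per_chunk // 2
        ranges.append((start, end, center))
    return ranges


chunk_seconds = FRAMES_PER_CHUNK / FPS          # duration of one chunk
window_seconds = NUM_CHUNKS * chunk_seconds     # duration of the IDN input window
window_frames = NUM_CHUNKS * FRAMES_PER_CHUNK   # frames covered by the window

if __name__ == "__main__":
    print(f"one chunk: {chunk_seconds} s, window: {window_seconds} s, "
          f"{window_frames} frames")
    for start, end, center in chunk_frame_ranges()[:3]:
        print(f"frames [{start}, {end}) center={center}")
```

Each chunk covers 6 / 24 = 0.25 seconds, so 16 chunks cover the 4 seconds quoted above.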
Download our trained models here. Files should be located in '{dataset name}/models/'.
Dataset | Feature | mcAP for TVSeries, mAP for THUMOS-14 (%) |
---|---|---|
TVSeries | TwoStream-incepv3 | 86.1 |
TVSeries | TwoStream-anet2016 | 84.7 |
THUMOS-14 | TwoStream-incepv3 | 60.3 |
THUMOS-14 | TwoStream-anet2016 | 50.0 |
*Please refer to our paper for more results.
For TVSeries
python tvseries/test.py --feat_type incepv3
python tvseries/test.py --feat_type anet2016
For THUMOS-14
python thumos14/test.py --feat_type incepv3
python thumos14/test.py --feat_type anet2016
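To run all four evaluations in one go, a small wrapper along these lines may be convenient. It is a sketch and is not shipped with this repository. `build_commands` and `run_all` are hypothetical names, and it assumes the script is launched from the repository root with the features and models already downloaded. It only uses the script paths and the `--feat_type` flag shown above.

```python
import subprocess
import sys

# Dataset folders and feature types taken from the commands listed above.
DATASETS = ["tvseries", "thumos14"]
FEAT_TYPES = ["incepv3", "anet2016"]


def build_commands(python_exe=sys.executable):
    """Build one argument list per (dataset, feature type) pair."""
    return [
        [python_exe, f"{dataset}/test.py", "--feat_type", feat_type]
        for dataset in DATASETS
        for feat_type in FEAT_TYPES
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first evaluation that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Defaults to a dry run so nothing is executed by accident.
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` to actually launch the evaluations one after another.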
Please cite our paper in your publications if it helps your research:
@inproceedings{eun2020idu,
title={Learning to Discriminate Information for Online Action Detection},
author={Eun, Hyunjun and Moon, Jinyoung and Park, Jongyoul and Jung, Chanho and Kim, Changick},
booktitle={CVPR},
year={2020}
}