
Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation

Introduction

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos).

[Figure: framework overview]

Main Results

[Figure: main results]

For fair comparison, all testing speed values are measured on a Titan X GPU.

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet@(commit 75a9e187d).

  2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. Python 3 is not supported yet; if you want to use Python 3, you will need to modify the code to make it work.
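
    A minimal sketch of creating such an environment with conda (the environment name lsfa is our own choice, not prescribed by the repository):

    conda create -n lsfa python=2.7
    source activate lsfa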

  3. Some Python packages may be missing: Cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running

    pip install Cython
    pip install opencv-python==3.2.0.6
    pip install easydict==1.6
  4. We use ffmpeg 3.1.3 to generate mpeg4 raw videos.

  5. We build coviar_py2.so to load the compressed representations (I-frame, motion vectors, or residual).

Requirements: Hardware

Any NVIDIA GPU with at least 8GB of memory should be OK.

Installation

  1. Clone the LSFA repository. We'll refer to the directory that you cloned LSFA into as ${LSFA_ROOT}.

    git clone https://github.com/hustvl/LSFA.git
  2. For Linux users, run sh ./init.sh. The script will automatically build the Cython modules and create some folders.

  3. Install MXNet:

    3.1 Clone MXNet and check out MXNet@(commit 75a9e187d) by

    git clone --recursive https://github.com/apache/incubator-mxnet
    cd incubator-mxnet
    git checkout 75a9e187d
    git submodule update

    3.2 Copy the operators in $(LSFA_ROOT)/dff_rfcn/operator_cxx or $(LSFA_ROOT)/rfcn/operator_cxx to $(YOUR_MXNET_FOLDER)/src/operator/contrib by

    cp -r $(LSFA_ROOT)/dff_rfcn/operator_cxx/* $(MXNET_ROOT)/src/operator/contrib/

    3.3 Compile MXNet

    cd ${MXNET_ROOT}
    make -j4
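
    To build with GPU support, you will likely need to enable the CUDA build flags; a sketch, assuming CUDA is installed at /usr/local/cuda:

    make -j4 USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1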

    3.4 Install the MXNet Python binding by

    Note: If you plan to switch actively between different versions of MXNet, please follow 3.5 instead of 3.4.

    cd python
    sudo python setup.py install

    3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE), and modify MXNET_VERSION in ./experiments/dff_rfcn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). Thus you can switch among different versions of MXNet quickly, as sketched below.
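
    For instance, the relevant line in a cfgs yaml would look like the following sketch (the package directory name mxnet_custom is hypothetical):

    MXNET_VERSION: "mxnet_custom"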

  4. Install ffmpeg:

    4.1 Clone ffmpeg and check out ffmpeg@(commit 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2) by

    git clone https://github.com/FFmpeg/FFmpeg.git
    cd FFmpeg
    git checkout 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2

    4.2 Compile ffmpeg

    make clean
    ./configure --prefix=${FFMPEG_INSTALL_PATH} --enable-pic --disable-yasm --enable-shared
    make
    make install

    4.3 If needed, add ${FFMPEG_INSTALL_PATH}/lib/ to $LD_LIBRARY_PATH.
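
    For example:

    export LD_LIBRARY_PATH=${FFMPEG_INSTALL_PATH}/lib/:$LD_LIBRARY_PATH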

  5. Build coviar_py2.so

    cd $(LSFA_ROOT)/external/data_loader_py2
    sh install.sh
    cp ./build/lib.linux-x86_64-2.7/coviar_py2.so $(LSFA_ROOT)/lib
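
    To sanity-check the build, you can try importing the module from the directory that contains coviar_py2.so:

    cd $(LSFA_ROOT)/lib
    python -c "import coviar_py2; print('coviar_py2 loaded')"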

Preparation for Training & Testing

  1. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:

    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets
  2. Use ffmpeg to generate mpeg4 raw videos:

    sh ./data/reencode_vid ./data/ILSVRC2015/Data/VID/snippets ./data/ILSVRC2015/Data/VID/mpeg4_snippets 
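
    To verify the re-encoding, you can inspect the codec of a generated snippet with ffprobe (the file name example.mp4 below is a placeholder); it should report codec_name=mpeg4 if the re-encoding succeeded:

    ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=nw=1 ./data/ILSVRC2015/Data/VID/mpeg4_snippets/example.mp4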
  3. For your convenience, we provide the trained models and the pretrained model on Baidu Yun (pwd: 493a). Put the pretrained model under the folder ./model, and put the trained models under the folder ./output.

Usage

  1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files under ./experiments/{rfcn/dff_rfcn}/cfgs.

  2. Two config files have been provided so far, namely the Frame baseline with R-FCN and LSFA with R-FCN for ImageNet VID. We use 4 GPUs to train models on ImageNet VID.
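
    A hedged sketch of the GPU setting in such a config (the gpus key follows similar DFF-style configs; check the provided cfgs for the exact key):

    gpus: '0,1,2,3'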

  3. To perform experiments, run the python script with the corresponding config file as input. For example, to train and test LSFA with R-FCN, use the following command:

    python experiments/dff_rfcn/dff_rfcn_end2end_train_test.py --cfg experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml

    A cache folder will be created automatically to save the model and the log under output/dff_rfcn/imagenet_vid/.

  4. Please find more details in config files and in our code.

Acknowledgement

The code of LSFA is based on several open-source repositories, including the DFF and CoViAR codebases referenced in the installation steps above.

Thanks for the contributions of these repositories.