
Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation

Introduction

Real-Time and Accurate Object Detection in Compressed Video by Long Short-term Feature Aggregation provides a simple, fast, accurate, and end-to-end framework for video recognition (e.g., object detection and semantic segmentation in videos).

[Figure: framework overview]

Main Results

[Figure: main results]

For fair comparison, all testing speed values are measured on a Titan X GPU.

Requirements: Software

  1. MXNet from the official repository. We tested our code on MXNet@(commit 75a9e187d).

  2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. Python 3 is not supported yet; if you want to use Python 3, you will need to modify the code to make it work.
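
    A minimal sketch of creating such an environment with conda (the environment name lsfa is our own choice, not prescribed by the repository):

    conda create -n lsfa python=2.7
    source activate lsfa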

  3. Some Python packages may be missing: Cython, opencv-python >= 3.2.0, easydict. If pip is set up on your system, these packages can be fetched and installed by running

    pip install Cython
    pip install opencv-python==3.2.0.6
    pip install easydict==1.6
  4. We use ffmpeg 3.1.3 to generate mpeg4 raw videos.

  5. We build coviar_py2.so to load the compressed representations (I-frame, motion vectors, or residual).

Requirements: Hardware

Any NVIDIA GPU with at least 8GB of memory should be OK.

Installation

  1. Clone the LSFA repository. We'll refer to the directory that you cloned LSFA into as ${LSFA_ROOT}.

    git clone https://github.com/hustvl/LSFA.git
  2. For Linux users, run sh ./init.sh. The script will automatically build the Cython modules and create some folders.

  3. Install MXNet:

    3.1 Clone MXNet and check out MXNet@(commit 75a9e187d) by

    git clone --recursive https://github.com/apache/incubator-mxnet
    cd incubator-mxnet
    git checkout 75a9e187d
    git submodule update

    3.2 Copy the operators in $(LSFA_ROOT)/dff_rfcn/operator_cxx or $(LSFA_ROOT)/rfcn/operator_cxx to $(YOUR_MXNET_FOLDER)/src/operator/contrib by

    cp -r $(LSFA_ROOT)/dff_rfcn/operator_cxx/* $(MXNET_ROOT)/src/operator/contrib/

    3.3 Compile MXNet

    cd ${MXNET_ROOT}
    make -j4
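
    To build with GPU support, you will likely need to enable the CUDA build flags; a sketch, assuming CUDA is installed at /usr/local/cuda:

    make -j4 USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1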

    3.4 Install the MXNet Python binding by

    Note: If you plan to switch actively between different versions of MXNet, please follow 3.5 instead of 3.4.

    cd python
    sudo python setup.py install

    3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE), and modify MXNET_VERSION in ./experiments/dff_rfcn/cfgs/*.yaml to $(YOUR_MXNET_PACKAGE). Thus you can switch among different versions of MXNet quickly, as sketched below.
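
    For instance, the relevant line in a cfgs yaml would look like the following sketch (the package directory name mxnet_custom is hypothetical):

    MXNET_VERSION: "mxnet_custom"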

  4. Install ffmpeg:

    4.1 Clone ffmpeg and check out ffmpeg@(commit 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2) by

    git clone https://github.com/FFmpeg/FFmpeg.git
    cd FFmpeg
    git checkout 74c6a6d3735f79671b177a0e0c6f2db696c2a6d2

    4.2 Compile ffmpeg

    make clean
    ./configure --prefix=${FFMPEG_INSTALL_PATH} --enable-pic --disable-yasm --enable-shared
    make
    make install

    4.3 If needed, add ${FFMPEG_INSTALL_PATH}/lib/ to $LD_LIBRARY_PATH.
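
    For example:

    export LD_LIBRARY_PATH=${FFMPEG_INSTALL_PATH}/lib/:$LD_LIBRARY_PATH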

  5. Build coviar_py2.so

    cd $(LSFA_ROOT)/external/data_loader_py2
    sh install.sh
    cp ./build/lib.linux-x86_64-2.7/coviar_py2.so $(LSFA_ROOT)/lib
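
    To sanity-check the build, you can try importing the module from the directory that contains coviar_py2.so:

    cd $(LSFA_ROOT)/lib
    python -c "import coviar_py2; print('coviar_py2 loaded')"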

Preparation for Training & Testing

  1. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:

    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets
  2. Use ffmpeg to generate mpeg4 raw videos:

    sh ./data/reencode_vid ./data/ILSVRC2015/Data/VID/snippets ./data/ILSVRC2015/Data/VID/mpeg4_snippets 
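
    To verify the re-encoding, you can inspect the codec of a generated snippet with ffprobe (the file name example.mp4 below is a placeholder); it should report codec_name=mpeg4 if the re-encoding succeeded:

    ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=nw=1 ./data/ILSVRC2015/Data/VID/mpeg4_snippets/example.mp4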
  3. For your convenience, we provide the trained models and the pretrained model on Baidu Yun (pwd: 493a). Put the pretrained model under the folder ./model, and put the trained models under the folder ./output.

Usage

  1. All of our experiment settings (GPU #, dataset, etc.) are kept in yaml config files under ./experiments/{rfcn/dff_rfcn}/cfgs.

  2. Two config files have been provided so far, namely the Frame baseline with R-FCN and LSFA with R-FCN for ImageNet VID. We use 4 GPUs to train models on ImageNet VID.
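
    A hedged sketch of the GPU setting in such a config (the gpus key follows similar DFF-style configs; check the provided cfgs for the exact key):

    gpus: '0,1,2,3'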

  3. To perform experiments, run the python script with the corresponding config file as input. For example, to train and test LSFA with R-FCN, use the following command:

    python experiments/dff_rfcn/dff_rfcn_end2end_train_test.py --cfg experiments/dff_rfcn/cfgs/resnet_v1_101_flownet_imagenet_vid_rfcn_end2end_ohem.yaml

    A cache folder will be created automatically to save the model and the log under output/dff_rfcn/imagenet_vid/.

  4. Please find more details in config files and in our code.

Acknowledgement

The code of LSFA is based on several open-source repositories, including the DFF and CoViAR codebases referenced in the installation steps above.

Thanks for the contributions of these repositories.