
Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding"
Apache License 2.0

SqueezeTime

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Yingjie Zhai, Wenshuo Li, Yehui Tang, Xinghao Chen, Yunhe Wang

arXiv 2024

[arXiv] [BibTeX]

🔥 Updates

📸 Overview

This is the official PyTorch implementation of our paper "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding". In this paper, we propose to squeeze the time axis of a video sequence into the channel dimension and present a lightweight video recognition network, termed SqueezeTime, for mobile video understanding. To enhance the temporal modeling capability of the model, we design a Channel-Time Learning (CTL) Block to capture the temporal dynamics of the sequence. This module has two complementary branches: one branch learns temporal importance, and the other, which can restore temporal position information, enhances inter-temporal object modeling. The proposed SqueezeTime is lightweight and fast while achieving high accuracy for mobile video understanding. Extensive experiments on various benchmarks, namely Kinetics400, Kinetics600, SomethingSomethingV2, HMDB51, AVA2.1, and THUMOS14, demonstrate the superiority of our model. For example, SqueezeTime achieves +1.2% accuracy and +80% GPU throughput over prior methods on Kinetics400.
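The core reshaping idea can be illustrated with a minimal sketch. NumPy stands in for PyTorch tensors here so the example is self-contained. It assumes a clip laid out as (batch, channels, time, height, width) and folds time into channels with a plain reshape, so that any 2D operation over the channel axis sees every frame at once. The channel-major ordering, the function names, and the random 1x1 mixing weights are illustrative assumptions, not the repository's code and not the CTL block.

```python
import numpy as np


def squeeze_time(clip: np.ndarray) -> np.ndarray:
    """Fold the time axis into channels: (B, C, T, H, W) -> (B, C*T, H, W).

    Channel-major ordering is an assumption: output channel c*T + t holds
    input channel c of frame t.
    """
    b, c, t, h, w = clip.shape
    return clip.reshape(b, c * t, h, w)


def pointwise_mix(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """A 1x1 2D convolution (no bias): each output channel is a weighted sum
    over all squeezed (channel, time) input channels at every pixel."""
    # weight has shape (C_out, C_in); x has shape (B, C_in, H, W)
    return np.einsum("oc,bchw->bohw", weight, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal((2, 3, 16, 8, 8))  # 2 clips, RGB, 16 frames, 8x8
    squeezed = squeeze_time(clip)
    print(squeezed.shape)  # (2, 48, 8, 8)

    weight = rng.standard_normal((64, 48))  # hypothetical 1x1 conv weights
    out = pointwise_mix(squeezed, weight)
    print(out.shape)  # (2, 64, 8, 8)
```

Because the squeezed tensor is an ordinary 4D image batch, 2D layers can process it directly; temporal structure is then only implicit in the channel ordering, which is the gap the CTL block described above is designed to address.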


Figure 1: Pipeline of SqueezeTime.

1️⃣ Requirements

The models are based on the mmaction2 project. Please run them with mmaction2.

  1. Clone the mmaction2 project

    git clone https://github.com/open-mmlab/mmaction2.git

    Then install mmaction2 following the official instructions.

  2. Clone this repository

    git clone https://github.com/xinghaochen/SqueezeTime.git
    cd SqueezeTime

  3. Copy our model files

    mmaction/models/backbones/SqueezeTime.py
    mmaction/models/backbones/SqueezeTime_ava.py
    mmaction/models/heads/i2d_head.py

    into the corresponding folders of the mmaction2 project, following the original structure. Then register them by adding imports to the "__init__.py" files of the backbones and heads packages (mmaction/models/backbones and mmaction/models/heads).

  4. Copy the "ckpt", "configs", and "data" folders into the mmaction2 project, following the original structure.
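Steps 3 and 4 can be scripted. The helper below is a hypothetical convenience sketch, not part of the repository: it copies the three model files and the "ckpt", "configs", and "data" folders from a SqueezeTime checkout into an mmaction2 checkout, preserving relative paths and skipping anything it cannot find. It does not edit the "__init__.py" files, so registering the new modules there remains a manual step.

```python
import shutil
from pathlib import Path

# Files and folders listed in steps 3 and 4, relative to the repository root.
MODEL_FILES = [
    "mmaction/models/backbones/SqueezeTime.py",
    "mmaction/models/backbones/SqueezeTime_ava.py",
    "mmaction/models/heads/i2d_head.py",
]
EXTRA_DIRS = ["ckpt", "configs", "data"]


def copy_into_mmaction2(squeezetime_root: str, mmaction2_root: str) -> list:
    """Copy SqueezeTime files into mmaction2, keeping relative paths.

    Returns the relative paths that were not found in the SqueezeTime checkout.
    """
    src_root, dst_root = Path(squeezetime_root), Path(mmaction2_root)
    missing = []
    for rel in MODEL_FILES:
        src, dst = src_root / rel, dst_root / rel
        if not src.is_file():
            missing.append(rel)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    for rel in EXTRA_DIRS:
        src = src_root / rel
        if not src.is_dir():
            missing.append(rel)
            continue
        # Merge into existing folders, e.g. mmaction2's own "configs".
        shutil.copytree(src, dst_root / rel, dirs_exist_ok=True)
    return missing


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        print("usage: python copy_squeezetime.py SQUEEZETIME_ROOT MMACTION2_ROOT")
    else:
        not_found = copy_into_mmaction2(sys.argv[1], sys.argv[2])
        if not_found:
            print("Not found, skipped:", ", ".join(not_found))
```

The script name copy_squeezetime.py is hypothetical. Merging with dirs_exist_ok=True (Python 3.8+) avoids overwriting mmaction2's existing "configs" tree wholesale while still placing the SqueezeTime configs under it.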

2️⃣ Data Preparation

All datasets are organized in the mmaction2 format. After downloading them, please organize the data directory as follows:

3️⃣ Training & Testing

All config files are in 'configs/recognition/' and 'configs/detection/'. The configs for training and testing on the K400, K600, HMDB51, sthv2, and AVA2.1 datasets include:

"configs/recognition/SqueezeTime/SqueezeTime_K400.py",
"configs/recognition/SqueezeTime/SqueezeTime_K600.py",
"configs/recognition/SqueezeTime/SqueezeTime_HMDB51.py",
"configs/detection/SqueezeTime_AVA2.1.py".

Take the Kinetics400 dataset as an example:

We also provide the configs of the other models trained in our paper; please find them in the "configs" folder.

Model Checkpoints

Dataset Checkpoints
ImageNet1K Download
K400 Download
K600 Download
HMDB51 Download
AVA2.1 Download

4️⃣ Evaluation

The following results are on the Kinetics400 dataset. Please see the paper for the results on other datasets.


Figure 2: Performance comparison of multiple lightweight methods on the K400 dataset.


Figure 3: Speed comparison of methods on the K400 dataset: (a) CPU and GPU speed; (b) mobile phone CPU latency.

✏️ Reference

If you find SqueezeTime useful in your research or applications, please consider giving a star ⭐ and citing using the following BibTeX:

@article{zhai2024SqueezeTime,
  title={No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding},
  author={Zhai, Yingjie and Li, Wenshuo and Tang, Yehui and Chen, Xinghao and Wang, Yunhe},
  journal={arXiv preprint arXiv:2405.08344},
  year={2024}
}