sallymmx / ActionCLIP

This is the official implementation of the paper "ActionCLIP: A New Paradigm for Video Action Recognition"
MIT License

This is the official PyTorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv]

Updates

Overview

ActionCLIP

Content

Prerequisites

The code is built with the following libraries:

For video data pre-processing, you may need ffmpeg.

For more details about the libraries, see INSTALL.md.

Data Preparation

Videos must first be extracted into frames for fast reading. Please refer to the TSN repo for a detailed guide to data pre-processing. We have successfully trained on Kinetics, UCF101, HMDB51, and Charades.
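
As a rough illustration of the frame-extraction step, the sketch below builds an `ffmpeg` command that dumps a video into numbered JPEG frames. The helper names `build_ffmpeg_cmd` and `extract_frames`, the `img_%05d.jpg` naming pattern, and the quality setting are assumptions made for illustration. The TSN repo's guide remains the reference for the directory layout this code actually expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, out_dir, fps=None):
    """Return an ffmpeg argument list that writes frames as numbered JPEGs.

    The img_%05d.jpg pattern is an assumed naming scheme, not necessarily
    the one this repo's data loader expects.
    """
    cmd = ["ffmpeg", "-i", str(video_path)]
    if fps is not None:
        # Resample to a fixed frame rate before writing frames.
        cmd += ["-vf", f"fps={fps}"]
    # -q:v 2 requests high-quality JPEG output.
    cmd += ["-q:v", "2", str(Path(out_dir) / "img_%05d.jpg")]
    return cmd


def extract_frames(video_path, out_dir, fps=None):
    """Create out_dir and run ffmpeg; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)
```

Calling `extract_frames("videos/abc.mp4", "frames/abc")` would then write one JPEG per frame into `frames/abc` (both paths are placeholders).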

Pretrained Models

Training video models is computationally expensive, so we provide some of the pretrained models here. A larger set of trained models is listed in the ActionCLIP MODEL_ZOO.md.

Kinetics-400

We evaluate ActionCLIP with different backbones and input-frame configurations on K400 (we choose Transf as our final visual prompt since it obtains the best results). Here is a list of the pre-trained models that we provide (see Table 6 of the paper). Note that the training log for the 8-frame ViT-B/32 model is available in ViT32_8F_K400.log.

| model | n-frame | top1 Acc (single-crop) | top5 Acc (single-crop) | checkpoint |
| --- | --- | --- | --- | --- |
| ViT-B/32 | 8 | 78.36% | 94.25% | link pwd:b5ni |
| ViT-B/16 | 8 | 81.09% | 95.49% | link pwd:hqtv |
| ViT-B/16 | 16 | 81.68% | 95.87% | link pwd:dk4r |
| ViT-B/16 | 32 | 82.32% | 96.20% | link pwd:35uu |
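
The n-frame column is the number of frames fed to the model per video. One common way to pick those frames is TSN-style sparse sampling: split the video into n equal segments and take one frame from each. The sketch below shows the deterministic (test-time) variant that takes each segment's center. It is an assumption that this matches the sampling used here, and `sample_frame_indices` is a hypothetical helper, not a function from this repo.

```python
def sample_frame_indices(num_frames, n_segments):
    """Pick one 0-based frame index from the center of each equal segment.

    Assumed TSN-style sparse sampling; if the video has fewer frames than
    segments, some indices repeat.
    """
    if num_frames <= 0 or n_segments <= 0:
        raise ValueError("num_frames and n_segments must be positive")
    seg_len = num_frames / n_segments
    # Center of segment i is at seg_len * i + seg_len / 2.
    return [int(seg_len * i + seg_len / 2) for i in range(n_segments)]


# An 80-frame video sampled with the 8-frame configuration.
print(sample_frame_indices(80, 8))
```

During training, the same scheme is often used with a random offset inside each segment instead of the center.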

HMDB51 && UCF101

On the HMDB51 and UCF101 datasets, accuracy (with K400-pretrained weights) is reported under the accurate setting.

HMDB51

| model | n-frame | top1 Acc (single-crop) | checkpoint |
| --- | --- | --- | --- |
| ViT-B/16 | 32 | 76.2% | [link]() |

UCF101

| model | n-frame | top1 Acc (single-crop) | checkpoint |
| --- | --- | --- | --- |
| ViT-B/16 | 32 | 97.1% | [link]() |

Testing

To test the downloaded pretrained models on Kinetics, HMDB51, or UCF101, run scripts/run_test.sh. For example:

# test
bash scripts/run_test.sh  ./configs/k400/k400_test.yaml
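
For reference, the top-1 and top-5 accuracies reported in the tables count a prediction as correct when the true class is among the k highest-scoring classes. The following is a minimal pure-Python sketch of that metric. The `topk_accuracy` helper is hypothetical and is not the evaluation code used by `scripts/run_test.sh`.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of integer class indices, one per sample.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy example: three samples, three classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
print(topk_accuracy(toy_scores, toy_labels, k=1))
print(topk_accuracy(toy_scores, toy_labels, k=2))
```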

Zero-shot

We provide several examples of zero-shot validation on Kinetics-400, UCF101, and HMDB51.
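
Conceptually, CLIP-style zero-shot recognition turns each class name into a text prompt, embeds the prompts and the video, and predicts the class whose text embedding is most similar to the video embedding. The sketch below shows only that matching step, with toy vectors standing in for real encoder outputs. The prompt template and all function names are assumptions for illustration, not this repo's API.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def make_prompts(labels, template="a video of {}"):
    """Fill an (assumed) prompt template with each class name."""
    return [template.format(label) for label in labels]


def zero_shot_predict(video_emb, text_embs, labels):
    """Return the label whose text embedding best matches the video."""
    sims = [cosine(video_emb, t) for t in text_embs]
    best = max(range(len(labels)), key=lambda i: sims[i])
    return labels[best], sims


# Toy embeddings; in practice these would come from the text and video encoders.
labels = ["archery", "bowling"]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
pred, sims = zero_shot_predict([0.9, 0.1], text_embs, labels)
print(make_prompts(labels), pred)
```

Because the classes enter only through their text embeddings, the label set can change without retraining a classifier head.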


## Training
We provide several examples of training ActionCLIP with this repo:
- To train on Kinetics from CLIP pretrained models, you can run:

      # train
      bash scripts/run_train.sh ./configs/k400/k400_train.yaml

- To train on HMDB51 from Kinetics400 pretrained models, you can run:

      # train
      bash scripts/run_train.sh ./configs/hmdb51/hmdb_train.yaml

- To train on UCF101 from Kinetics400 pretrained models, you can run:

      # train
      bash scripts/run_train.sh ./configs/ucf101/ucf_train.yaml


More training details can be found in [configs/README.md](configs/README.md).

## Contributors
ActionCLIP is written and maintained by [Mengmeng Wang](https://sallymmx.github.io/) and [Jiazheng Xing](https://april.zju.edu.cn/team/jiazheng-xing/).

## Citing ActionCLIP
If you find ActionCLIP useful in your research, please cite our paper.

## Acknowledgments
Our code is based on [CLIP](https://github.com/openai/CLIP) and [STM](https://openaccess.thecvf.com/content_ICCV_2019/papers/Jiang_STM_SpatioTemporal_and_Motion_Encoding_for_Action_Recognition_ICCV_2019_paper.pdf).