YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection

The second generation of the YOWO action detector.


Thank you very much for everyone's stars! YOWOv2 is an attempt I made in my spare time, and a tribute to YOWO, a work I used to like very much. However, I am no longer working on spatio-temporal action detection, so I am not in a position to answer everyone's issues; for this, I am deeply sorry, and I hope you can understand. YOWOv2 is a completely open project: I have not attached any license to it, so please feel free to make any improvement or optimization you want, without my consent. As long as this work can bring even the smallest contribution to the progress of the world, I will be happy enough. If you find our work useful, please consider citing our paper posted on arXiv (see the bottom of this README).

Overview of YOWOv2

(figure: overview of the YOWOv2 framework)

Requirements
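A minimal environment sketch, assuming the repository ships a requirements.txt at its root; the environment name and Python version below are arbitrary choices, not the authors' pinned setup:

```Shell
# create and activate an isolated environment (name and version are arbitrary)
conda create -n yowov2 python=3.8
conda activate yowov2

# install the project dependencies, assuming a requirements.txt at the repo root
pip install -r requirements.txt
```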

Visualization

(figures: qualitative action detection visualizations)

Dataset

UCF101-24:

You can download UCF24 from the following links:

Google Drive: https://drive.google.com/file/d/1Dwh90pRi7uGkH5qLRjQIFiEmMJrAog5J/view?usp=sharing

BaiduYun Disk: https://pan.baidu.com/s/11GZvbV0oAzBhNDVKXsVGKg (password: hmu6)
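For the Google Drive link, a download sketch using the gdown tool; the archive name UCF24.tar.gz below is an assumption, so adjust it to whatever the link actually serves:

```Shell
# fetch the archive by its Google Drive file ID
pip install gdown
gdown "https://drive.google.com/uc?id=1Dwh90pRi7uGkH5qLRjQIFiEmMJrAog5J"

# unpack it under your dataset root (archive name is assumed)
tar -xzvf UCF24.tar.gz -C path/to/dataset
```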

AVA

You can follow the instructions here to prepare the AVA dataset.

Experiment

UCF101-24:

| Model         | Clip | GFLOPs | Params  | F-mAP | V-mAP | FPS | Weight |
|---------------|------|--------|---------|-------|-------|-----|--------|
| YOWOv2-Nano   | 16   | 1.3    | 3.5 M   | 78.8  | 48.0  | 42  | ckpt   |
| YOWOv2-Tiny   | 16   | 2.9    | 10.9 M  | 80.5  | 51.3  | 50  | ckpt   |
| YOWOv2-Medium | 16   | 12.0   | 52.0 M  | 83.1  | 50.7  | 42  | ckpt   |
| YOWOv2-Large  | 16   | 53.6   | 109.7 M | 85.2  | 52.0  | 30  | ckpt   |
| YOWOv2-Nano   | 32   | 2.0    | 3.5 M   | 79.4  | 49.0  | 42  | ckpt   |
| YOWOv2-Tiny   | 32   | 4.5    | 10.9 M  | 83.0  | 51.2  | 50  | ckpt   |
| YOWOv2-Medium | 32   | 12.7   | 52.0 M  | 83.7  | 52.5  | 40  | ckpt   |
| YOWOv2-Large  | 32   | 91.9   | 109.7 M | 87.0  | 52.8  | 22  | ckpt   |

All FLOPs are measured on a 224×224 video clip of 16 or 32 frames. FPS is measured with batch size 1 on a 3090 GPU, timed from model inference through the NMS operation. F-mAP and V-mAP denote frame-level and video-level mAP, respectively.

(figure: qualitative results on UCF101-24)

AVA v2.2:

| Model         | Clip | mAP  | FPS | Weight |
|---------------|------|------|-----|--------|
| YOWOv2-Nano   | 16   | 12.6 | 40  | ckpt   |
| YOWOv2-Tiny   | 16   | 14.9 | 49  | ckpt   |
| YOWOv2-Medium | 16   | 18.4 | 41  | ckpt   |
| YOWOv2-Large  | 16   | 20.2 | 29  | ckpt   |
| YOWOv2-Nano   | 32   | 12.7 | 40  | ckpt   |
| YOWOv2-Tiny   | 32   | 15.6 | 49  | ckpt   |
| YOWOv2-Medium | 32   | 18.4 | 40  | ckpt   |
| YOWOv2-Large  | 32   | 21.7 | 22  | ckpt   |

(figure: qualitative results on AVA)

Train YOWOv2

For example:

```Shell
python train.py --cuda -d ucf24 --root path/to/dataset -v yowo_v2_nano --num_workers 4 --eval_epoch 1 --max_epoch 8 --lr_epoch 2 3 4 5 -lr 0.0001 -ldr 0.5 -bs 8 -accu 16 -K 16
```

Here -K 16 sets the clip length, and -bs 8 combined with 16 gradient-accumulation steps (-accu 16) gives an effective batch size of 8 × 16 = 128.

or you can just run the UCF24 training script:

```Shell
sh train_ucf.sh
```

or, for AVA, the corresponding script:

```Shell
sh train_ava.sh
```

If you have multiple GPUs, you can launch DDP to train YOWOv2, for example:

```Shell
python train.py --cuda -dist -d ava_v2.2 --root path/to/dataset -v yowo_v2_nano --num_workers 4 --eval_epoch 1 --max_epoch 10 --lr_epoch 3 4 5 6 -lr 0.0001 -ldr 0.5 -bs 8 -accu 16 -K 16 --eval
```

However, I do not have multiple GPUs, so I am not sure whether there are any bugs, or whether the reported performance can be reproduced with DDP.
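If the -dist flag relies on the standard PyTorch distributed environment variables, one common way to launch one process per GPU is torch.distributed.run. This is only a sketch under that assumption, not the repository's documented launch command, and --nproc_per_node=2 is just an example value:

```Shell
# spawn one training process per GPU (2 GPUs assumed)
python -m torch.distributed.run --nproc_per_node=2 \
        train.py --cuda -dist -d ava_v2.2 --root path/to/dataset \
        -v yowo_v2_nano -bs 8 -accu 16 -K 16
```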

Test YOWOv2

```Shell
# test on UCF101-24
python test.py --cuda -d ucf24 -v yowo_v2_nano --weight path/to/weight -size 224 --show

# test on AVA
python test.py --cuda -d ava_v2.2 -v yowo_v2_nano --weight path/to/weight -size 224 --show
```

Test YOWOv2 on AVA video

For example:

```Shell
python test_video_ava.py --cuda -d ava_v2.2 -v yowo_v2_nano --weight path/to/weight --video path/to/video --show
```

Note that path/to/video can point to any video on your local device, not only AVA videos.

Evaluate YOWOv2

```Shell
# Frame mAP on UCF101-24
python eval.py \
        --cuda \
        -d ucf24 \
        -v yowo_v2_nano \
        -bs 16 \
        -size 224 \
        --weight path/to/weight \
        --cal_frame_mAP
```

Frame mAP scores the detections in each frame independently with box IoU, while video mAP first links per-frame detections into action tubes and then scores the tubes with a spatio-temporal IoU:

```Shell
# Video mAP on UCF101-24
python eval.py \
        --cuda \
        -d ucf24 \
        -v yowo_v2_nano \
        -bs 16 \
        -size 224 \
        --weight path/to/weight \
        --cal_video_mAP
```

On AVA, run the following command to calculate frame mAP@0.5 IoU:

```Shell
python eval.py \
        --cuda \
        -d ava_v2.2 \
        -v yowo_v2_nano \
        -bs 16 \
        --weight path/to/weight
```

Demo

```Shell
# run demo (pass -d ucf24 or -d ava_v2.2 to match the chosen weight)
python demo.py --cuda -d ucf24 -v yowo_v2_nano -size 224 --weight path/to/weight --video path/to/video --show
```

(figure: qualitative results in real scenarios)

References

If you are using our code, please consider citing our paper.

```
@article{yang2023yowov2,
  title={YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection},
  author={Yang, Jianhua and Dai, Kun},
  journal={arXiv preprint arXiv:2302.06848},
  year={2023}
}
```