haonanwang0522 / GTPT

[ECCV 2024] GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
Accepted by ECCV 2024
Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

News!

Introduction

This is the official implementation of GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation. We propose the Group-based Token Pruning Transformer (GTPT), which fully harnesses the advantages of the Transformer. GTPT alleviates the computational burden by introducing keypoints gradually in a coarse-to-fine manner, which minimizes computational overhead while maintaining high performance. In addition, GTPT groups keypoint tokens and prunes visual tokens to improve model performance while reducing redundancy. We propose Multi-Head Group Attention (MHGA) between different groups to achieve global interaction with little computational overhead. We conducted experiments on COCO and COCO-WholeBody. The results show that GTPT achieves higher performance with less computation than other methods, especially on whole-body pose estimation, which involves numerous keypoints.
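To make the idea of group-wise visual token pruning concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the scoring rule (summed attention from a group's keypoint tokens, then top-k selection) is an assumption chosen because it is a common pruning heuristic, and the names `prune_per_group`, `attn`, `groups`, and `keep_ratio` are hypothetical.

```python
def prune_per_group(attn, groups, keep_ratio):
    """Select the visual tokens each keypoint group keeps.

    attn[k][v]  : attention weight from keypoint token k to visual token v
    groups      : list of lists of keypoint-token indices (one list per group)
    keep_ratio  : fraction of visual tokens each group retains

    Assumption (not taken from the GTPT code): a visual token's score for a
    group is the summed attention it receives from that group's keypoints.
    """
    num_visual = len(attn[0])
    num_keep = max(1, int(num_visual * keep_ratio))
    kept = []
    for group in groups:
        # Score every visual token using only this group's keypoint tokens.
        scores = [sum(attn[k][v] for k in group) for v in range(num_visual)]
        # Rank visual tokens by score and keep the top num_keep of them.
        ranked = sorted(range(num_visual), key=lambda v: scores[v], reverse=True)
        kept.append(sorted(ranked[:num_keep]))
    return kept


# Toy example: 4 keypoint tokens, 4 visual tokens, 2 groups.
attn = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.1, 0.1, 0.3, 0.5],
]
print(prune_per_group(attn, groups=[[0, 1], [2, 3]], keep_ratio=0.5))
# → [[0, 1], [2, 3]]
```

In this toy setting each group keeps a different half of the visual tokens, so the attention computed inside a group runs over fewer tokens. Exchanging information between groups (the role the introduction assigns to MHGA) is not modeled here.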

Experiments

Results on COCO validation set

| Method | GFLOPs | Params (M) | AP | AR |
|---|---|---|---|---|
| CNN-based Methods | | | | |
| SimBa.-Res50 | 8.9 | 34 | 70.4 | 76.3 |
| SimBa.-Res101 | 12.4 | 53 | 71.4 | 77.1 |
| SimBa.-Res152 | 15.7 | 68.6 | 72.0 | 77.8 |
| HRNet-W32 | 7.1 | 28.5 | 74.4 | 79.8 |
| HRNet-W48 | 14.6 | 63.6 | 75.1 | 80.4 |
| Lite-HRNet-18 | 0.2 | 1.1 | 64.8 | 71.2 |
| Lite-HRNet-30 | 0.3 | 1.8 | 67.2 | 73.3 |
| EfficientPose-B | 1.1 | 3.3 | 71.1 | - |
| EfficientPose-C | 1.6 | 5.0 | 71.3 | - |
| Transformer-based Methods | | | | |
| TransPose-R-A4 | 8.9 | 6.0 | 72.6 | 78.0 |
| TransPose-H-S | 10.2 | 8.0 | 74.2 | 79.5 |
| TokenPose-S-v1 | 2.4 | 6.6 | 72.5 | 78.0 |
| TokenPose-B | 6.0 | 13.5 | 74.7 | 80.0 |
| DistilPose-S | 2.4 | 5.4 | 71.6 | - |
| DistilPose-L | 10.3 | 21.3 | 74.4 | - |
| PPT-S | 2.0 | 6.6 | 72.2 | 77.8 |
| PPT-B | 5.6 | 13.5 | 74.4 | 79.6 |
| GTPT-T | 0.7 | 2.4 | 71.1 | 76.6 |
| GTPT-S | 1.6 | 5.4 | 73.6 | 78.9 |
| GTPT-B | 3.6 | 8.3 | 74.9 | 80.0 |
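As a worked reading of the COCO table, the short sketch below computes how much computation GTPT-B saves relative to two Transformer baselines of similar accuracy. The numbers are copied from the table; the helper name `compare` and the dictionary layout are illustrative only.

```python
# (GFLOPs, AP) on the COCO validation set, copied from the table above.
COCO = {
    "TokenPose-B": (6.0, 74.7),
    "PPT-B": (5.6, 74.4),
    "GTPT-B": (3.6, 74.9),
}


def compare(ours, baseline, table=COCO):
    """Return (GFLOPs saving in percent, AP difference) of `ours` vs `baseline`."""
    gflops_ours, ap_ours = table[ours]
    gflops_base, ap_base = table[baseline]
    saving = 100.0 * (1.0 - gflops_ours / gflops_base)
    return round(saving, 1), round(ap_ours - ap_base, 1)


print(compare("GTPT-B", "TokenPose-B"))  # → (40.0, 0.2)
print(compare("GTPT-B", "PPT-B"))        # → (35.7, 0.5)
```

So GTPT-B uses about 40% fewer GFLOPs than TokenPose-B and about 36% fewer than PPT-B, while scoring slightly higher AP than both.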

Note:

Results on COCO-WholeBody validation set

| Method | Input Size | GFLOPs | Whole-body AP | Whole-body AR | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SN | N/A | 272.3 | 32.7 | 45.6 | 42.7 | 58.3 | 9.9 | 36.9 | 64.9 | 69.7 | 40.8 | 58.0 |
| OpenPose | N/A | 451.1 | 44.2 | 52.3 | 56.3 | 61.2 | 53.2 | 64.5 | 76.5 | 84.0 | 38.6 | 43.3 |
| PAF | 512x512 | 329.1 | 29.5 | 40.5 | 38.1 | 52.6 | 5.3 | 27.8 | 65.6 | 70.1 | 35.9 | 52.8 |
| AE | 512x512 | 212.4 | 44.0 | 54.5 | 58.0 | 66.1 | 57.7 | 72.5 | 58.8 | 65.4 | 48.1 | 57.4 |
| DeepPose | 384x288 | 17.3 | 33.5 | 48.4 | 44.4 | 56.8 | 36.8 | 53.7 | 49.3 | 66.3 | 23.5 | 41.0 |
| SimBa. | 384x288 | 20.4 | 57.3 | 67.1 | 66.6 | 74.7 | 63.5 | 76.3 | 73.2 | 81.2 | 53.7 | 64.7 |
| HRNet | 384x288 | 16.0 | 58.6 | 67.4 | 70.1 | 77.3 | 58.6 | 69.2 | 72.7 | 78.3 | 51.6 | 60.4 |
| PVT | 384x288 | 19.7 | 58.9 | 68.9 | 67.3 | 76.1 | 66.0 | 79.4 | 74.5 | 82.2 | 54.5 | 65.4 |
| FastPose50-dcn-si | 256x192 | 6.1 | 59.2 | 66.5 | 70.6 | 75.6 | 70.2 | 77.5 | 77.5 | 82.5 | 45.7 | 53.9 |
| ZoomNet | 384x288 | 28.5 | 63.0 | 74.2 | 74.5 | 81.0 | 60.9 | 70.8 | 88.0 | 92.4 | 57.9 | 73.4 |
| ZoomNAS | 384x288 | 18.0 | 65.4 | 74.4 | 74.0 | 80.7 | 61.7 | 71.8 | 88.9 | 93.0 | 62.5 | 74.0 |
| ViTPose+-S | 256x192 | 5.4 | 54.4 | - | 71.6 | - | 72.1 | - | 55.9 | - | 45.3 | - |
| ViTPose+-H | 256x192 | 122.9 | 61.2 | - | 75.9 | - | 77.9 | - | 63.3 | - | 54.7 | - |
| RTMPose-m | 256x192 | 2.2 | 58.2 | 67.4 | 67.3 | 75.0 | 61.5 | 75.2 | 81.3 | 87.1 | 47.5 | 58.9 |
| RTMPose-l | 256x192 | 4.5 | 61.1 | 70.0 | 69.5 | 76.9 | 65.8 | 78.5 | 83.3 | 88.7 | 51.9 | 62.8 |
| GTPT-T | 256x192 | 0.8 | 54.9 | 65.6 | 67.6 | 75.9 | 64.9 | 77.4 | 75.4 | 84.1 | 38.3 | 49.9 |
| GTPT-S | 256x192 | 2.0 | 59.6 | 69.9 | 71.0 | 78.7 | 70.4 | 82.2 | 81.0 | 87.6 | 45.4 | 57.0 |
| GTPT-B | 256x192 | 4.0 | 61.7 | 71.4 | 72.0 | 79.5 | 73.0 | 84.0 | 84.2 | 89.6 | 47.9 | 59.3 |
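One way to read the COCO-WholeBody table is to ask which methods are not beaten on both computation and accuracy at once. The sketch below computes that Pareto front over (GFLOPs, whole-body AP), using values copied from the table. The function name `pareto_front` is illustrative, and the comparison ignores input size, which differs between rows.

```python
# (method, GFLOPs, whole-body AP), copied from the table above.
ROWS = [
    ("SN", 272.3, 32.7), ("OpenPose", 451.1, 44.2), ("PAF", 329.1, 29.5),
    ("AE", 212.4, 44.0), ("DeepPose", 17.3, 33.5), ("SimBa.", 20.4, 57.3),
    ("HRNet", 16.0, 58.6), ("PVT", 19.7, 58.9),
    ("FastPose50-dcn-si", 6.1, 59.2), ("ZoomNet", 28.5, 63.0),
    ("ZoomNAS", 18.0, 65.4), ("ViTPose+-S", 5.4, 54.4),
    ("ViTPose+-H", 122.9, 61.2), ("RTMPose-m", 2.2, 58.2),
    ("RTMPose-l", 4.5, 61.1), ("GTPT-T", 0.8, 54.9),
    ("GTPT-S", 2.0, 59.6), ("GTPT-B", 4.0, 61.7),
]


def pareto_front(rows):
    """Return methods not dominated by any other, ordered by GFLOPs.

    A method is dominated if another one needs no more GFLOPs and reaches
    at least the same AP, and is strictly better on one of the two.
    """
    front = []
    for name, gflops, ap in sorted(rows, key=lambda r: r[1]):
        dominated = any(
            g <= gflops and a >= ap and (g < gflops or a > ap)
            for _, g, a in rows
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(ROWS))
# → ['GTPT-T', 'GTPT-S', 'GTPT-B', 'ZoomNAS']
```

All three GTPT variants lie on the front. ZoomNAS reaches the highest whole-body AP but at 18.0 GFLOPs and a larger 384x288 input.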

Note:

Start to use

1. Dependencies installation & data preparation

Please refer to THIS to prepare the environment step by step.

2. Model Zoo

Pretrained models are provided in our model zoo.

3. Training

CUDA_VISIBLE_DEVICES=<GPUs> python tools/train.py --cfg <Config PATH>

4. Testing

To test the performance of the pretrained models, run:

CUDA_VISIBLE_DEVICES=<GPUs> python tools/test.py --cfg <Config PATH>
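If you prefer to drive both commands from Python, for example to train and then evaluate with the same config, a small wrapper along these lines may help. It is a sketch, not part of this repository: `build_run` and `train_then_test` are hypothetical helpers, and it only assumes the `tools/train.py` and `tools/test.py` entry points and the `--cfg` flag shown above.

```python
import os
import subprocess
import sys


def build_run(script, cfg_path, gpus):
    """Build (argv, env) equivalent to:
    CUDA_VISIBLE_DEVICES=<GPUs> python <script> --cfg <Config PATH>
    """
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    argv = [sys.executable, script, "--cfg", cfg_path]
    return argv, env


def train_then_test(cfg_path, gpus):
    """Run training, then testing, stopping if either step fails."""
    for script in ("tools/train.py", "tools/test.py"):
        argv, env = build_run(script, cfg_path, gpus)
        subprocess.run(argv, env=env, check=True)
```

Calling `train_then_test(cfg_path, gpus=[0, 1])` from the repository root, with `cfg_path` set to one of the provided config files, runs the two commands above in sequence on GPUs 0 and 1.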

Acknowledgement

We acknowledge the excellent implementations from SimCC, TokenPose, HRNet, and HRFormer.

Citations

If you use our code or models in your research, please cite with:

@article{wang2024gtpt,
  title={GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation},
  author={Wang, Haonan and Liu, Jie and Tang, Jie and Wu, Gangshan and Xu, Bo and Chou, Yanbing and Wang, Yong},
  journal={arXiv preprint arXiv:2407.10756},
  year={2024}
}