GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
Accepted by ECCV 2024
Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang
This is the official implementation of GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation. We propose the Group-based Token Pruning Transformer (GTPT), which fully harnesses the advantages of the Transformer for human pose estimation. GTPT alleviates the computational burden by introducing keypoints gradually in a coarse-to-fine manner, minimizing computational overhead while maintaining high performance. In addition, GTPT groups keypoint tokens and prunes visual tokens to improve performance while reducing redundancy, and we propose Multi-Head Group Attention (MHGA) between different groups to achieve global interaction with little computational overhead. Experiments on COCO and COCO-WholeBody show that GTPT achieves higher performance with less computation than other methods, especially on whole-body pose estimation, which involves numerous keypoints.
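For intuition, below is a minimal, self-contained sketch of why grouping tokens reduces attention cost: attention runs independently inside each group, so the quadratic term scales with the group size rather than the full token count. This is not the official GTPT/MHGA implementation; the class name, arguments, and toy shapes are illustrative assumptions, and the cross-group (global) interaction that MHGA provides is omitted for brevity.

```python
# Illustrative sketch only -- NOT the official GTPT/MHGA implementation.
# Attention is computed inside each group, so the quadratic cost is
# O(G * (N/G)^2) = O(N^2 / G) instead of O(N^2). The cross-group (global)
# interaction that MHGA adds in GTPT is intentionally omitted here.
import torch
import torch.nn as nn


class GroupedSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, num_groups: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.num_groups = num_groups
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C); N must be divisible by num_groups in this toy version.
        B, N, C = x.shape
        G, H = self.num_groups, self.num_heads
        n = N // G
        x = x.reshape(B * G, n, C)                      # fold groups into the batch dim
        qkv = self.qkv(x).reshape(B * G, n, 3, H, C // H)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each: (B*G, H, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # attention only within a group
        out = attn.softmax(dim=-1) @ v                  # (B*G, H, n, head_dim)
        out = out.transpose(1, 2).reshape(B * G, n, C)
        return self.proj(out).reshape(B, N, C)


if __name__ == "__main__":
    x = torch.randn(2, 48, 192)                         # 48 tokens split into 6 groups of 8 (toy numbers)
    attn = GroupedSelfAttention(dim=192, num_heads=8, num_groups=6)
    print(attn(x).shape)                                # torch.Size([2, 48, 192])
```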
Results on COCO:

Method | GFLOPs | Params (M) | AP | AR |
---|---|---|---|---|
**CNN-based Methods** | | | | |
SimBa.-Res50 | 8.9 | 34 | 70.4 | 76.3 |
SimBa.-Res101 | 12.4 | 53 | 71.4 | 77.1 |
SimBa.-Res152 | 15.7 | 68.6 | 72.0 | 77.8 |
HRNet-W32 | 7.1 | 28.5 | 74.4 | 79.8 |
HRNet-W48 | 14.6 | 63.6 | 75.1 | 80.4 |
Lite-HRNet-18 | 0.2 | 1.1 | 64.8 | 71.2 |
Lite-HRNet-30 | 0.3 | 1.8 | 67.2 | 73.3 |
EfficientPose-B | 1.1 | 3.3 | 71.1 | - |
EfficientPose-C | 1.6 | 5.0 | 71.3 | - |
**Transformer-based Methods** | | | | |
TransPose-R-A4 | 8.9 | 6.0 | 72.6 | 78.0 |
TransPose-H-S | 10.2 | 8.0 | 74.2 | 79.5 |
TokenPose-S-v1 | 2.4 | 6.6 | 72.5 | 78.0 |
TokenPose-B | 6.0 | 13.5 | 74.7 | 80.0 |
DistilPose-S | 2.4 | 5.4 | 71.6 | - |
DistilPose-L | 10.3 | 21.3 | 74.4 | - |
PPT-S | 2.0 | 6.6 | 72.2 | 77.8 |
PPT-B | 5.6 | 13.5 | 74.4 | 79.6 |
GTPT-T | 0.7 | 2.4 | 71.1 | 76.6 |
GTPT-S | 1.6 | 5.4 | 73.6 | 78.9 |
GTPT-B | 3.6 | 8.3 | 74.9 | 80.0 |
Results on COCO-WholeBody:

Method | Input Size | GFLOPs | Whole-body AP | Whole-body AR | Body AP | Body AR | Foot AP | Foot AR | Face AP | Face AR | Hand AP | Hand AR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SN | N/A | 272.3 | 32.7 | 45.6 | 42.7 | 58.3 | 9.9 | 36.9 | 64.9 | 69.7 | 40.8 | 58.0 |
OpenPose | N/A | 451.1 | 44.2 | 52.3 | 56.3 | 61.2 | 53.2 | 64.5 | 76.5 | 84.0 | 38.6 | 43.3 |
PAF | 512x512 | 329.1 | 29.5 | 40.5 | 38.1 | 52.6 | 5.3 | 27.8 | 65.6 | 70.1 | 35.9 | 52.8 |
AE | 512x512 | 212.4 | 44.0 | 54.5 | 58.0 | 66.1 | 57.7 | 72.5 | 58.8 | 65.4 | 48.1 | 57.4 |
DeepPose | 384x288 | 17.3 | 33.5 | 48.4 | 44.4 | 56.8 | 36.8 | 53.7 | 49.3 | 66.3 | 23.5 | 41.0 |
SimBa. | 384x288 | 20.4 | 57.3 | 67.1 | 66.6 | 74.7 | 63.5 | 76.3 | 73.2 | 81.2 | 53.7 | 64.7 |
HRNet | 384x288 | 16.0 | 58.6 | 67.4 | 70.1 | 77.3 | 58.6 | 69.2 | 72.7 | 78.3 | 51.6 | 60.4 |
PVT | 384x288 | 19.7 | 58.9 | 68.9 | 67.3 | 76.1 | 66.0 | 79.4 | 74.5 | 82.2 | 54.5 | 65.4 |
FastPose50-dcn-si | 256x192 | 6.1 | 59.2 | 66.5 | 70.6 | 75.6 | 70.2 | 77.5 | 77.5 | 82.5 | 45.7 | 53.9 |
ZoomNet | 384x288 | 28.5 | 63.0 | 74.2 | 74.5 | 81.0 | 60.9 | 70.8 | 88.0 | 92.4 | 57.9 | 73.4 |
ZoomNAS | 384x288 | 18.0 | 65.4 | 74.4 | 74.0 | 80.7 | 61.7 | 71.8 | 88.9 | 93.0 | 62.5 | 74.0 |
ViTPose+-S | 256x192 | 5.4 | 54.4 | - | 71.6 | - | 72.1 | - | 55.9 | - | 45.3 | - |
ViTPose+-H | 256x192 | 122.9 | 61.2 | - | 75.9 | - | 77.9 | - | 63.3 | - | 54.7 | - |
RTMPose-m | 256x192 | 2.2 | 58.2 | 67.4 | 67.3 | 75.0 | 61.5 | 75.2 | 81.3 | 87.1 | 47.5 | 58.9 |
RTMPose-l | 256x192 | 4.5 | 61.1 | 70.0 | 69.5 | 76.9 | 65.8 | 78.5 | 83.3 | 88.7 | 51.9 | 62.8 |
GTPT-T | 256x192 | 0.8 | 54.9 | 65.6 | 67.6 | 75.9 | 64.9 | 77.4 | 75.4 | 84.1 | 38.3 | 49.9 |
GTPT-S | 256x192 | 2.0 | 59.6 | 69.9 | 71.0 | 78.7 | 70.4 | 82.2 | 81.0 | 87.6 | 45.4 | 57.0 |
GTPT-B | 256x192 | 4.0 | 61.7 | 71.4 | 72.0 | 79.5 | 73.0 | 84.0 | 84.2 | 89.6 | 47.9 | 59.3 |
Please refer to THIS to prepare the environment step by step.
Pretrained models are provided in our model zoo.
To train a model, please run

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python tools/train.py --cfg <Config PATH>
```
To test the performance of the pretrained models, please run

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python tools/test.py --cfg <Config PATH>
```
We acknowledge the excellent implementations of SimCC, TokenPose, HRNet, and HRFormer.
If you use our code or models in your research, please cite:

```bibtex
@article{wang2024gtpt,
  title={GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation},
  author={Wang, Haonan and Liu, Jie and Tang, Jie and Wu, Gangshan and Xu, Bo and Chou, Yanbing and Wang, Yong},
  journal={arXiv preprint arXiv:2407.10756},
  year={2024}
}
```