# [CVPR2023] TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers
This repository is an official implementation of TokenHPE.
## Overview
We propose a novel critical minority relationship-aware method based on the Transformer architecture, in which facial part relationships can be learned. Specifically, we design several orientation tokens to explicitly encode the basic orientation regions. Meanwhile, a novel token-guided multi-loss function guides the orientation tokens as they learn the desired regional similarities and relationships.
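To make the token layout concrete, the sketch below shows one plausible reading of the idea: a small set of learnable orientation tokens is joined to the ViT patch tokens, processed by a shallow Transformer, and read back to predict the pose. This is only an illustrative sketch; the token count (9 here), dimensions, and the prediction head are our assumptions, not the official implementation:
```python
import torch
import torch.nn as nn

class OrientationTokenSketch(nn.Module):
    """Illustrative sketch: learnable orientation tokens joined to ViT patch
    tokens. Token count, dims, and the pose head are assumptions."""
    def __init__(self, num_orient_tokens=9, dim=768, depth=3, heads=12):
        super().__init__()
        # one learnable embedding per basic orientation region (count assumed)
        self.orient_tokens = nn.Parameter(torch.zeros(1, num_orient_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pose_head = nn.Linear(num_orient_tokens * dim, 3)  # yaw, pitch, roll

    def forward(self, patch_tokens):  # patch_tokens: (B, N, dim) from a ViT backbone
        b = patch_tokens.size(0)
        tokens = torch.cat([self.orient_tokens.expand(b, -1, -1), patch_tokens], dim=1)
        out = self.encoder(tokens)
        orient_out = out[:, :self.orient_tokens.size(1)]  # read back the orientation tokens
        return self.pose_head(orient_out.flatten(1))
```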
## Preparation
### Environments
python == 3.9, torch >= 1.10.1, CUDA == 11.2
### Datasets
Follow [6DRepNet](https://github.com/thohemp/6DRepNet) to prepare the datasets:
* **300W-LP**, **AFLW2000** from [here](http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm).
* **BIWI** (Biwi Kinect Head Pose Database) from [here](https://icu.ee.ethz.ch/research/datsets.html).
Store them in the *datasets* directory.
For 300W-LP and AFLW2000, a *filename list* must be created first:
```sh
python create_filename_list.py --root_dir datasets/300W_LP
```
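The script writes a `files.txt` indexing the samples. If you want to know what it produces, here is a minimal sketch of the Hopenet/6DRepNet-style convention (one relative, extension-less path per image with a matching `.mat` annotation); the repo's `create_filename_list.py` is authoritative and may filter differently:
```python
import os
import argparse

# Sketch of the assumed convention: relative, extension-less paths for
# every image that has a matching .mat annotation next to it.
parser = argparse.ArgumentParser()
parser.add_argument('--root_dir', required=True)
args = parser.parse_args()

with open(os.path.join(args.root_dir, 'files.txt'), 'w') as f:
    for dirpath, _, filenames in os.walk(args.root_dir):
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext.lower() in ('.jpg', '.png') and \
               os.path.exists(os.path.join(dirpath, stem + '.mat')):
                rel = os.path.relpath(os.path.join(dirpath, stem), args.root_dir)
                f.write(rel + '\n')
```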
The BIWI dataset needs to be preprocessed by a face detector to cut the faces out of the images. You can use the script provided [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For a 7:3 split of the BIWI dataset you can use the equivalent script [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). The cropped image size is set to *256*.
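The essential operation of that preprocessing is to enlarge a detected face box by a margin and crop-resize it to 256x256. A generic, hedged sketch of that step is below (the margin value and box format are assumptions; the linked FSA-Net scripts handle BIWI's annotations properly and should be preferred):
```python
import cv2

def crop_face(img, box, out_size=256, margin=0.4):
    """Generic face crop sketch: enlarge the detected (x1, y1, x2, y2) box
    by a relative margin, clamp to the image, and resize. Illustrative only."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    x1 = max(int(x1 - margin * w), 0)
    y1 = max(int(y1 - margin * h), 0)
    x2 = min(int(x2 + margin * w), img.shape[1])
    y2 = min(int(y2 + margin * h), img.shape[0])
    return cv2.resize(img[y1:y2, x1:x2], (out_size, out_size))
```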
### Download weights
Download the trained weights from [gdrive](https://drive.google.com/file/d/1bqfJs4mvQd4jQELsj3utEEeS6SDzW30_/view?usp=sharing).
Optionally, you can initialize the feature extractor with the pretrained ViT-B/16 [weights](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_patch16_224_in21k-e5005f0a.pth).
### Directory structure
After preparation, the directory structure should look like this:
```
TokenHPE
├── datasets
│ ├── 300W_LP
│ │ ├── files.txt
│ │ ├── ...
│ ├── AFLW2000
│ │ ├── files.txt
│ │ ├── ...
│ ├── ...
├── weights
│ ├── TokenHPEv1-ViTB-224_224-lyr3.tar
├── figs
├── create_filename_list.py
├── datasets.py
├── README.md
├── ...
```
## Training & Evaluation
Download the trained weights from [gdrive](https://drive.google.com/file/d/1bqfJs4mvQd4jQELsj3utEEeS6SDzW30_/view?usp=sharing), then evaluate the model as follows:
```sh
python test.py --batch_size 64 \
               --dataset AFLW2000 \
--data_dir datasets/AFLW2000 \
--filename_list datasets/AFLW2000/files.txt \
--model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
--show_viz False
```
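If you need to load the `.tar` checkpoint in your own code, it is a regular PyTorch checkpoint file. A hedged sketch (the `model_state_dict` key is an assumption; inspect the dictionary keys if it differs):
```python
import torch

ckpt = torch.load('./weights/TokenHPEv1-ViTB-224_224-lyr3.tar', map_location='cpu')
# checkpoints saved with torch.save are often dicts; inspect the keys first
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
state_dict = ckpt.get('model_state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
# model.load_state_dict(state_dict)  # with the TokenHPE model built as in test.py
```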
Train the model as follows:
```sh
python train.py --batch_size 64 \
--num_epochs 60 \
--lr 0.00001 \
--dataset Pose_300W_LP \
--data_dir datasets/300W_LP \
--filename_list datasets/300W_LP/files.txt
```
## Inference & Visualization
Generate visualizations as follows:
```sh
python inference.py --model_path ./weights/TokenHPEv1-ViTB-224_224-lyr3.tar \
--image_path img_path_here
```
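For reference, head-pose visualizations of this kind are typically drawn by projecting the rotated coordinate axes onto the image at the face center. A minimal sketch of that standard Hopenet-style routine (not the repo's `inference.py`; function and argument names are ours):
```python
import cv2
import numpy as np

def draw_axes(img, yaw, pitch, roll, cx, cy, size=80):
    """Project the rotated x (red), y (green), z (blue) axes at (cx, cy).
    Angles in degrees; standard Hopenet-style visualization."""
    p = np.radians(pitch)
    y = -np.radians(yaw)  # sign convention used by Hopenet-style visualizers
    r = np.radians(roll)
    x1 = size * (np.cos(y) * np.cos(r)) + cx
    y1 = size * (np.cos(p) * np.sin(r) + np.cos(r) * np.sin(p) * np.sin(y)) + cy
    x2 = size * (-np.cos(y) * np.sin(r)) + cx
    y2 = size * (np.cos(p) * np.cos(r) - np.sin(p) * np.sin(y) * np.sin(r)) + cy
    x3 = size * np.sin(y) + cx
    y3 = size * (-np.cos(y) * np.sin(p)) + cy
    cv2.line(img, (int(cx), int(cy)), (int(x1), int(y1)), (0, 0, 255), 3)
    cv2.line(img, (int(cx), int(cy)), (int(x2), int(y2)), (0, 255, 0), 3)
    cv2.line(img, (int(cx), int(cy)), (int(x3), int(y3)), (255, 0, 0), 3)
    return img
```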
## Main results
We provide results on AFLW2000 for models trained on 300W_LP. All models were trained on a single TITAN V GPU.
| config | MAE (°) | VMAE (°) | training time | download |
|:------------------------:|:----:|:----:|:--------:|:-------------:|
| TokenHPEv1-ViT/B-224*224-lyr3 | 4.81 | 6.09 | ~24 hours | [gdrive](https://drive.google.com/file/d/1bqfJs4mvQd4jQELsj3utEEeS6SDzW30_/view?usp=sharing) |
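Here, MAE is the mean absolute error of the predicted yaw/pitch/roll in degrees, and VMAE is a vector-based variant computed on rotation-matrix vectors. A minimal sketch of the standard Euler-angle MAE (the wrap-around handling shown is our assumption; the repo's evaluation code in `test.py` is authoritative):
```python
import numpy as np

def euler_mae(pred, gt):
    """Mean absolute error over yaw/pitch/roll in degrees, taking the
    shorter way around the circle. pred, gt: (N, 3) arrays in degrees."""
    diff = np.abs(pred - gt) % 360.0
    diff = np.minimum(diff, 360.0 - diff)
    return diff.mean()
```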
## **Acknowledgement**
Many thanks to the authors of [6DRepNet](https://github.com/thohemp/6DRepNet). We reuse their code for data preprocessing and evaluation, which saved a great deal of redundant work.
## **Citation**
If you find our work useful, please cite the paper:
```
@InProceedings{Zhang_2023_CVPR,
author = {Zhang, Cheng and Liu, Hai and Deng, Yongjian and Xie, Bochen and Li, Youfu},
title = {TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {8897-8906}
}
```