PyTorch implementation of FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image [2].
A video file or a camera index can be provided to the demo script. If no argument is provided, the default camera index is used.
Any video format that OpenCV supports (mp4, avi, etc.) will work:

```shell
python3 demo.py --video /path/to/video.mp4
python3 demo.py --cam 0
```
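The source-selection logic behind those two flags can be sketched as follows. This is a minimal illustration, not the actual code from `demo.py`: the flag names match the commands above, but the function name `resolve_source` is hypothetical.

```python
import argparse

def resolve_source(argv):
    """Map command-line flags to the argument for cv2.VideoCapture:
    a file path for --video, or an integer camera index for --cam.
    Falls back to camera 0 when neither flag is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--video", type=str, default=None)
    parser.add_argument("--cam", type=int, default=None)
    args = parser.parse_args(argv)
    if args.video is not None:
        return args.video          # e.g. "/path/to/video.mp4"
    return args.cam if args.cam is not None else 0
```

`cv2.VideoCapture` accepts either form, which is why a single demo script can serve both use cases.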
| Model | Dataset Type | Yaw (MAE) | Pitch (MAE) | Roll (MAE) |
|---|---|---|---|---|
| FSA-Caps (1x1) | 1 | 4.85 | 6.27 | 4.96 |
| FSA-Caps (Var) | 1 | 5.06 | 6.46 | 5.00 |
| FSA-Caps (1x1 + Var) | 1 | 4.64 | 6.10 | 4.79 |
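The "1x1 + Var" row combines the two scoring variants. One simple way such a fusion can be realized, which is only a sketch and not necessarily this repository's exact method, is to average the per-angle predictions of the two models, assuming each returns a `[yaw, pitch, roll]` vector in degrees:

```python
import numpy as np

def fuse_predictions(pred_1x1, pred_var):
    """Hypothetical fusion of two head-pose estimates:
    element-wise mean of [yaw, pitch, roll] predictions."""
    return (np.asarray(pred_1x1, dtype=float)
            + np.asarray(pred_var, dtype=float)) / 2.0
```

Averaging complementary models often reduces MAE because their errors are partly uncorrelated, which is consistent with the fused row scoring best in the table above.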
Note: My results are slightly worse than the original author's. For best results, please refer to the official repository [1].
| Name | Version |
|---|---|
| python | 3.7.6 |
| numpy | 1.18.5 |
| opencv | 4.2.0 |
| scipy | 1.5.0 |
| matplotlib-base | 3.2.2 |
| pytorch | 1.5.1 |
| torchvision | 0.6.1 |
| onnx | 1.7.0 |
| onnxruntime | 1.2.0 |
Installation with pip
```shell
pip3 install -r requirements.txt
```
You may also need to install jupyter to access the notebooks (.ipynb). It is recommended that you use Anaconda to install packages.
The code has been tested on Ubuntu 18.04.
For model training and testing, download the preprocessed dataset from the author's official git repository [1] and place the files inside the data/ directory. I am only using type1 data for training and testing. Your dataset hierarchy should look like:
```
data/
    type1/
        test/
            AFLW2000.npz
        train/
            AFW.npz
            AFW_Flip.npz
            HELEN.npz
            HELEN_Flip.npz
            IBUG.npz
            IBUG_Flip.npz
            LFPW.npz
            LFPW_Flip.npz
```
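Each of those files is a NumPy `.npz` archive. The array names inside depend on the author's preprocessing script, so rather than assume them, a small helper like the one below (hypothetical, not part of this repo) can be used to inspect what a file actually contains before wiring it into a data loader:

```python
import numpy as np

def inspect_npz(path):
    """Return a {array_name: shape} mapping for a .npz archive,
    e.g. for data/type1/test/AFLW2000.npz."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```

Printing the result of `inspect_npz("data/type1/test/AFLW2000.npz")` shows the image and pose array names and shapes to use when loading the data.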
Copyright (c) 2020, Omar Hassan. (MIT License)
Special thanks to Mr. Tsun-Yi Yang for providing the excellent code accompanying his paper. Please refer to the official repository for detailed information and the best results for the model:
[1] T. Yang, FSA-Net, (2019), GitHub repository
The models are trained and tested with various public datasets, each of which has its own license. Please refer to them before using the code.
[2] T. Yang, Y. Chen, Y. Lin and Y. Chuang, "FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 1087-1096, doi: 10.1109/CVPR.2019.00118. IEEE-Xplore link
[3] Tal Hassner, Shai Harel, Eran Paz, and Roee Enbar. Effective face frontalization in unconstrained images. In CVPR, 2015
[4] Xiangyu Zhu, Zhen Lei, Junjie Yan, Dong Yi, and Stan Z. Li. High-fidelity pose and expression normalization for face recognition in the wild. In CVPR, 2015.