PyTorch implementation of FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation from a Single Image [2].
A video file or a camera index can be provided to the demo script. If no argument is provided, the default camera index is used.
Any video format that OpenCV supports (mp4, avi, etc.) will work:

```shell
python3 demo.py --video /path/to/video.mp4
python3 demo.py --cam 0
```
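The source-selection logic behind those two flags can be sketched as follows. This is a minimal illustration, not the actual code from `demo.py`: the flag names match the commands above, but the function name `resolve_source` is hypothetical.

```python
import argparse

def resolve_source(argv):
    """Map command-line flags to the argument for cv2.VideoCapture:
    a file path for --video, or an integer camera index for --cam.
    Falls back to camera 0 when neither flag is given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--video", type=str, default=None)
    parser.add_argument("--cam", type=int, default=None)
    args = parser.parse_args(argv)
    if args.video is not None:
        return args.video          # e.g. "/path/to/video.mp4"
    return args.cam if args.cam is not None else 0
```

`cv2.VideoCapture` accepts either form, which is why a single demo script can serve both use cases.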
| Model | Dataset Type | Yaw (MAE) | Pitch (MAE) | Roll (MAE) |
|---|---|---|---|---|
| FSA-Caps (1x1) | 1 | 4.85 | 6.27 | 4.96 |
| FSA-Caps (Var) | 1 | 5.06 | 6.46 | 5.00 |
| FSA-Caps (1x1 + Var) | 1 | 4.64 | 6.10 | 4.79 |
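The "1x1 + Var" row combines the two scoring variants. One simple way such a fusion can be realized, which is only a sketch and not necessarily this repository's exact method, is to average the per-angle predictions of the two models, assuming each returns a `[yaw, pitch, roll]` vector in degrees:

```python
import numpy as np

def fuse_predictions(pred_1x1, pred_var):
    """Hypothetical fusion of two head-pose estimates:
    element-wise mean of [yaw, pitch, roll] predictions."""
    return (np.asarray(pred_1x1, dtype=float)
            + np.asarray(pred_var, dtype=float)) / 2.0
```

Averaging complementary models often reduces MAE because their errors are partly uncorrelated, which is consistent with the fused row scoring best in the table above.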
Note: My results are slightly worse than the original author's. For best results, please refer to the official repository [1].
| Name | Version |
|---|---|
| python | 3.7.6 |
| numpy | 1.18.5 |
| opencv | 4.2.0 |
| scipy | 1.5.0 |
| matplotlib-base | 3.2.2 |
| pytorch | 1.5.1 |
| torchvision | 0.6.1 |
| onnx | 1.7.0 |
| onnxruntime | 1.2.0 |
Installation with pip
```shell
pip3 install -r requirements.txt
```
You may also need to install jupyter to access the notebooks (.ipynb). It is recommended that you use Anaconda to install packages.
The code has been tested on Ubuntu 18.04.
For model training and testing, download the preprocessed dataset from the author's official git repository [1] and place the files inside the data/ directory. I am only using type1 data for training and testing. Your dataset hierarchy should look like:
```
data/
    type1/
        test/
            AFLW2000.npz
        train/
            AFW.npz
            AFW_Flip.npz
            HELEN.npz
            HELEN_Flip.npz
            IBUG.npz
            IBUG_Flip.npz
            LFPW.npz
            LFPW_Flip.npz
```
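Each of those files is a NumPy `.npz` archive. The array names inside depend on the author's preprocessing script, so rather than assume them, a small helper like the one below (hypothetical, not part of this repo) can be used to inspect what a file actually contains before wiring it into a data loader:

```python
import numpy as np

def inspect_npz(path):
    """Return a {array_name: shape} mapping for a .npz archive,
    e.g. for data/type1/test/AFLW2000.npz."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```

Printing the result of `inspect_npz("data/type1/test/AFLW2000.npz")` shows the image and pose array names and shapes to use when loading the data.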
Copyright (c) 2020, Omar Hassan. (MIT License)
Special thanks to Mr. Tsun-Yi Yang for providing the excellent code accompanying his paper. Please refer to the official repository for detailed information and the best results for the model:
[1] T. Yang, FSA-Net, (2019), GitHub repository
The models are trained and tested with various public datasets, each of which has its own license. Please refer to them before using the code.
[2] T. Yang, Y. Chen, Y. Lin and Y. Chuang, "FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 1087-1096, doi: 10.1109/CVPR.2019.00118. IEEE-Xplore link
[3] Tal Hassner, Shai Harel, Eran Paz, and Roee Enbar. Effective face frontalization in unconstrained images. In CVPR, 2015
[4] Xiangyu Zhu, Zhen Lei, Junjie Yan, Dong Yi, and Stan Z. Li. High-fidelity pose and expression normalization for face recognition in the wild. In CVPR, 2015.