liaorongfan / DeepPersonality

Benchmark for personality trait prediction with neural networks
MIT License

An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition


This is the official code repository for the paper An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition.

In this project, seven visual models, six audio models and five audio-visual models have been reproduced and evaluated. In addition, seven widely used visual deep learning models that had not previously been applied to video-based personality computing are included in the benchmark. A detailed description can be found in our paper.

All benchmarked models are evaluated on two datasets: the ChaLearn First Impression dataset and the ChaLearn UDIVA self-reported personality dataset.

This project is under active development. Documentation, examples, and tutorials will be expanded progressively.


Setup project: you can use Conda, virtualenv, or pipenv to create a virtual environment in which to run this program.

# create and activate a virtual environment
virtualenv -p python38 venv
source venv/bin/activate

Installing from PyPI

pip install deep_personality

Installing from GitHub

# clone current repo
git clone DeepPersonality
cd DeepPersonality

# install required packages and dependencies
pip install -r requirements.txt


The datasets used for the benchmark are ChaLearn First Impression and UDIVA.

To meet the requirements of different models and experiments, we extract the raw audio track and all frames from each video, and then crop the face from each full frame. We refer to the cropped images as face frames.
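As a rough illustration of the audio and frame extraction step, the sketch below only builds the command lines that could be passed to ffmpeg. Using ffmpeg at all is an assumption, as are the function name `ffmpeg_commands`, the output layout, and the 16 kHz mono audio setting; the repository's own processing scripts may work differently. Face cropping needs a face detector and is not covered here.

```python
from pathlib import Path


def ffmpeg_commands(video_path, out_dir, sample_rate=16000):
    """Build (but do not run) ffmpeg commands for one video.

    Hypothetical helper: the output layout and the audio settings are
    assumptions made for illustration, not the repository's actual pipeline.
    """
    video = Path(video_path)
    out = Path(out_dir) / video.stem
    # -vn drops the video stream; pcm_s16le writes uncompressed 16-bit WAV audio.
    audio_cmd = [
        "ffmpeg", "-i", str(video), "-vn",
        "-acodec", "pcm_s16le", "-ar", str(sample_rate), "-ac", "1",
        str(out / "audio.wav"),
    ]
    # The %04d pattern makes ffmpeg write one numbered image per frame.
    frames_cmd = ["ffmpeg", "-i", str(video), str(out / "frames" / "frame_%04d.jpg")]
    return audio_cmd, frames_cmd


audio_cmd, frames_cmd = ffmpeg_commands("videos/sample_clip.mp4", "processed")
print(" ".join(audio_cmd))
print(" ".join(frames_cmd))
```

The output directories would have to exist before the commands are executed, for example via `subprocess.run(audio_cmd, check=True)` after creating them.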

For a quick start and demonstration, we provide a tiny ChaLearn 2016 dataset containing 100 videos: 60 for training, 20 for validation and 20 for testing. The processing steps are described in dataset preparation.
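The 60/20/20 partition can be sketched as follows. This is only a minimal illustration of such a split: the function name `split_videos`, the file names and the seeded shuffle are assumptions, and the tiny dataset ships with its own fixed partition.

```python
import random


def split_videos(video_ids, n_train=60, n_valid=20, n_test=20, seed=0):
    """Shuffle video ids deterministically and cut them into three disjoint subsets."""
    ids = sorted(video_ids)
    if n_train + n_valid + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of videos")
    random.Random(seed).shuffle(ids)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


# Hypothetical file names; the real dataset uses its own naming scheme.
videos = [f"video_{i:03d}.mp4" for i in range(100)]
splits = split_videos(videos)
print({name: len(items) for name, items in splits.items()})
# prints {'train': 60, 'valid': 20, 'test': 20}
```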

For your convenience, we also provide the processed face image frames for ChaLearn 2016. Because that dataset is publicly available, we are able to release our processed data to the community.


Reproducing reported experiments

Experiments are built from config files. After setting up the environment and preparing the required data, you can launch an experiment with the following command:

# cd DeepPersonality # top directory 
script/ --config path/to/exp_config.yaml 

For a quick start with the tiny ChaLearn 2016 dataset, once the data has been prepared as described in the section above, the following command launches an experiment with the bimodal-resnet18 model.

# cd DeepPersonality # top directory
script/ --config config/demo/bimodal_resnet18.yaml

Detailed descriptions of the arguments are given in the command line interface file.
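For readers unfamiliar with this style of entry point, a `--config` option of the kind used in the commands above could be parsed roughly as shown below. This is a generic `argparse` sketch, not the repository's actual command line interface; the function name `build_parser` and the help texts are assumptions, and only the `--config` flag and the demo config path come from this README.

```python
import argparse
from pathlib import Path


def build_parser():
    """Create a parser with a single required --config option (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Launch an experiment described by a config file"
    )
    parser.add_argument(
        "--config",
        type=Path,
        required=True,
        help="path to the experiment YAML config",
    )
    return parser


# Passing the argument list explicitly keeps the example runnable without a shell.
args = build_parser().parse_args(["--config", "config/demo/bimodal_resnet18.yaml"])
print(args.config.name)
# prints bimodal_resnet18.yaml
```

Reading the YAML file itself would need a YAML parser such as PyYAML, which is left out here so that the sketch depends only on the standard library.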

For a quick-start demonstration, see the Colab Notebook: QuickStart

For experiments that start from raw video processing, see this Colab Notebook: StartFromDataProcessing

Developing new personality computing models

We use config pipeline files and a registration mechanism to organize our experiments. Users who want to add their own models or algorithms to this program can refer to the Colab Notebook TrainYourModel.
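A registration mechanism of this general kind can be sketched in a few lines of plain Python. Everything named below (`MODEL_REGISTRY`, `register_model`, `build_model`, `ToyVisualModel` and the config keys) is hypothetical and only illustrates the pattern; the repository's actual registry and config schema are shown in the TrainYourModel notebook.

```python
MODEL_REGISTRY = {}


def register_model(name):
    """Decorator that stores a model class in the registry under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise KeyError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("toy_visual_model")
class ToyVisualModel:
    """Placeholder standing in for a real network; it holds no layers."""

    def __init__(self, num_traits=5, hidden_dim=128):
        self.num_traits = num_traits
        self.hidden_dim = hidden_dim


def build_model(cfg):
    """Look up the class named in the config and instantiate it with the other entries."""
    cfg = dict(cfg)  # copy so the caller's config is not mutated
    name = cfg.pop("name")
    try:
        cls = MODEL_REGISTRY[name]
    except KeyError:
        raise KeyError(
            f"unknown model '{name}'; registered: {sorted(MODEL_REGISTRY)}"
        ) from None
    return cls(**cfg)


model = build_model({"name": "toy_visual_model", "hidden_dim": 64})
print(type(model).__name__, model.num_traits, model.hidden_dim)
# prints ToyVisualModel 5 64
```

With this pattern, adding a new model only requires decorating its class and naming it in a config entry; the code that builds experiments does not need to change.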


On ChaLearn 2016 dataset

| Model | Modality | ChaLearn2016 cfgs | ChaLearn2016 weights |
|---|---|---|---|
| DAN | visual | cfg | weight |
| CAM-DAN+ | visual | cfg | weight |
| ResNet | visual | cfg | weight |
| HRNet | visual | cfg-frame/cfg-face | weight-frame/weight-face |
| SENet | visual | cfg-frame/cfg-face | [weight]() |
| 3D-ResNet | visual | cfg-frame/cfg-face | [weight]() |
| Slow-Fast | visual | cfg-frame/cfg-face | [weight]() |
| TPN | visual | cfg-frame/cfg-face | [weight]() |
| Swin-Transformer | visual | cfg-frame/cfg-face | [weight]() |
| VAT | visual | cfg-frame/cfg-face | [weight]() |
| Interpret Audio CNN | audio | cfg | [weight]() |
| Bi-modal CNN-LSTM | audiovisual | cfg | [weight]() |
| Bi-modal ResNet | audiovisual | cfg | [weight]() |
| PersEmoN | audiovisual | cfg | [weight]() |
| CRNet | audiovisual | cfg | weight |
| Amb-Fac | audiovisual | cfg-frame/cfg-face | [weight]() |

On ChaLearn 2021 dataset

| Model | Modality | Talk | Lego | Ghost | Animal |
|---|---|---|---|---|---|
| DAN | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| CAM-DAN+ | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| ResNet | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| HRNet | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| SENet | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| 3D-ResNet | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Slow-Fast | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| TPN | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Swin-Transformer | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| VAT | visual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Interpret Audio CNN | audio | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Bi-modal CNN-LSTM | audiovisual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Bi-modal ResNet | audiovisual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| PersEmoN | audiovisual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| CRNet | audiovisual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |
| Amb-Fac | audiovisual | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() | [cfg]() [weight]() |


From which the models are reproduced


If you use our code for a publication, please kindly cite it as:

  title={An open-source benchmark of deep learning models for audio-visual apparent and self-reported personality recognition},
  author={Liao, Rongfan and Song, Siyang and Gunes, Hatice},
  journal={IEEE Transactions on Affective Computing},


To Be Updated