luca-m / emotime

Recognizing emotional states in faces
138 stars 65 forks source link

Emotime

Recognizing emotional states in faces


Authors: Luca Mella, Daniele Bellavista

Contributors: Rohit Krishnan

Development Status: Experimental

Copyleft: CC-BY-NC 2013


Goal

This project aims to recognize main facial expressions (neutral, anger, disgust, fear, joy, sadness, surprise) in image sequences using the approaches described in:

References

Here is listed some interesting material about machine learning, opencv, gabor transforms and other stuff that could be useful to get in this topic:

Project Structure

src
  \-->dataset        Scripts for dataset management
  \-->facecrop       Utilities and modules for face cropping and registration
  \-->gaborbank      Utilities and modules for generating gabor filters and image filtering
  \-->adaboost       Utilities and modules for adaboost train, prediction, and feature selection
  \-->svm          Utilities and modules for svm training and prediction
  \-->detector     Multiclass detector and preprocessor
  \-->utils        String and IO utilities, CSV supports, and so on..
doc                Documentation (doxigen)
report             Class project report (latex)
resources          Containing third party resources (eg. OpenCV haar classifiers)
assets             Binary folder (I know, I know, it is not beautiful)
test               Some testing scripts here

Build

Dependencies:

Compiling on linux:

Cross-compiling for windows:

Detection and Prediction

Proof of concept model trained using faces extracted using the detector cbcl1 are available for download, mulclass strategy 1 vs all and many vs many.

NOTE: Trained models for latest version of the code are available in the v1.2 release page (deprecated). Other trained model working better with master branch are available here

NOTE: watch for illumination! At the moment optimal results can be obtained in live webcam sessions using direct illumination directed to the user's face. Don't worry you are not required to blind you with a headlight.

_If you'd like to try emotime without any further complication you should take a look to the x86_64 release. ( obsolete ) _

Usage

Video gui:

echo "VIDEOPATH" | ./emotimevideo_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+

Cam gui:

./emotimegui_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+

Or using the python script:

python gui.py --cfg <dataset_configuration_path> --mode svm --eye-correction <dataset_path>

Binary Release and Trained Models

If you just want to take a quick look to the project we strongly suggest to go to the release section and download compiled binaries for Linux 64bit, then:

Training

After mkdir build; cd build; cmake ..; make ; make install go to the assets folder and:

  1. Initialize a dataset using:

    python datasetInit.py -cfg <CONFIGFILE> <EMPTY_DATASET_FOLDER>
  2. Then fill it with your images or use the Cohn-Kanade importing script:

    python datasetFillCK --cfg <CONFIGFILE> <DATASETFOLDER> <CKFOLDER> <CKEMOTIONFOLDER>
  3. Now you are ready to train models:

    python train_models.py --cfg <CONFIGFILE> --mode svm --prep-train-mode [1vsall|1vsallext] <DATASETFOLDER>

Dataset

The Cohn-Kanade database is one of the most used faces database. Its extended version (CK+) contains also FACS code labels (aka Action Units) and emotion labels (neutral, anger, contempt, disgust, fear, happy, sadness, surprise).

Validation (old, check v1.2 release page)

First, rough evaluation of the performance of the system Validation test involved the whole system face detector + emotion classifier, so should not be considered relative to the emotion classifier itself.

Of course, a more fine validation shuld be tackled in order to evaluate emotion classifier alone. For the sake of completeness the reader have to know that the cbcl1 face model is a good face locator rather than detector, roughly speaking it detects less but is more precise.

Following results are commented with my personal - totally informal - evaluation after live webcam session.

multicalss method: 1vsAllExt 
face detector:     cbcl1
eye correction:    no 
width:             48
height:            48 
nwidths:           3 
nlambdas:          5
nthetas:           4

Sadness                   <-- Not good in live webcam sessions too
  sadness -> 0.67%
  surprise -> 0.17%
  anger -> 0.17%
Neutral                   <-- Good in live webcam sessions
  neutral -> 0.90%
  contempt -> 0.03%
  anger -> 0.03%
  fear -> 0.02%
  surprise -> 0.01%
Disgust                   <-- Good in live webcam sessions
  disgust -> 1.00%
Anger                     <-- Good in live webcam sessions
  anger -> 0.45%
  neutral -> 0.36%
  disgust -> 0.09%
  contempt -> 0.09%
Surprise                  <-- Good in live webcam sessions
  surprise -> 0.94%
  neutral -> 0.06%
Fear                      <-- Almost Good in live webcam sessions
  fear -> 0.67%
  surprise -> 0.17%
  happy -> 0.17%
Contempt                  <-- Not good in live webcam sessions
  neutral -> 0.50%
  contempt -> 0.25%
  anger -> 0.25%
Happy                     <-- Good in live webcam sessions
  happy -> 1.00%
multicalss method: 1vsAll 
face detector:     cbcl1
eye correction:    no 
width:             48
height:            48 
nwidths:           3 
nlambdas:          5
nthetas:           4

Sadness                   <-- Not good in live webcam sessions too
  unknown -> 0.50%
  sadness -> 0.33%
  fear -> 0.17%
Neutral                   <-- Good in live webcam sessions 
  neutral -> 0.73%
  unknown -> 0.24%
  surprise -> 0.01%
  fear -> 0.01%
  contempt -> 0.01%
Disgust                   <-- Good in live webcam sessions
  disgust -> 0.82%
  unknown -> 0.18%
Anger                     <-- Almost sufficient in live webcam sessions
  anger -> 0.36%
  neutral -> 0.27%
  unknown -> 0.18%
  disgust -> 0.09%
  contempt -> 0.09%
Surprise                  <-- Good in live webcam sessions
  surprise -> 0.94%
  neutral -> 0.06%
Fear                      <-- Sufficient in live webcam sessions
  fear -> 0.67%
  surprise -> 0.17%
  happy -> 0.17%
Contempt                  <-- Not good in live webcam sessions too
  unknown -> 1.00%
Happy                     <-- Good in live webcam sessions 
  happy -> 1.00%

Also main difference between the 1vsAll and the 1vsAllExt mode experimented in livecam sessions are related to the amount of unknown states registered and the stability of the detected states. In detail 1vsAll multiclass method provide more less noisy detections during a live web-cam session, 1vsAllExt mode instead is able to always predict a valid state for each frame processed, but sometimes it result to be more unstable during the expression transition.

Sorry for the lack of fine tuning and detail, but it is a spare time project at the moment.. If you have any idea or suggestion feel free to write us!

Further Development