Recognizing emotional states in faces
Authors: Luca Mella, Daniele Bellavista
Contributors: Rohit Krishnan
Development Status: Experimental
Copyleft: CC-BY-NC 2013
This project aims to recognize the main facial expressions (neutral, anger, disgust, fear, joy, sadness, surprise) in image sequences using the approaches described in:
Listed here is some interesting material about machine learning, OpenCV, Gabor transforms and other topics that could be useful for getting into this field:
src
\-->dataset Scripts for dataset management
\-->facecrop Utilities and modules for face cropping and registration
\-->gaborbank Utilities and modules for generating Gabor filter banks and filtering images (see the sketch after this listing)
\-->adaboost Utilities and modules for AdaBoost training, prediction, and feature selection
\-->svm Utilities and modules for SVM training and prediction
\-->detector Multiclass detector and preprocessor
\-->utils String and IO utilities, CSV support, and so on
doc Documentation (doxygen)
report Class project report (latex)
resources Third-party resources (e.g. OpenCV Haar classifiers)
assets Binary folder (I know, I know, it is not beautiful)
test Some testing scripts here
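To give an idea of what the gaborbank stage computes (and of the nwidths, nlambdas and nthetas parameters that appear throughout this README), here is a minimal Python/OpenCV sketch of a Gabor filter bank; the kernel sizes and wavelength ranges are illustrative assumptions, not the values used by the project.

# Minimal sketch of a Gabor filter bank in the spirit of the gaborbank module.
# The kernel sizes and wavelengths below are illustrative assumptions, not the
# values hard-coded in the project.
import cv2
import numpy as np

def build_gabor_bank(nwidths=3, nlambdas=5, nthetas=4):
    # One kernel per combination of size, wavelength and orientation.
    kernels = []
    for w in range(nwidths):
        ksize = 7 + 6 * w                    # growing odd kernel sizes (assumption)
        for l in range(nlambdas):
            lambd = 4.0 + 2.0 * l            # wavelength of the sinusoid (assumption)
            for t in range(nthetas):
                theta = np.pi * t / nthetas  # orientation
                kernels.append(cv2.getGaborKernel((ksize, ksize), sigma=ksize / 3.0,
                                                  theta=theta, lambd=lambd,
                                                  gamma=0.5, psi=0.0))
    return kernels

def gabor_features(gray_face, kernels):
    # Stack the filter responses into a single feature image.
    return np.vstack([cv2.filter2D(gray_face, cv2.CV_32F, k) for k in kernels])

face = cv2.imread("cropped_face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical registered face
if face is not None:
    features = gabor_features(cv2.resize(face, (48, 48)), build_gabor_bank())
    print(features.shape)  # (48 * nwidths * nlambdas * nthetas, 48)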
Dependencies:
CMake >= 2.8
Python >= 2.7, < 3.0
OpenCV >= 2.4.5
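A quick way to verify the Python and OpenCV versions on your system (a convenience snippet, not part of the project):

# Convenience check of the Python and OpenCV versions (not part of the project).
import sys
import cv2
print("Python:", sys.version.split()[0])
print("OpenCV:", cv2.__version__)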
Compiling on Linux:
mkdir build
cd build
cmake .. ; make ; make install
- now the assets folder should be populated
Cross-compiling for Windows:
Set the CMake variable OpenCV_DIR to the appropriate path, so that the OpenCV *.lib files and the opencv2/opencv headers (or similar) can be found.
Proof-of-concept models, trained using faces extracted with the cbcl1 detector, are available for download, for both the 1 vs all and the many vs many multiclass strategies.
NOTE: Trained models for the latest version of the code are available on the v1.2 release page (deprecated). Other trained models that work better with the master branch are available here.
NOTE: watch out for illumination! At the moment, optimal results are obtained in live webcam sessions with direct illumination pointed at the user's face. Don't worry, you are not required to blind yourself with a headlight.
_If you'd like to try emotime without any further complications, you should take a look at the x86_64 release (obsolete)._
Video GUI:
echo "VIDEOPATH" | ./emotimevideo_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+
Webcam GUI:
./emotimegui_cli FACEDETECTORXML (EYEDETECTORXML|none) WIDTH HEIGHT NWIDTHS NLAMBDAS NTHETAS (svm|ada) (TRAINEDCLASSIFIERSXML)+
Or using the Python script:
python gui.py --cfg <dataset_configuration_path> --mode svm --eye-correction <dataset_path>
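As an illustration of what happens on every frame before classification (face detection plus cropping and resizing, roughly the job of the facecrop stage), here is a stripped-down Python/OpenCV sketch; the cascade path and the 48x48 size mirror the examples in this README, while the rest is an assumption rather than the actual emotime code path.

# Stripped-down sketch of the per-frame preprocessing done before emotion
# classification: detect the face, crop it and resize it to the training size.
# This is an illustration, not the actual emotime code path.
import cv2

cascade = cv2.CascadeClassifier("../resources/haarcascade_frontalface_cbcl1.xml")
if cascade.empty():
    raise SystemExit("could not load the face detector cascade")
cap = cv2.VideoCapture(0)                       # default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        # ...here the Gabor bank + trained classifiers would produce an emotion label...
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("emotime sketch", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):       # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()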
If you just want to take a quick look at the project, we strongly suggest going to the release section, downloading the compiled binaries for Linux 64-bit, and then running:
./download_trained_models.sh
./emotimegui_cli ../resources/haarcascade_frontalface_cbcl1.xml none 48 48 3 5 4 svm ../dataset_svm_354_cbcl1_1vsallext/classifiers/svm/*
After mkdir build; cd build; cmake ..; make; make install, go to the assets folder and:
Initialize a dataset using:
python datasetInit.py -cfg <CONFIGFILE> <EMPTY_DATASET_FOLDER>
Then fill it with your images or use the Cohn-Kanade importing script:
python datasetFillCK --cfg <CONFIGFILE> <DATASETFOLDER> <CKFOLDER> <CKEMOTIONFOLDER>
Now you are ready to train models:
python train_models.py --cfg <CONFIGFILE> --mode svm --prep-train-mode [1vsall|1vsallext] <DATASETFOLDER>
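The --prep-train-mode flag selects how the multiclass problem is decomposed into binary classifiers. The following sketch shows the plain 1vsall idea (one binary SVM per emotion, prediction by the best-scoring class) using the cv2.ml API of OpenCV 3+ rather than the 2.4 C++ API used by the project; function names and the scoring convention are illustrative assumptions, not the project's API.

# Illustrative sketch of the 1vsall decomposition: one binary SVM per emotion,
# trained on flattened Gabor-bank features. Not the project's training code,
# only the underlying idea; names and the scoring convention are assumptions.
import cv2
import numpy as np

EMOTIONS = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"]

def train_one_vs_all(features, labels):
    # features: (n_samples, n_dims) float32 array; labels: list of emotion names.
    classifiers = {}
    for emo in EMOTIONS:
        y = np.array([1 if l == emo else -1 for l in labels], dtype=np.int32)
        svm = cv2.ml.SVM_create()
        svm.setType(cv2.ml.SVM_C_SVC)
        svm.setKernel(cv2.ml.SVM_LINEAR)
        svm.train(np.float32(features), cv2.ml.ROW_SAMPLE, y)
        classifiers[emo] = svm
    return classifiers

def predict_emotion(classifiers, sample):
    # Pick the emotion whose binary SVM is most confident. RAW_OUTPUT returns the
    # signed distance to the hyperplane; the sign convention may need flipping
    # depending on the OpenCV version (assumption).
    scores = {}
    for emo, svm in classifiers.items():
        _, raw = svm.predict(np.float32(sample).reshape(1, -1),
                             flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
        scores[emo] = -float(raw[0][0])
    return max(scores, key=scores.get)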
The Cohn-Kanade database is one of the most widely used face databases. Its extended version (CK+) also contains FACS code labels (aka Action Units) and emotion labels (neutral, anger, contempt, disgust, fear, happy, sadness, surprise).
A first, rough evaluation of the system's performance follows. The validation test involved the whole system (face detector + emotion classifier), so the results should not be read as an evaluation of the emotion classifier alone. Of course, a finer validation should be carried out in order to evaluate the emotion classifier by itself. For the sake of completeness, the reader should know that the cbcl1 face model is more a face locator than a detector: roughly speaking, it detects fewer faces but is more precise.
The following results are commented with my personal, totally informal, evaluation after live webcam sessions.
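Each per-class breakdown below reads as the fraction of test samples of a given true emotion that received each predicted label. As an illustration, such a breakdown can be computed from (true, predicted) pairs as sketched here; this is not the script actually used.

# Sketch: turn a list of (true_label, predicted_label) pairs into the kind of
# per-class breakdown reported below. Illustrative only, not the original script.
from collections import Counter, defaultdict

def per_class_breakdown(pairs):
    # pairs: iterable of (true_label, predicted_label) strings.
    by_true = defaultdict(Counter)
    for true, pred in pairs:
        by_true[true][pred] += 1
    for true, counter in by_true.items():
        total = sum(counter.values())
        print(true)
        for pred, count in counter.most_common():
            print("  %s -> %.0f%%" % (pred, 100.0 * count / total))

# Example with made-up pairs:
per_class_breakdown([("happy", "happy"), ("happy", "happy"),
                     ("sadness", "sadness"), ("sadness", "surprise")])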
multiclass method: 1vsAllExt
face detector: cbcl1
eye correction: no
width: 48
height: 48
nwidths: 3
nlambdas: 5
nthetas: 4
Sadness <-- Not good in live webcam sessions either
sadness -> 67%
surprise -> 17%
anger -> 17%
Neutral <-- Good in live webcam sessions
neutral -> 90%
contempt -> 3%
anger -> 3%
fear -> 2%
surprise -> 1%
Disgust <-- Good in live webcam sessions
disgust -> 100%
Anger <-- Good in live webcam sessions
anger -> 45%
neutral -> 36%
disgust -> 9%
contempt -> 9%
Surprise <-- Good in live webcam sessions
surprise -> 94%
neutral -> 6%
Fear <-- Almost good in live webcam sessions
fear -> 67%
surprise -> 17%
happy -> 17%
Contempt <-- Not good in live webcam sessions
neutral -> 50%
contempt -> 25%
anger -> 25%
Happy <-- Good in live webcam sessions
happy -> 100%
multiclass method: 1vsAll
face detector: cbcl1
eye correction: no
width: 48
height: 48
nwidths: 3
nlambdas: 5
nthetas: 4
Sadness <-- Not good in live webcam sessions either
unknown -> 50%
sadness -> 33%
fear -> 17%
Neutral <-- Good in live webcam sessions
neutral -> 73%
unknown -> 24%
surprise -> 1%
fear -> 1%
contempt -> 1%
Disgust <-- Good in live webcam sessions
disgust -> 82%
unknown -> 18%
Anger <-- Almost sufficient in live webcam sessions
anger -> 36%
neutral -> 27%
unknown -> 18%
disgust -> 9%
contempt -> 9%
Surprise <-- Good in live webcam sessions
surprise -> 94%
neutral -> 6%
Fear <-- Sufficient in live webcam sessions
fear -> 67%
surprise -> 17%
happy -> 17%
Contempt <-- Not good in live webcam sessions either
unknown -> 100%
Happy <-- Good in live webcam sessions
happy -> 100%
The main differences between the 1vsAll and 1vsAllExt modes observed in live webcam sessions concern the number of unknown states registered and the stability of the detected states. In detail, the 1vsAll multiclass method provides less noisy detections during a live webcam session, while the 1vsAllExt mode always predicts a valid state for each processed frame, but it can be more unstable during expression transitions.
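To make the observed difference concrete, the two behaviours can be summarised as two decision rules over the per-emotion scores; this is an interpretation of the description above, not the project's implementation.

# Tiny sketch of the two decision rules discussed above, given per-emotion scores
# (e.g. SVM margins). Names and the threshold are illustrative assumptions.
def decide_1vsall(scores, threshold=0.0):
    # Return 'unknown' unless at least one binary classifier fires.
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "unknown"

def decide_1vsallext(scores):
    # Always return the best-scoring class, never 'unknown'.
    return max(scores, key=scores.get)

scores = {"neutral": -0.2, "happy": -0.1, "surprise": -0.4}
print(decide_1vsall(scores))     # 'unknown': nothing fired
print(decide_1vsallext(scores))  # 'happy': best score anyway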
Sorry for the lack of fine tuning and detail, but this is a spare-time project at the moment. If you have any ideas or suggestions, feel free to write to us!