Closed — shoeffner closed this issue 6 years ago
1. Faster: Measuring only the detection of the frontal face and eyes from the webcam stream, with no further processing:
2. Easier to set up: No additional dependency (OpenCV is used for the webcam interface anyway).
1. More robust: This is a highly subjective metric, but dlib produced fewer jumping artifacts (basically none during the tests). Also see http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html (especially https://youtu.be/LsK0hzcEyHI).
2. Additional interesting keypoints: While it is slightly more work to extract the bounding boxes for the eyes, the detected corner points might come in handy later as reference points when calculating the gaze point.
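The corner-point idea can be sketched as follows. The indices assume dlib's 68 landmarks model, which follows the iBUG 300-W annotation scheme (points 36/39 and 42/45 are the eye corners); the `landmarks` argument is a plain list of (x, y) tuples rather than a dlib shape object:

```python
# Sketch: extract eye-corner points from a 68-landmark face shape.
# Indices follow the iBUG 300-W scheme used by dlib's 68 landmarks model.
EYE_CORNERS = {
    'right_eye': (36, 39),  # outer/inner corner (subject's right eye)
    'left_eye': (42, 45),   # inner/outer corner (subject's left eye)
}


def eye_corner_points(landmarks):
    """Map a 68-point landmark list [(x, y), ...] to the four eye corners."""
    return {eye: (landmarks[a], landmarks[b])
            for eye, (a, b) in EYE_CORNERS.items()}
```

With dlib, the landmark list can be built via `[(p.x, p.y) for p in predictor(gray, face).parts()]`.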
For now I will use dlib for the heavy lifting but keep OpenCV for the webcam interface. The robustness and additional features (eye corners, even with the 5 landmarks model) might help speed up the process at later stages.
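A minimal sketch of that split, assuming dlib's `get_frontal_face_detector` and a local shape predictor file (the filename is an assumption); the detection helper is kept separate from the webcam wiring:

```python
def detect_landmarks(gray, detector, predictor=None):
    """Run a face detector on a grayscale frame; optionally add landmarks.

    Returns a list of (face_rectangle, shape_or_None) pairs.
    """
    faces = detector(gray, 0)
    if predictor is None:
        return [(face, None) for face in faces]
    return [(face, predictor(gray, face)) for face in faces]


if __name__ == '__main__':
    # Webcam wiring: OpenCV grabs the frames, dlib does the detection.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_5_face_landmarks.dat')

    capture = cv2.VideoCapture(0)
    for _ in range(100):  # grab a few frames; replace with a proper loop
        success, frame = capture.read()
        if not success:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face, shape in detect_landmarks(gray, detector, predictor):
            print(face, [(p.x, p.y) for p in shape.parts()])
    capture.release()
```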
1. License: The 68 landmarks model is trained on the 300 Faces in the Wild (300-W) dataset, which does not permit commercial use. The MIT license of this project, however, permits unconditional usage, and as such I consider the licenses incompatible.
2. Enough information: This is not 100% clear yet, but it seems the information (eye corner points and face boundary) might be enough for the moment. This might need to be revisited for head pose estimation, though.
3. Recommended: The 5 landmarks model is in fact the new recommended model to use with dlib's face recognition tooling (@davisking, http://blog.dlib.net/2017/09/fast-multiclass-object-detection-in.html).
1. Accuracy: More landmarks might mean better accuracy, but this was not tested.
I will ~~start using~~ use the 5 landmarks model, as its license is compatible and it performs well. ~~As a stretch goal I can imagine adding the 68 landmarks model as a configuration option, to allow its use for scientific purposes.~~
Adding support for both models for the moment, but probably switching to the 68 landmarks model, as the PnP algorithm does not work well with only 5 landmarks.
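For reference, the PnP step with the 68 landmarks model can be sketched roughly like this. The six 3D model coordinates follow the generic face model commonly used in OpenCV head pose examples (the exact values are an assumption, not measured for this project), and a simple pinhole camera with the frame width as focal length is assumed:

```python
# Landmark indices (68-point iBUG scheme) used as PnP reference points:
# nose tip, chin, the outer eye corners, and the mouth corners.
PNP_INDICES = [30, 8, 36, 45, 48, 54]

# Rough 3D positions of those points in a generic face model (arbitrary
# units); values follow common OpenCV head pose examples (assumption).
MODEL_POINTS = [
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # right eye, outer corner
    (225.0, 170.0, -135.0),    # left eye, outer corner
    (-150.0, -150.0, -125.0),  # mouth, right corner
    (150.0, -150.0, -125.0),   # mouth, left corner
]


def pnp_image_points(landmarks):
    """Select the six PnP reference points from a 68-point landmark list."""
    return [landmarks[i] for i in PNP_INDICES]


def head_pose(landmarks, frame_width, frame_height):
    """Estimate rotation/translation vectors via cv2.solvePnP."""
    import cv2
    import numpy as np

    image_points = np.array(pnp_image_points(landmarks), dtype=np.float64)
    model_points = np.array(MODEL_POINTS, dtype=np.float64)
    # Pinhole camera approximation: focal length ~ frame width, no distortion.
    camera_matrix = np.array([[frame_width, 0, frame_width / 2],
                              [0, frame_width, frame_height / 2],
                              [0, 0, 1]], dtype=np.float64)
    _, rotation, translation = cv2.solvePnP(
        model_points, image_points, camera_matrix, None)
    return rotation, translation
```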
Also, the computation time seems to be similar, at about 40 ms for both models.
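The 40 ms figure came from simple wall-clock measurements; a small harness along these lines (a sketch, not the exact measurement code used) makes such comparisons repeatable:

```python
import time


def mean_runtime_ms(fn, repeats=50):
    """Average wall-clock runtime of fn() in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1000.0
```

Usage would be e.g. `mean_runtime_ms(lambda: predictor(gray, face))`, run once per model on the same frame.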
I should also consider training my own landmark detector using e.g. the BioID dataset, as its annotations already include the pupils...
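Training a custom predictor with dlib requires the annotations in the XML format of dlib's imglab tool, so BioID's eye positions would have to be converted first. A sketch of that conversion (the bounding box handling here is a simplification, and the filenames are assumptions):

```python
def imglab_image_entry(filename, box, parts):
    """Build one <image> entry in dlib's imglab XML training format.

    box: (left, top, width, height) of the face region.
    parts: list of (x, y) landmark positions, e.g. the pupils from BioID.
    """
    left, top, width, height = box
    lines = ["  <image file='{}'>".format(filename),
             "    <box top='{}' left='{}' width='{}' height='{}'>".format(
                 top, left, width, height)]
    for number, (x, y) in enumerate(parts):
        lines.append("      <part name='{:02d}' x='{}' y='{}'/>".format(
            number, x, y))
    lines.append("    </box>")
    lines.append("  </image>")
    return '\n'.join(lines)


if __name__ == '__main__':
    # After wrapping all entries in <dataset><images>...</images></dataset>
    # and writing them to a file, dlib can train a predictor from it:
    import dlib
    options = dlib.shape_predictor_training_options()
    dlib.train_shape_predictor('bioid_training.xml', 'pupil_predictor.dat',
                               options)
```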
For now I use the normal HOG approach.
Evaluate CNN vs. HOG
Included.
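For that comparison it helps to know that the two dlib detectors return slightly different types: the HOG detector yields rectangles directly, while the CNN detector (`cnn_face_detection_model_v1`) yields objects carrying a `.rect` attribute. A small sketch to evaluate both through the same interface (the model filename is an assumption):

```python
def as_rectangle(detection):
    """Normalize HOG and CNN detector outputs to plain rectangles.

    dlib's CNN detector wraps each detection in an object with a .rect
    attribute (plus a confidence score), while the HOG detector returns
    the rectangle itself.
    """
    return detection.rect if hasattr(detection, 'rect') else detection


if __name__ == '__main__':
    import dlib
    hog_detector = dlib.get_frontal_face_detector()
    # Model file from dlib's model downloads; filename is an assumption.
    cnn_detector = dlib.cnn_face_detection_model_v1(
        'mmod_human_face_detector.dat')
    # Both detectors can now be compared via the same expression:
    # [as_rectangle(d) for d in detector(gray_frame)]
```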
In this issue I will write down some notes for the thesis, especially decisions during the implementation process.