Closed — shoeffner closed this issue 6 years ago
1. Faster: Measuring only the detection of the frontal face and eyes from the webcam stream, with no further processing:
2. Easier to set up: No additional dependency (OpenCV is used for the webcam interface anyway).
1. More robust: This is a highly subjective metric, but dlib produced fewer jumping artifacts (basically none during the tests). Also see http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html (especially https://youtu.be/LsK0hzcEyHI).
2. Additional interesting keypoints: While it is slightly more work to extract the bounding boxes for the eyes, the detected corner points might come in handy later as reference points when calculating the gaze point.
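The corner-point idea can be sketched as follows. The indices assume dlib's 68 landmarks model, which follows the iBUG 300-W annotation scheme (points 36/39 and 42/45 are the eye corners); the `landmarks` argument is a plain list of (x, y) tuples rather than a dlib shape object:

```python
# Sketch: extract eye-corner points from a 68-landmark face shape.
# Indices follow the iBUG 300-W scheme used by dlib's 68 landmarks model.
EYE_CORNERS = {
    'right_eye': (36, 39),  # outer/inner corner (subject's right eye)
    'left_eye': (42, 45),   # inner/outer corner (subject's left eye)
}


def eye_corner_points(landmarks):
    """Map a 68-point landmark list [(x, y), ...] to the four eye corners."""
    return {eye: (landmarks[a], landmarks[b])
            for eye, (a, b) in EYE_CORNERS.items()}
```

With dlib, the landmark list can be built via `[(p.x, p.y) for p in predictor(gray, face).parts()]`.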
For now I will use dlib for the heavy lifting but keep OpenCV for the webcam interface. The robustness and additional features (eye corners, even with the 5 landmarks model) might help speed up the process at later stages.
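A minimal sketch of that split, assuming dlib's `get_frontal_face_detector` and a local shape predictor file (the filename is an assumption); the detection helper is kept separate from the webcam wiring:

```python
def detect_landmarks(gray, detector, predictor=None):
    """Run a face detector on a grayscale frame; optionally add landmarks.

    Returns a list of (face_rectangle, shape_or_None) pairs.
    """
    faces = detector(gray, 0)
    if predictor is None:
        return [(face, None) for face in faces]
    return [(face, predictor(gray, face)) for face in faces]


if __name__ == '__main__':
    # Webcam wiring: OpenCV grabs the frames, dlib does the detection.
    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_5_face_landmarks.dat')

    capture = cv2.VideoCapture(0)
    for _ in range(100):  # grab a few frames; replace with a proper loop
        success, frame = capture.read()
        if not success:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face, shape in detect_landmarks(gray, detector, predictor):
            print(face, [(p.x, p.y) for p in shape.parts()])
    capture.release()
```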
1. License: The 68 landmarks model is trained on the 300 Faces in the Wild (300-W) dataset, which does not permit commercial use. The MIT license of this project, however, permits unconditional usage, and as such I consider the licenses incompatible.
2. Enough information: This is not 100% clear yet, but it seems the information (eye corner points and face boundary) might be enough for the moment. This might need to be revisited for head pose estimation, though.
3. Recommended: The 5 landmarks model is in fact the new recommended model to use with dlib's face recognition tooling (@davisking, http://blog.dlib.net/2017/09/fast-multiclass-object-detection-in.html).
1. Accuracy: More landmarks might mean better accuracy, but this was not tested.
I will ~~start using~~ use the 5 landmarks model, as its license is compatible and it performs well. ~~As a stretch goal I can imagine adding the 68 landmarks model as a configuration option, to allow its use for scientific purposes.~~
Adding support for both models for the moment, but probably switching to the 68 landmarks model, as the PnP algorithm does not work well with only 5 landmarks.
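For reference, the PnP step with the 68 landmarks model can be sketched roughly like this. The six 3D model coordinates follow the generic face model commonly used in OpenCV head pose examples (the exact values are an assumption, not measured for this project), and a simple pinhole camera with the frame width as focal length is assumed:

```python
# Landmark indices (68-point iBUG scheme) used as PnP reference points:
# nose tip, chin, the outer eye corners, and the mouth corners.
PNP_INDICES = [30, 8, 36, 45, 48, 54]

# Rough 3D positions of those points in a generic face model (arbitrary
# units); values follow common OpenCV head pose examples (assumption).
MODEL_POINTS = [
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # right eye, outer corner
    (225.0, 170.0, -135.0),    # left eye, outer corner
    (-150.0, -150.0, -125.0),  # mouth, right corner
    (150.0, -150.0, -125.0),   # mouth, left corner
]


def pnp_image_points(landmarks):
    """Select the six PnP reference points from a 68-point landmark list."""
    return [landmarks[i] for i in PNP_INDICES]


def head_pose(landmarks, frame_width, frame_height):
    """Estimate rotation/translation vectors via cv2.solvePnP."""
    import cv2
    import numpy as np

    image_points = np.array(pnp_image_points(landmarks), dtype=np.float64)
    model_points = np.array(MODEL_POINTS, dtype=np.float64)
    # Pinhole camera approximation: focal length ~ frame width, no distortion.
    camera_matrix = np.array([[frame_width, 0, frame_width / 2],
                              [0, frame_width, frame_height / 2],
                              [0, 0, 1]], dtype=np.float64)
    _, rotation, translation = cv2.solvePnP(
        model_points, image_points, camera_matrix, None)
    return rotation, translation
```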
Also, the computation time seems to be similar, at about 40 ms for both models.
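The 40 ms figure came from simple wall-clock measurements; a small harness along these lines (a sketch, not the exact measurement code used) makes such comparisons repeatable:

```python
import time


def mean_runtime_ms(fn, repeats=50):
    """Average wall-clock runtime of fn() in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1000.0
```

Usage would be e.g. `mean_runtime_ms(lambda: predictor(gray, face))`, run once per model on the same frame.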
I should also consider training my own landmark detector using e.g. the BioID dataset, as its annotations already include the pupils...
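Training a custom predictor with dlib requires the annotations in the XML format of dlib's imglab tool, so BioID's eye positions would have to be converted first. A sketch of that conversion (the bounding box handling here is a simplification, and the filenames are assumptions):

```python
def imglab_image_entry(filename, box, parts):
    """Build one <image> entry in dlib's imglab XML training format.

    box: (left, top, width, height) of the face region.
    parts: list of (x, y) landmark positions, e.g. the pupils from BioID.
    """
    left, top, width, height = box
    lines = ["  <image file='{}'>".format(filename),
             "    <box top='{}' left='{}' width='{}' height='{}'>".format(
                 top, left, width, height)]
    for number, (x, y) in enumerate(parts):
        lines.append("      <part name='{:02d}' x='{}' y='{}'/>".format(
            number, x, y))
    lines.append("    </box>")
    lines.append("  </image>")
    return '\n'.join(lines)


if __name__ == '__main__':
    # After wrapping all entries in <dataset><images>...</images></dataset>
    # and writing them to a file, dlib can train a predictor from it:
    import dlib
    options = dlib.shape_predictor_training_options()
    dlib.train_shape_predictor('bioid_training.xml', 'pupil_predictor.dat',
                               options)
```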
For now I use the normal HOG approach.
Evaluate CNN vs. HOG
Included.
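For that comparison it helps to know that the two dlib detectors return slightly different types: the HOG detector yields rectangles directly, while the CNN detector (`cnn_face_detection_model_v1`) yields objects carrying a `.rect` attribute. A small sketch to evaluate both through the same interface (the model filename is an assumption):

```python
def as_rectangle(detection):
    """Normalize HOG and CNN detector outputs to plain rectangles.

    dlib's CNN detector wraps each detection in an object with a .rect
    attribute (plus a confidence score), while the HOG detector returns
    the rectangle itself.
    """
    return detection.rect if hasattr(detection, 'rect') else detection


if __name__ == '__main__':
    import dlib
    hog_detector = dlib.get_frontal_face_detector()
    # Model file from dlib's model downloads; filename is an assumption.
    cnn_detector = dlib.cnn_face_detection_model_v1(
        'mmod_human_face_detector.dat')
    # Both detectors can now be compared via the same expression:
    # [as_rectangle(d) for d in detector(gray_frame)]
```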
In this issue I will write down some notes for the thesis, especially decisions during the implementation process.