Wenzhao-Xiang closed this issue 5 years ago.
There are some potential references:
Nice references! Thanks!
For face detection, there are three popular models now:
Because of performance concerns on the web, I will first try Tiny-YOLO-v2 for face detection.
I implemented Tiny-YOLO-v2 for face detection. Reference: YoloKerasFaceDetection
This model is adapted from the Tiny-YOLO-v2 object detection model.
The output of the face detection model is reduced to [13, 13, 30] because there is only one class, face, in the label list. (The original model's output is [13, 13, 125] with 20 classes.)
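The channel counts follow from the YOLOv2 output layout: each of the 5 anchor boxes per grid cell predicts 4 box coordinates, 1 objectness score, and one score per class, and the 416-pixel input is downsampled by a stride of 32 to a 13×13 grid. A small sketch of that arithmetic (the function names are illustrative only):

```python
def yolo_v2_output_channels(num_anchors: int, num_classes: int) -> int:
    # Each anchor predicts 4 box offsets (tx, ty, tw, th), 1 objectness
    # score, and one score per class.
    return num_anchors * (4 + 1 + num_classes)


def yolo_v2_output_shape(input_size: int, num_anchors: int = 5,
                         num_classes: int = 1, stride: int = 32):
    # The backbone downsamples the input by `stride` in each spatial dimension.
    grid = input_size // stride
    return (grid, grid, yolo_v2_output_channels(num_anchors, num_classes))


print(yolo_v2_output_shape(416, num_classes=1))   # (13, 13, 30)
print(yolo_v2_output_shape(416, num_classes=20))  # (13, 13, 125)
```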
Tiny-YOLO-v2 uses a new operation, LeakyRelu. However, when the Keras model is converted to a TFLite model, LeakyRelu is replaced by MUL + MAXIMUM, like this:
So the new op we need is MAXIMUM.
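The rewrite works because, for a slope 0 ≤ α ≤ 1, LeakyRelu(x) = max(x, α·x): for x ≥ 0 the unscaled value is larger, and for x < 0 the scaled value is larger. A minimal scalar sketch, assuming the slope 0.1 used by Darknet's leaky activation (the function names are hypothetical):

```python
def leaky_relu(x: float, alpha: float = 0.1) -> float:
    # Reference definition: identity for x >= 0, scaled by alpha for x < 0.
    return x if x >= 0 else alpha * x


def leaky_relu_as_mul_max(x: float, alpha: float = 0.1) -> float:
    # Converter-style decomposition: one MUL followed by one MAXIMUM.
    # Only equivalent to LeakyRelu when 0 <= alpha <= 1.
    scaled = alpha * x      # MUL
    return max(x, scaled)   # MAXIMUM


for v in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    print(v, leaky_relu(v), leaky_relu_as_mul_max(v))
```

This is why supporting an element-wise MAXIMUM op is enough to run the converted model, without a dedicated LeakyRelu op.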
A simple image and video demo with the WebGL backend. Input size: [416, 416, 3] => output size: [13, 13, 30]. GPU: 1080 Ti.
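To turn the [13, 13, 30] output into face boxes, each cell is decoded with the standard YOLOv2 formulas: the center is sigmoid(tx) + cx and sigmoid(ty) + cy in grid units, and the size is the anchor size scaled by exp(tw) and exp(th). The sketch below assumes the 30 channels of a cell are laid out per anchor as [tx, ty, tw, th, to, class score]; that layout and the anchor values passed in are assumptions, not taken from YoloKerasFaceDetection.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(cell, cx, cy, anchors, grid=13, num_classes=1):
    """Decode one grid cell (row cy, column cx) of a YOLOv2 output tensor.

    cell:    flat list of len(anchors) * (5 + num_classes) values, assumed to be
             laid out per anchor as [tx, ty, tw, th, to, class scores...].
    anchors: list of (w, h) anchor sizes in grid units (hypothetical values).
    Returns (x_center, y_center, w, h, score) tuples normalized to [0, 1].
    """
    stride = 5 + num_classes
    boxes = []
    for a, (aw, ah) in enumerate(anchors):
        tx, ty, tw, th, to = cell[a * stride:a * stride + 5]
        class_logits = cell[a * stride + 5:(a + 1) * stride]
        # Softmax over classes; with a single "face" class this is always 1.0.
        m = max(class_logits)
        exps = [math.exp(c - m) for c in class_logits]
        class_prob = max(exps) / sum(exps)
        x = (sigmoid(tx) + cx) / grid
        y = (sigmoid(ty) + cy) / grid
        w = aw * math.exp(tw) / grid
        h = ah * math.exp(th) / grid
        boxes.append((x, y, w, h, sigmoid(to) * class_prob))
    return boxes
```

In practice all 13×13 cells are decoded, boxes below a score threshold are dropped, and overlapping boxes are merged with non-maximum suppression.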
Single face:
Multiple faces:
Tiny-YOLO-v2 is fast, but its accuracy is low when there are multiple faces. I will try other models later.
Visit https://wenzhao-xiang.github.io/intel/webml-examples/index.html for the face detection demo.
A face detection video demo runs at about 29 to 30 FPS on a 1080 Ti.
I implemented a CNN model for face alignment, based on the face detection (tiny_yolo_v2) above. Reference: cnn-facial-landmark
This is just a simple CNN model with the following ops:
The output contains 68 key points like this:
This model strongly depends on the accuracy of face detection, and it seems tiny-yolo-v2 can't meet its needs. The video demo now works reasonably well, but the image demo usually fails on irregular image inputs. There are two possible solutions: (1) look for a better face detection model; (2) look for a better face alignment model.
A simple image and video demo with the WebGL backend. Input size: [128, 128, 3] => output size: [1, 136] (68 key points × 2 coordinates). GPU: 1080 Ti.
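Because the landmark model only sees the cropped face, its output has to be mapped back into the original image. A minimal sketch, assuming the model returns 68 (x, y) pairs flattened as [x0, y0, x1, y1, ...] (136 values) and normalized to [0, 1] relative to the crop; both the layout and the normalization are assumptions about the model, and `landmarks_to_image` is a hypothetical helper:

```python
def landmarks_to_image(flat, box):
    """Map a flat landmark vector to pixel coordinates in the original image.

    flat: [x0, y0, x1, y1, ...], assumed normalized to [0, 1] within the crop.
    box:  (left, top, width, height) of the face crop in the original image,
          i.e. the region that was resized to 128x128 before inference.
    """
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    left, top, width, height = box
    points = []
    for i in range(0, len(flat), 2):
        # Scale from crop-relative [0, 1] to crop pixels, then offset by the crop origin.
        points.append((left + flat[i] * width, top + flat[i + 1] * height))
    return points


pts = landmarks_to_image([0.5] * 136, (100, 50, 128, 128))
print(len(pts), pts[0])  # 68 (164.0, 114.0)
```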
Success samples:
Image demo:
Video demo (about 22 FPS):
Failure samples:
The samples suggest that this model only works well with boxes that are close to square.
I'll keep working on this example.
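One possible preprocessing step, given that the model prefers near-square inputs, is to expand each detected box to a square around its center before cropping and resizing to 128×128. This is a hypothetical sketch, not part of the demo code:

```python
def square_box(left, top, right, bottom):
    """Expand a box to a square around its center (hypothetical preprocessing step)."""
    width = right - left
    height = bottom - top
    side = max(width, height)
    cx = (left + right) / 2
    cy = (top + bottom) / 2
    half = side / 2
    # Grow the shorter dimension so both sides equal the longer one.
    return (cx - half, cy - half, cx + half, cy + half)


print(square_box(10, 20, 50, 100))  # (-10.0, 20.0, 70.0, 100.0)
```

As the example output shows, a squared box can extend past the image border, so the crop would then need padding or the box would need to be shifted back inside the image.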
I investigated MTCNN last week, but it seems that MTCNN isn't suitable for our API. MTCNN has three stages.
The first stage needs input images at several different scales, which means the input tensor shape of the first network is not fixed. However, our API requires the shapes of all tensors to be fixed at initialization, which conflicts with MTCNN.
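The variable input comes from MTCNN's image pyramid: the first-stage network (P-Net, with a 12×12 receptive field) is run on copies of the image rescaled by successive factors until the shorter side falls below 12 pixels. The sketch below follows that commonly used scheme; the defaults (minimum face size 20, factor 0.709) are typical values from public implementations and are assumptions here:

```python
import math


def pyramid_scales(height, width, min_face=20, factor=0.709, pnet_size=12):
    """Compute the image-pyramid scales for MTCNN's first stage (P-Net)."""
    scales = []
    scale = pnet_size / min_face
    min_side = min(height, width) * scale
    # Keep shrinking until the image is smaller than the P-Net window.
    while min_side >= pnet_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


def pyramid_shapes(height, width, **kwargs):
    # Each scale gives a differently sized P-Net input, so its input tensor
    # shape cannot be fixed once when the graph is built.
    return [(math.ceil(height * s), math.ceil(width * s))
            for s in pyramid_scales(height, width, **kwargs)]


print(pyramid_shapes(480, 640))
```

For a 480×640 frame this yields 10 different input shapes, and the set changes with the frame size, which is exactly what a fixed-shape graph cannot express.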
Thanks for the investigation. I am wondering how MTCNN is implemented in DL frameworks. Do you have any ideas?
@huningxin Other DL frameworks like Keras support dynamic inputs. For example, a dimension given as None can take an arbitrary value.
But our API allocates all tensors when constructing the graph, right?
After switching to an SSD model for the face detector, the landmark detection results are better now! The face detection models are SSD_MobilenetV1, SSD_MobilenetV2, SSDLite_MobilenetV2 and Tiny_YoloV2, and the face alignment model is DAN (Deep Alignment Network; here is the paper).
I have pushed my code to my github.io site; please go to https://wenzhao-xiang.github.io/intel/webml-examples/index.html for a simple face detection and alignment image/camera demo.
Single face:
Multiple faces:
I did some investigation into face landmark detection. An end-to-end model does not seem like a good approach, since it is difficult for one model to detect facial features from a whole image.
So I plan to split this task into two steps with two models: