[Example] Face landmark detection

intel / webml-polyfill

Deprecated: the Web Neural Network Polyfill project has been moved to https://github.com/webmachinelearning/webnn-polyfill


[Example] Face landmark detection #373

Closed: Wenzhao-Xiang closed this issue 5 years ago

Wenzhao-Xiang commented 5 years ago

I did some investigation into face landmark detection. An end-to-end model does not seem like a good approach, since it is difficult for a single model to detect facial features from a whole image.

So I plan to split this task into two steps, with two models:

Detect faces in the input image
For each face, detect its key points
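The two-step plan can be sketched as a small driver loop. This is a minimal sketch, not the example's actual code: detect_faces, crop and predict_keypoints are hypothetical callables standing in for the detection model, the cropping step and the landmark model, and the landmark model is assumed to return points normalized to [0, 1] within each face crop.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """A detected face in image pixel coordinates."""
    x: float
    y: float
    w: float
    h: float


def detect_landmarks(
    image: object,
    detect_faces: Callable[[object], List[Box]],
    crop: Callable[[object, Box], object],
    predict_keypoints: Callable[[object], Sequence[Point]],
) -> List[List[Point]]:
    """Step 1: find faces. Step 2: run the landmark model on each face crop.

    Keypoints come back normalized to the crop (an assumption) and are
    mapped back to image pixels using the face box.
    """
    all_faces: List[List[Point]] = []
    for box in detect_faces(image):
        face = crop(image, box)
        keypoints = predict_keypoints(face)
        all_faces.append([(box.x + u * box.w, box.y + v * box.h) for u, v in keypoints])
    return all_faces


# Usage with stand-in callables in place of real models.
faces = detect_landmarks(
    image=None,
    detect_faces=lambda img: [Box(10, 20, 100, 100)],
    crop=lambda img, box: box,
    predict_keypoints=lambda face: [(0.5, 0.5), (0.0, 1.0)],
)
print(faces)  # [[(60.0, 70.0), (10.0, 120.0)]]
```

Keeping the two models behind separate callables means either one can be swapped (for example, a different detector) without touching the other.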

huningxin commented 5 years ago

There are some potential references:

DeepFace
FaceNet
Face detection and landmarks detection models of OpenVINO model zoo
Face detection and recognition models of ONNX model zoo
OpenFace used by OpenCV.js dnn example

Wenzhao-Xiang commented 5 years ago

Nice references! Thanks!

Wenzhao-Xiang commented 5 years ago

For face detection, there are three popular models now:

SSD
YOLO
MTCNN

The first two models are adapted from object detection, with only one class: "face". The third model handles both face detection and alignment, outputting the detected faces along with five key points for each face.

Because of performance concerns on the web, I will first try Tiny-YOLO-v2 for face detection.

Wenzhao-Xiang commented 5 years ago

I implemented Tiny-YOLO-v2 for face detection. Reference: YoloKerasFaceDetection
This model is adapted from the Tiny-YOLO-v2 object detection model. Because the label list contains only one class, face, the output of the face detection model is reduced to [13, 13, 30]. (The original model's output is [13, 13, 125] with 20 classes.)
Tiny-YOLO-v2 uses a new operation, LeakyRelu. However, when the Keras model is converted to a TFLite model, LeakyRelu is replaced by MUL + Maximum (computing max(x, alpha * x)), so the new op we need is Maximum.
A simple image and video demo with the WebGL backend: input size [416, 416, 3] => output size [13, 13, 30], GPU: 1080 Ti.
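The two model details above can be checked with a short sketch. It assumes the usual YOLOv2 layout of 5 anchor boxes per grid cell, each predicting x, y, w, h, an objectness score and the class scores, which gives 5 × (5 + 20) = 125 channels for 20 classes and 5 × (5 + 1) = 30 for face only. The per-anchor memory layout, the anchor sizes (in grid-cell units) and the LeakyRelu slope of 0.1 are assumptions, and decode_face_cell is a hypothetical helper. The last two functions show why MUL + Maximum can replace LeakyRelu: for 0 <= alpha <= 1, max(x, alpha * x) is x when x >= 0 and alpha * x when x < 0.

```python
import math
from typing import List, Sequence, Tuple

NUM_ANCHORS = 5  # anchor boxes per grid cell (assumed YOLOv2 default)
GRID = 13        # 416 / 32 = 13 cells per side


def channels(num_classes: int, num_anchors: int = NUM_ANCHORS) -> int:
    """Output depth: per anchor, x, y, w, h, objectness, plus class scores."""
    return num_anchors * (5 + num_classes)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_face_cell(
    cell: Sequence[float],
    row: int,
    col: int,
    anchors: Sequence[Tuple[float, float]],
) -> List[Tuple[float, float, float, float, float]]:
    """Decode one 30-value cell of the face-only [13, 13, 30] output.

    Assumes each anchor's values are stored contiguously as
    [tx, ty, tw, th, to, class_score] and anchors are in grid-cell units.
    Returns (cx, cy, w, h, score) normalized to the input image. With a single
    class the class softmax is always 1, so score is just the objectness.
    """
    per_anchor = 5 + 1
    boxes = []
    for a, (aw, ah) in enumerate(anchors):
        tx, ty, tw, th, to = cell[a * per_anchor : a * per_anchor + 5]
        cx = (col + sigmoid(tx)) / GRID
        cy = (row + sigmoid(ty)) / GRID
        w = aw * math.exp(tw) / GRID
        h = ah * math.exp(th) / GRID
        boxes.append((cx, cy, w, h, sigmoid(to)))
    return boxes


def leaky_relu(x: float, alpha: float = 0.1) -> float:
    """Reference LeakyRelu."""
    return x if x > 0 else alpha * x


def leaky_relu_as_mul_maximum(x: float, alpha: float = 0.1) -> float:
    """The converted form: MUL (alpha * x), then MAXIMUM(x, alpha * x).
    Equivalent to LeakyRelu only when 0 <= alpha <= 1."""
    return max(x, alpha * x)


print(channels(20), channels(1))  # 125 30
```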

Single face:

Multiple faces:

Tiny-YOLO-v2 is fast, but its accuracy is low when there are multiple faces. I will try other models later.

Wenzhao-Xiang commented 5 years ago

Visit https://wenzhao-xiang.github.io/intel/webml-examples/index.html for face detection demo.

Wenzhao-Xiang commented 5 years ago

A face detection video demo runs at about 29 to 30 FPS on a 1080 Ti.

Wenzhao-Xiang commented 5 years ago

I implemented a CNN model for face alignment on top of the face detection model (tiny_yolo_v2) above. Reference: cnn-facial-landmark
This model is just a simple CNN with the following ops:

The output contains 68 key points like this:
This model depends heavily on the accuracy of face detection, and tiny-yolo-v2 does not seem to meet its needs. The video demo now works reasonably well, but the image demo usually fails on irregular image inputs. There are two possible solutions: (1) look for a better face detection model, or (2) look for a better face alignment model.
A simple image and video demo with the WebGL backend: input size [128, 128, 3] => output size [1, 136] (68 key points × 2 coordinates), GPU: 1080 Ti.
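Each of the 68 key points has an x and a y coordinate, so a complete landmark vector holds 68 × 2 = 136 values. The sketch below turns such a vector into image-space points. It assumes, without confirmation from the thread, that the values are interleaved as [x0, y0, x1, y1, ...] and normalized to [0, 1] inside the face crop; decode_landmarks is a hypothetical name.

```python
from typing import List, Sequence, Tuple

NUM_POINTS = 68


def decode_landmarks(
    flat: Sequence[float],
    box: Tuple[float, float, float, float],
) -> List[Tuple[float, float]]:
    """Map a flat landmark vector to 68 (x, y) points in image pixels.

    Assumes flat is laid out as [x0, y0, x1, y1, ...] with each coordinate
    normalized to [0, 1] inside the face crop, and box is the crop's
    (left, top, width, height) in the original image.
    """
    if len(flat) != 2 * NUM_POINTS:
        raise ValueError(f"expected {2 * NUM_POINTS} values, got {len(flat)}")
    left, top, width, height = box
    return [
        (left + flat[2 * i] * width, top + flat[2 * i + 1] * height)
        for i in range(NUM_POINTS)
    ]


points = decode_landmarks([0.5] * 136, (10.0, 20.0, 100.0, 50.0))
print(len(points), points[0])  # 68 (60.0, 45.0)
```

The length check rejects a vector that cannot hold all 68 (x, y) pairs instead of silently decoding a partial face.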

Success samples:

Image demo:


Video demo (about 22 FPS):


Failure samples:

The samples suggest that this model only works well when the face boxes are close to square.
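Since the alignment model seems to need near-square boxes, one possible mitigation (an assumption, not something tried in this thread) is to expand each detected box to a square around the same center before cropping, shifting it back inside the image when it crosses an edge. Boxes are (left, top, width, height) in pixels, and make_square is a hypothetical helper.

```python
from typing import Tuple


def make_square(
    box: Tuple[float, float, float, float],
    image_w: float,
    image_h: float,
) -> Tuple[float, float, float, float]:
    """Grow (left, top, width, height) to a square around the same center,
    then shift it back inside the image if it pokes out of an edge."""
    left, top, width, height = box
    # The side is the longer box edge, capped so the square fits in the image.
    side = float(min(max(width, height), image_w, image_h))
    cx, cy = left + width / 2, top + height / 2
    new_left = min(max(cx - side / 2, 0.0), image_w - side)
    new_top = min(max(cy - side / 2, 0.0), image_h - side)
    return (new_left, new_top, side, side)


print(make_square((100, 100, 40, 80), 640, 480))  # (80.0, 100.0, 80.0, 80.0)
```

Growing the short side rather than shrinking the long one keeps the whole detected face inside the crop.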

I'll keep working on this example.

Wenzhao-Xiang commented 5 years ago

I investigated MTCNN last week, but it seems that MTCNN isn't suitable for our API. MTCNN has three stages:

Stage 1: Proposal Net (P-Net) produces a large number of candidate windows, computes their bounding box regression vectors and reduces the windows with NMS.
Stage 2: Refining Net (R-Net) rejects non-face windows, calculates bounding box regression vectors and reduces the windows with NMS.
Stage 3: Outputting Net (O-Net) outputs 5 landmark points on the face.
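The first two stages above both reduce their candidate windows with NMS (non-maximum suppression). Below is a minimal greedy NMS sketch for corner-format boxes (x1, y1, x2, y2); the 0.5 IoU threshold is an assumed default, and real MTCNN implementations may use different thresholds per stage.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring window, drop windows overlapping it
    by more than threshold IoU, and repeat. Returns kept indices, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


windows = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(windows, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second window overlaps the first with IoU 81 / 119 ≈ 0.68, so it is suppressed, while the distant third window survives.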

The first stage needs the input image at several different scales (an image pyramid), so the input tensor shape of the first network is not fixed. However, our API requires all tensor shapes to be fixed at initialization, which conflicts with MTCNN.
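To make the conflict concrete, the sketch below lists the input shapes the first stage would receive for a single image. It assumes the 12×12 P-Net input window from the MTCNN paper, plus a minimum face size of 20 pixels and a scale factor of 0.709, values commonly used in open-source MTCNN implementations; pyramid_shapes is a hypothetical helper.

```python
import math
from typing import List, Tuple


def pyramid_shapes(
    height: int,
    width: int,
    min_face: int = 20,
    factor: float = 0.709,
    net_input: int = 12,
) -> List[Tuple[int, int, int]]:
    """Input shapes (h, w, 3) that stage 1 would see for one image.

    The image is first scaled so that a face of min_face pixels fills the
    net_input x net_input window, then shrunk by factor each step until its
    shorter side would drop below that window."""
    scale = net_input / min_face
    min_side = min(height, width) * scale
    shapes = []
    while min_side >= net_input:
        shapes.append((math.ceil(height * scale), math.ceil(width * scale), 3))
        scale *= factor
        min_side *= factor
    return shapes


shapes = pyramid_shapes(480, 640)
print(len(shapes), shapes[0])  # 10 (288, 384, 3)
```

A single 480×640 frame already needs ten differently shaped inputs for the same network, which is what a fixed-shape graph cannot express directly.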

huningxin commented 5 years ago

conflicting with MTCNN.

Thanks for the investigation. I am wondering how MTCNN is implemented with other DL frameworks. Do you have any ideas?

Wenzhao-Xiang commented 5 years ago

@huningxin Other DL frameworks such as Keras support dynamic input shapes. For example,

None can be an arbitrary value. But our API allocates all the tensors when constructing the graph, right?
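One possible workaround, offered here as an assumption rather than something agreed in this thread: for a fixed camera or image resolution the set of pyramid shapes is fixed, so a fixed-shape model could be compiled once per distinct shape and cached. In the sketch, compile_fn is a hypothetical stand-in for whatever builds a graph for one concrete input shape, and ShapeCache is a hypothetical name.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]


class ShapeCache:
    """Compile one fixed-shape model per distinct input shape and reuse it.

    compile_fn is a hypothetical stand-in for building and compiling a
    model for one concrete input shape."""

    def __init__(self, compile_fn: Callable[[Shape], object]) -> None:
        self._compile_fn = compile_fn
        self._models: Dict[Shape, object] = {}

    def get(self, shape: Shape) -> object:
        # Compile only on the first request for this exact shape.
        if shape not in self._models:
            self._models[shape] = self._compile_fn(shape)
        return self._models[shape]

    def __len__(self) -> int:
        return len(self._models)


compiled = []
cache = ShapeCache(lambda s: compiled.append(s) or f"model{s}")
cache.get((12, 12, 3))
cache.get((12, 12, 3))
cache.get((24, 24, 3))
print(len(cache), compiled)  # 2 [(12, 12, 3), (24, 24, 3)]
```

The trade-off is memory and start-up time: each pyramid scale holds its own allocated graph, but per-frame inference never has to rebuild one.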

Wenzhao-Xiang commented 5 years ago

After switching the face detector to an SSD model, the landmark detection results are better now! The face detection models are SSD_MobilenetV1, SSD_MobilenetV2, SSDLite_MobilenetV2 and Tiny_YoloV2, and the face alignment model is DAN (Deep Alignment Network; here is the paper).

I have pushed my code to my GitHub Pages site. Please go to https://wenzhao-xiang.github.io/intel/webml-examples/index.html for a simple face detection and alignment image/camera demo.

Single face:

Multiple faces: