uricamic / clandmark

Open Source Landmarking Library
http://cmp.felk.cvut.cz/~uricamic/clandmark
GNU General Public License v3.0

Multi-view face landmark extraction in Python #87

Closed: yurinativo closed this issue 6 years ago

yurinativo commented 6 years ago

Hi,

Thank you for your great work on this project. I'm trying to implement the "Multi-view face landmark extraction" in Python. I built the Python interface successfully on Anaconda Python 3.5 with OpenCV 3.4 and could run the example without problems. Now I'm trying to detect the face yaw using the JOINT xml files, but each of the 5 detections finds points on every face, and I haven't figured out yet how to decide which detection has the best result or precision. I looked through the code snippets you provided, but I didn't see where this decision is made.

Below is your example implementation with a few modifications to run multiple joint detections. What am I missing?

import time
import numpy as np
import cv2

from py_flandmark import PyFlandmark
from py_featurePool import PyFeaturePool

def rgb2gray(rgb):
    # ITU-R BT.601 luma conversion: 0.299 R + 0.587 G + 0.114 B
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])

models = "C:/local/clandmark/install/share/clandmark/models/"
flandmarks = [PyFlandmark(models + "PART_fixed_JOINTMV_FRONTAL.xml", False)
            , PyFlandmark(models + "PART_fixed_JOINTMV_HALF-PROFILE.xml", False)
            , PyFlandmark(models + "PART_fixed_JOINTMV_-HALF-PROFILE.xml", False)
            , PyFlandmark(models + "PART_fixed_JOINTMV_PROFILE.xml", False)
            , PyFlandmark(models + "PART_fixed_JOINTMV_-PROFILE.xml", False)]
colors = [(0, 0, 255), (255, 0, 255), (255, 255, 0), (255, 0, 0), (0, 255, 0)]

bw = flandmarks[0].getBaseWindowSize()
featurePool = PyFeaturePool(bw[0], bw[1], None)
featurePool.addSparseLBPfeatures()

for flandmark in flandmarks:
    flandmark.setFeaturePool(featurePool)

cascPath = models + "haarcascade_frontalface_alt.xml"
faceCascade = cv2.CascadeClassifier(cascPath)

video_capture = cv2.VideoCapture("matrix2.mp4")
while True:
    ret, frame = video_capture.read()
    if frame is None:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    arr = rgb2gray(frame[..., ::-1])  # frame is BGR; flip channels to RGB for the luma weights

    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )

    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        bbox = np.array([x, y, x+w, y+h], dtype=np.int32)   
        bbox = bbox.reshape((2,2), order='F')   

        start_time = time.time()
        Ps = []
        for j in range(len(flandmarks)):
            Ps.append(flandmarks[j].detect_optimized(arr, bbox))
        print("Elapsed time: %s ms" % ((time.time() - start_time) * 1000))

        for j in range(len(Ps)):
            P = Ps[j]
            for i in range(P.shape[1]):  # P is a 2 x M matrix of landmark coordinates
                cv2.circle(frame, (int(round(P[0,i])), int(round(P[1,i]))), 1, colors[j], 2)
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # Display the resulting frame
    cv2.imshow('CLandmark - webcam input', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()
uricamic commented 6 years ago

Hi @yurinativo,

excellent question. Thanks a lot for asking it. I have totally overlooked this in the Python interface.

If you look at the C++ example, you will see the answer to your question, and also what is currently missing from the Python interface.

The detector also returns a score (the higher the score, the better the fit of the landmarks), and the multi-view model is trained in such a way that this score can be used to determine the viewpoint. Viewpoint detection is done simply by taking the model that maximizes this score.

If you want a quick fix, simply change this line to the following one:

return landmarks, self.thisptr.getScore()

and add the following line here:

fl_double_t getScore()

and recompile the Python interface. Your script will then need a minor modification of the line Ps.append(flandmarks[j].detect_optimized(arr, bbox)): first call the detect_optimized function and store both the landmarks and the score of each viewpoint, then take the argmax over the array of scores to get the id of the "correct" viewpoint.

I will apply this fix as well; it might just take some time (I do not have access to the repo sources on this computer right now).

Hope I described it clearly. If you have any questions, please do not hesitate to ask further ;-)

yurinativo commented 6 years ago

Thanks Michal,

This solution worked perfectly!

uricamic commented 6 years ago

Hi @yurinativo,

no problem ;-)