yinguobing / cnn-facial-landmark

Training code for facial landmark detection based on deep convolutional neural network.
MIT License

How to use the new Keras model to draw facial landmarks #103

Open ksyao2002 opened 3 years ago

ksyao2002 commented 3 years ago

I noticed in your changelog that you updated the models to be based on Keras models instead of TensorFlow estimators (for those interested in the difference: https://stackoverflow.com/questions/51455863/whats-the-difference-between-a-tensorflow-keras-model-and-estimator#:~:text=Keras%20is%20similar%20to%20the,tensor%20manipulation%20libraries%2C%20or%20backends.). However, the code I was using previously, which is based on this article (https://towardsdatascience.com/robust-facial-landmarks-for-occluded-angled-faces-925e465cbf2e), did the predictions using the estimator. I trained my own model based on the tutorial in https://github.com/faust690226/cnn-facial-landmark-tutorial, and I am trying to run the following code, taken directly from the article cited above, to predict the x and y locations of the facial landmarks:

    # Shift the detected face box down slightly, then make it square.
    offset_y = int(abs((face[3] - face[1]) * 0.1))
    box_moved = move_box(face, [0, offset_y])
    facebox = get_square_box(box_moved)

    # Crop and preprocess the face region.
    face_img = img[facebox[1]: facebox[3],
                   facebox[0]: facebox[2]]
    face_img = cv2.resize(face_img, (128, 128))
    face_img = cv2.cvtColor(face_img, cv2.COLOR_BGR2RGB)

    # Actual detection.
    predictions = model.signatures["predict"](
        tf.constant([face_img], dtype=tf.uint8))

    # Convert predictions to landmarks.
    marks = np.array(predictions['output']).flatten()[:136]
    marks = np.reshape(marks, (-1, 2))

    # Scale from the unit square back to image coordinates. The face box
    # is square, so one side length serves both axes.
    marks *= (facebox[2] - facebox[0])
    marks[:, 0] += facebox[0]
    marks[:, 1] += facebox[1]
    marks = marks.astype(np.uint)

    return marks

This code works fine with the estimator model from before, but with my new Keras model it does not return any marks. I should mention that when I run model.evaluate on a dataset the model has never seen, it reaches losses on the order of 10^-5. My question is: how do we do predictions and draw marks using this Keras model? Maybe I can use model.predict, but then I would first have to convert the data into a record file, which takes more time and effort than I would like, and I'm also not sure the tfrecord would work without the landmark points, since I will be using the model to predict on images that have not been annotated. I was wondering if there was any other way of getting predictions of the landmark positions from a raw image imported with cv2.imread, or with cv2.VideoCapture.read for the frames of a video.

yinguobing commented 3 years ago

That's a lot of questions! :rofl: But don't worry, they are not that complicated.

Q1: ...how do we do predictions and draw marks using this Keras model?

If the model is built with Keras and has been correctly loaded, model.predict() is the best way to do prediction. There is no need to convert every input image into tfrecord samples.

Example:

import cv2
import numpy as np
import tensorflow as tf

# Restore the model.
model = tf.keras.models.load_model("./exported")

# Read in and preprocess the sample image.
img = cv2.imread("/home/robin/Desktop/sample/face.jpg")
img = cv2.resize(img, (256, 256))
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# normalize() is a helper from the repo linked below; the model expects
# float32 input scaled the same way as during training.
img_input = normalize(np.array(img_rgb, dtype=np.float32))

# Do prediction.
heatmaps = model.predict(tf.expand_dims(img_input, 0))[0]

Full code here: https://github.com/yinguobing/facial-landmark-detection-hrnet/blob/master/predict.py
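
For reference, a hypothetical stand-in for that normalize helper (the real one lives in the linked repository; the essential point is a float32 input scaled the same way as during training, assumed here to be the [0, 1] range):

    import numpy as np

    def normalize(img: np.ndarray) -> np.ndarray:
        # Hypothetical: rescale 8-bit pixel values to [0, 1].
        return np.asarray(img, dtype=np.float32) / 255.0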

Q2: ...and also I'm not sure if the tfrecord would work if I don't give it the landmark points since I will be using the model to predict images that have not been annotated.

There is no need to provide labels for predictions. Even if you did, they would be ignored by model.predict.
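
As a minimal sketch (assuming a model that takes 128x128 RGB float32 inputs, as elsewhere in this thread), images alone are a valid prediction input, and an (image, label) dataset works too because the labels are dropped:

    import numpy as np
    import tensorflow as tf

    # A batch of preprocessed face crops, no annotations attached.
    faces = np.random.uniform(size=(8, 128, 128, 3)).astype(np.float32)

    # Predicting from images only:
    marks = model.predict(faces)

    # Predicting from an (image, label) dataset: the labels are ignored.
    dummy = np.zeros((8, 136), dtype=np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((faces, dummy)).batch(4)
    marks = model.predict(dataset)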

Q3: ...I was wondering if there was any other way of getting the predictions of the landmark positions from a raw image that was imported using cv2.imread or cv2.VideoCapture.read to read the frames of a video.

If the model cannot be loaded by Keras, or you just don't want to use model.predict, you can try loading the graph only.

Example from the official doc:

model.save("my_model")
tensorflow_graph = tf.saved_model.load("my_model")
x = np.random.uniform(size=(4, 32)).astype(np.float32)
predicted = tensorflow_graph(x).numpy()
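
Adapted to a face-landmark model, as a sketch only (it assumes the model was exported to ./exported, takes 128x128 RGB float32 input, and needs the same normalize preprocessing as above):

    import cv2
    import numpy as np
    import tensorflow as tf

    graph = tf.saved_model.load("./exported")

    # Preprocess exactly as during training: resize, BGR -> RGB, normalize.
    frame = cv2.imread("face.jpg")
    face = cv2.cvtColor(cv2.resize(frame, (128, 128)), cv2.COLOR_BGR2RGB)
    face = normalize(np.asarray(face, dtype=np.float32))

    # Call the loaded graph directly on a batch of one image.
    marks = graph(np.expand_dims(face, 0)).numpy()
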
ksyao2002 commented 3 years ago

Thanks for your reply! I tried using the code that you linked above, but I am getting this error:

    ValueError: Python inputs incompatible with input_signature:
      inputs: (
        Tensor("IteratorGetNext:0", shape=(None, 128, 128, 3), dtype=uint8))
      input_signature: (
        TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name='input_1'))

Any ideas?

Edit: to clarify, the error comes from the model.predict line.

ksyao2002 commented 3 years ago

Update: I fixed it. The error was occurring because I didn't include the normalization line: img_input = normalize(np.array(img_rgb, dtype=np.float32))

Also, the line heatmaps = np.transpose(heatmaps, (2, 0, 1)) was giving me an error: the heatmaps object was just a 1D list, but the transpose assumes it is 3D. I just removed that line and drew the circles using the raw heatmaps object (in my case I called it predictions). I'm not sure if skipping the transpose step and the get_peak_location function changes anything about the output. Below is a snippet of my code:

face_img = normalize(np.array(face_img, dtype=np.float32))

predictions = model.predict(tf.expand_dims(face_img, 0))[0]

# Parse the heatmaps to get mark locations.
# heatmaps = np.transpose(predictions, (2, 0, 1))  # Removed: predictions is 1D.
for i in range(0, len(predictions), 2):
    # mark = get_peak_location(heatmap)
    x = int(predictions[i] * (facebox[2] - facebox[0]) + facebox[0])
    y = int(predictions[i + 1] * (facebox[3] - facebox[1]) + facebox[1])
    cv2.circle(imgcpy, (x, y), 1, (255, 255, 255), -1)

Unfortunately the result is not too great. Picture shown below (green is the pretrained model from the article linked above; white is my own trained model).

[image: face with predicted landmarks; green: pretrained model, white: own trained model]

It may be an issue with my training, or maybe it's an issue with not using the get_peak_location function?

yinguobing commented 3 years ago

First, it seems that you have figured out how to use numpy arrays (cv2.imread, cv2.VideoCapture) as TensorFlow model inputs. :+1: I guess this issue is safe to close.

Second, the two lines of code you skipped are designed for HRNet, which generates mark heatmaps rather than raw mark locations. Different models tend to have different post-processing, and it is up to the model authors.
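
For reference, heatmap post-processing along the lines of get_peak_location might look like the sketch below (not the repo's exact implementation; it assumes HRNet-style output of shape (height, width, n_marks)):

    import numpy as np

    def get_peak_location(heatmap):
        # The mark is taken at the peak activation of its 2D heatmap.
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        return x, y

    # (h, w, n_marks) -> (n_marks, h, w), then one peak per mark.
    heatmaps = np.transpose(predictions, (2, 0, 1))
    marks = [get_peak_location(h) for h in heatmaps]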

The last question is about model performance. If you are using the training code from this repo, I need to remind you that this is a really simple model that does not have performance as its priority. Due to limited access I cannot read the full article you linked above, but a blind guess would be that the author used a more mature model, at least. If performance is your next concern, then I think it's time to move on and leave this repo behind.

ksyao2002 commented 3 years ago

Thanks for your responses, I really appreciate all the work you've been doing! Do you think using ResNet50 as the model architecture would achieve better performance?

yinguobing commented 3 years ago

Maybe. Model architecture is only part of the whole deep learning system. Better performance requires a balanced combination of data, model, loss function, optimizer, training strategy, etc. You can find plenty of articles on this topic online.
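
For what it's worth, here is a minimal sketch of the swap being asked about: ResNet50 as a Keras backbone with a landmark-regression head (assuming 68 points, i.e. 136 outputs; input size, loss, and training details are all open choices):

    import tensorflow as tf

    # ImageNet-pretrained backbone, global average pooling, no classifier head.
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet",
        input_shape=(128, 128, 3), pooling="avg")

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(136),  # (x, y) for each of the 68 marks
    ])

    model.compile(optimizer="adam", loss="mse")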