mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, and Saliency Maps.

Accuracy of deepgaze #96

Closed · santo4ul closed this issue 4 years ago

santo4ul commented 4 years ago

Hi @mpatacchiola ,

I integrated a webcam into the sample application, and I see that the yaw, pitch, and roll values are offset by several degrees.

I get the values below when I keep my head facing straight at the camera. In that position I expect all three angles to be close to zero.

Roll  =  12.158
Pitch = -22.6817
Yaw   = -11.6264

When plotted, the three axes look like this:

[image: the three axes plotted on the webcam frame]

I'm using the draw_axis() function from here to draw the three axes shown above.

Questions:

  1. What are the possible reasons for this behavior?
  2. My camera has a 640x480 resolution. Could this be caused by the quality of the final 64x64 image that we feed to the deepgaze network?
  3. The CNN output is in the range -1 to +1 (based on the comments in the code). Based on this, the supported ranges are:

     Pitch: -45 to +45 degrees
     Yaw: -100 to +100 degrees
     Roll: -45 to +45 degrees

     Is my understanding correct?

  4. In head_pose_estimation.py, the multiplier is 25, which does not match the comment, which says -45 to +45. Should we be using 45, or does the comment need to change? roll_vector = np.multiply(roll_raw, 25.0) #cnn-out is in range [-1, +1] --> [-45, + 45]
  5. Any other suggestions to improve the accuracy?

Thank you.

santo4ul commented 4 years ago

I have an update. What I reported above was from the OpenCV DNN module in C++. The issue is not seen with OpenCV DNN in Python, so something is happening on the C++ side. I'll debug this further.

However, I still have the question below:

  1. In head_pose_estimation.py, the multiplier is 25, which does not match the comment, which says -45 to +45. Should we be using 45, or does the comment need to change? roll_vector = np.multiply(roll_raw, 25.0) #cnn-out is in range [-1, +1] --> [-45, + 45]

Thank you.

mpatacchiola commented 4 years ago

Hi @santo4ul

Yes, the roll is in [-25, +25]; the comment will be updated. Thank you for pointing that out.
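
For reference, a minimal sketch of the raw-to-degree mapping implied by this thread. The roll multiplier of 25 is confirmed above; the pitch and yaw multipliers of 45 and 100 are assumptions taken from the ranges quoted earlier in the thread, not verified against the source:

import numpy as np

def raw_to_degrees(pitch_raw, yaw_raw, roll_raw):
    #The CNN outputs values in [-1, +1]; each angle is a simple scaling.
    pitch = np.multiply(pitch_raw, 45.0)  #[-1, +1] --> [-45, +45] (assumed)
    yaw = np.multiply(yaw_raw, 100.0)     #[-1, +1] --> [-100, +100] (assumed)
    roll = np.multiply(roll_raw, 25.0)    #[-1, +1] --> [-25, +25] (confirmed above)
    return pitch, yaw, roll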

santo4ul commented 4 years ago

Thanks @mpatacchiola

I was wrong in my previous observation. Originally I had trouble matching the outputs for the 1.jpg, 2.jpg, etc. test images between C++ and Python.

Later I found a problem with the cv::resize() call in my C++ version. Now both my OpenCV DNN C++ and OpenCV DNN Python pipelines give the same outputs for all the test images in examples/ex_cnn_head_pose_axes.
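
For anyone hitting a similar mismatch, a minimal sketch of how the preprocessed inputs can be compared across two pipelines. The file names here are hypothetical; the idea is to dump the 64x64 face right before inference in each pipeline and diff the arrays:

import cv2
import numpy as np

#Hypothetical dumps written by the Python and C++ pipelines just before inference
face_py = cv2.imread("face_python.png")
face_cpp = cv2.imread("face_cpp.png")
assert face_py is not None and face_cpp is not None

#Cast to a signed type so the subtraction cannot wrap around
diff = np.abs(face_py.astype(np.int16) - face_cpp.astype(np.int16))
print("max per-pixel difference:", diff.max())  #0 means identical inputs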

After looking at it carefully, even my Python version shows the same error in the angles.

Then I switched to the deepgaze Python example examples/ex_cnn_head_pose_axes and modified it to read frames from the webcam. Below is the modified script, which reads webcam frames and draws the three axes. I've used the MTCNN face detector for face detection.


#!/usr/bin/env python

#The MIT License (MIT)
#Copyright (c) 2016 Massimiliano Patacchiola
#
#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
#MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 
#CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 
#SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#In this example the Deepgaze CNN head pose estimator is used to get the
#yaw, pitch, and roll angles from webcam frames. The three axes are drawn
#on each frame and shown on-screen.

import numpy as np
import os
import tensorflow as tf
import cv2
from deepgaze.head_pose_estimation import CnnHeadPoseEstimator
from mtcnn import MTCNN
import math
from math import cos, sin

detector = MTCNN(min_face_size = 70, scale_factor = 0.79)

sess = tf.Session() #Launch the graph in a session.
my_head_pose_estimator = CnnHeadPoseEstimator(sess) #Head pose estimation object
# Load the weights from the configuration folders
my_head_pose_estimator.load_yaw_variables(os.path.realpath("../../etc/tensorflow/head_pose/yaw/cnn_cccdd_30k.tf"))
my_head_pose_estimator.load_roll_variables(os.path.realpath("../../etc/tensorflow/head_pose/roll/cnn_cccdd_30k.tf"))
my_head_pose_estimator.load_pitch_variables(os.path.realpath("../../etc/tensorflow/head_pose/pitch/cnn_cccdd_30k.tf"))

cap = cv2.VideoCapture(0) #Using the default camera

def draw_axis(img, yaw, pitch, roll, tdx=None, tdy=None, size = 100):

    pitch = pitch * np.pi / 180
    yaw = -(yaw * np.pi / 180)
    roll = roll * np.pi / 180

    if tdx is None or tdy is None:
        height, width = img.shape[:2]
        tdx = width / 2
        tdy = height / 2

    # X-Axis pointing to right. drawn in red
    x1 = size * (cos(yaw) * cos(roll)) + tdx
    y1 = size * (cos(pitch) * sin(roll) + cos(roll) * sin(pitch) * sin(yaw)) + tdy

    # Y-Axis | drawn in green
    #        v
    x2 = size * (-cos(yaw) * sin(roll)) + tdx
    y2 = size * (cos(pitch) * cos(roll) - sin(pitch) * sin(yaw) * sin(roll)) + tdy

    # Z-Axis (out of the screen) drawn in blue
    x3 = size * (sin(yaw)) + tdx
    y3 = size * (-cos(yaw) * sin(pitch)) + tdy

    cv2.line(img, (int(tdx), int(tdy)), (int(x1),int(y1)),(0,0,255), 2, cv2.LINE_AA)
    cv2.line(img, (int(tdx), int(tdy)), (int(x2),int(y2)),(0,255,0), 2, cv2.LINE_AA)
    cv2.line(img, (int(tdx), int(tdy)), (int(x3),int(y3)),(255,0,0), 2, cv2.LINE_AA)

while True:
    ret, image = cap.read()
    if not ret:
        break #Camera read failed

    #Detect face using MTCNN
    result = detector.detect_faces(image)

    if len(result) == 0:
        continue #No face detected, continue

    box = result[0]['box'] #Take the first face

    x, y, w, h = box
    x, y = max(0, x), max(0, y) #MTCNN can return slightly negative coordinates

    face = image[y:y+h, x:x+w]

    face = cv2.resize(face, (64,64), interpolation=cv2.INTER_AREA)

    roll_degree = my_head_pose_estimator.return_roll(face, radians=False)  # Evaluate the roll angle using a CNN
    pitch_degree = my_head_pose_estimator.return_pitch(face, radians=False)  # Evaluate the pitch angle using a CNN
    yaw_degree = my_head_pose_estimator.return_yaw(face, radians=False)  # Evaluate the yaw angle using a CNN
    print("Estimated [roll, pitch, yaw] (degrees) ..... [" + str(roll_degree[0,0,0]) + "," + str(pitch_degree[0,0,0]) + "," + str(yaw_degree[0,0,0])  + "]")

    cv2.rectangle(image,
                  (int(box[0]), int(box[1])),
                  (int((box[0]+box[2])), int((box[1] + box[3]))),
                  (0,155,255),
                  2)

    draw_axis(image, yaw_degree[0,0,0], pitch_degree[0,0,0], roll_degree[0,0,0], tdx=box[0], tdy=box[1], size=100) #Pass scalars, not 1x1x1 arrays

    cv2.imshow('64x64 face',face) # Show the input we passed to deepgaze
    cv2.imshow('Input Image',image)
    if cv2.waitKey(1) == 27: #Press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
Steps to run the above example:
  1. cd examples/ex_cnn_head_pose_axes
  2. Save the above example to deepgaze_webcam.py
  3. git clone https://github.com/ipazc/mtcnn
  4. export PYTHONPATH=mtcnn:../../
  5. python deepgaze_webcam.py

Below is an illustration of the output angles of the above code when the head is kept facing straight at the camera:

[image: output axes with the head facing straight at the camera]

Given that the OpenCV DNN part is ruled out and the issue is reproducible with TensorFlow in Python itself (with the original deepgaze model), I'm not sure what the problem could be.

Or is the model overfitting the training data? Do we need to retrain to fix this issue?

Please suggest how to take this forward.

Thank you.

santo4ul commented 4 years ago

One more observation.

When I move away from the camera (keeping my head straight toward it), all the angles vary and slowly converge towards 0 degrees.

mpatacchiola commented 4 years ago

hi @santo4ul

It looks like the frame you are passing to the network is not correctly centered. MTCNN probably returns a different face crop when you are close to the camera compared to when you are far from it.

Deepgaze expects the face images to be well centered. My suggestion is to pass Deepgaze the images you find in this example folder, which are centered in a way compatible with what the CNN expects.

If your code works on those images, then the problem is definitely the MTCNN face detector. If your code does not work on those images, then there is a bug somewhere else.
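
A minimal sketch of that sanity check, assuming the test images 1.jpg, 2.jpg, etc. mentioned earlier sit in examples/ex_cnn_head_pose_axes, and reusing the loading code from the webcam script above:

import os
import tensorflow as tf
import cv2
from deepgaze.head_pose_estimation import CnnHeadPoseEstimator

sess = tf.Session()
estimator = CnnHeadPoseEstimator(sess)
estimator.load_yaw_variables(os.path.realpath("../../etc/tensorflow/head_pose/yaw/cnn_cccdd_30k.tf"))
estimator.load_pitch_variables(os.path.realpath("../../etc/tensorflow/head_pose/pitch/cnn_cccdd_30k.tf"))
estimator.load_roll_variables(os.path.realpath("../../etc/tensorflow/head_pose/roll/cnn_cccdd_30k.tf"))

for name in ["1.jpg", "2.jpg"]: #The pre-centered test images mentioned above
    img = cv2.imread(name)
    if img is None:
        continue
    img = cv2.resize(img, (64, 64), interpolation=cv2.INTER_AREA) #Match the webcam pipeline; skip if already 64x64
    roll = estimator.return_roll(img, radians=False)[0, 0, 0]
    pitch = estimator.return_pitch(img, radians=False)[0, 0, 0]
    yaw = estimator.return_yaw(img, radians=False)[0, 0, 0]
    print("%s: roll=%.2f pitch=%.2f yaw=%.2f" % (name, roll, pitch, yaw))

If the angles look correct here, the MTCNN crop is the likely culprit rather than the model or the TensorFlow pipeline.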

santo4ul commented 4 years ago

Hi @mpatacchiola,

I think there is a bit of confusion here. The image shown below is not the one I'm feeding to deepgaze; I just cropped it in an image editor to focus on the axes.

[image: cropped screenshot focusing on the axes]

What I'm feeding to deepgaze is indeed a well-centered image. Below is the 64x64 face image that I'm feeding to deepgaze.

[image: the 64x64 face crop fed to the network]

To see the problem yourself, you can quickly try the test application I shared above with a webcam. It won't take more than 5 minutes!

mpatacchiola commented 4 years ago

Even if the image looks well centered, that does not mean it will work straight away. For instance, it seems the image you are using was rectangular and has been reshaped into a 64x64 square, introducing some distortion.

The best thing you can do is to try one of the test images I suggested. This way you can understand where the issue is.

santo4ul commented 4 years ago

Sure @mpatacchiola. Let me check whether I can adjust the bounding box to keep it square before resizing, along the lines of the sketch below.
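
For reference, a minimal sketch of one way to do that: expand the shorter side of the MTCNN box so the crop is square before the 64x64 resize. The helper below is an assumption about what might help, not part of the deepgaze API:

def square_box(x, y, w, h, img_w, img_h):
    #Expand the shorter side so the crop is square, keeping the box
    #centered and clamped to the image borders.
    side = max(w, h)
    cx, cy = x + w // 2, y + h // 2
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(img_w, x0 + side)
    y1 = min(img_h, y0 + side)
    return x0, y0, x1 - x0, y1 - y0

#Usage inside the webcam loop above:
#  x, y, w, h = square_box(x, y, w, h, image.shape[1], image.shape[0])
#  face = image[y:y+h, x:x+w]
#  face = cv2.resize(face, (64, 64), interpolation=cv2.INTER_AREA)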