mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation Using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, Saliency Map.
MIT License

Visualize roll, pitch, yaw in image #43

AVCarreiro closed this issue 6 years ago

AVCarreiro commented 6 years ago

Hi @mpatacchiola, great work, it has been fun trying your examples!

Regarding the CNN implementation, we get the roll, pitch, and yaw values, and they seem OK, but so far I have not managed to visualize them (as you do when projecting the axes in the pnp examples). I tried different ways of getting the rotation vector from the values (or the matrix, then cv2.Rodrigues to get the vector), but none seems to work. I don't know if it's a difference in the coordinate system you use or another detail I'm missing.

Any help with this would be most appreciated :)

Best regards,

mpatacchiola commented 6 years ago

Hi @AVCarreiro

I am glad you enjoyed the examples! Regarding the projection of the axes for the angles returned by the CNN, there are no examples available.

If you can share the code (or pseudocode) that you used, I can try to help you. What I would do is create a method that builds a rotation matrix from the Euler angles, then project the three axes on the image using the cv2.projectPoints() method.

AVCarreiro commented 6 years ago

Thanks for replying @mpatacchiola

I thought of the same process, but I'm having some trouble getting the correct rotation matrix. I tried several methods, including different orderings of roll, pitch, and yaw in the following functions:

- x: roll (bank angle)
- y: pitch (elevation/attitude)
- z: yaw (bearing/heading)

```python
import numpy as np

def euler2rotmat(x, y, z):
    ch = np.cos(z)
    sh = np.sin(z)
    ca = np.cos(y)
    sa = np.sin(y)
    cb = np.cos(x)
    sb = np.sin(x)
    rot = np.zeros((3,3), 'float32')
    rot[0][0] = ch * ca
    rot[0][1] = sh*sb - ch*sa*cb
    rot[0][2] = ch*sa*sb + sh*cb
    rot[1][0] = sa
    rot[1][1] = ca * cb
    rot[1][2] = -ca * sb
    rot[2][0] = -sh * ca
    rot[2][1] = sh*sa*cb + ch*sb
    rot[2][2] = -sh*sa*sb + ch*cb
    return rot
```

which I found in the OpenCV code, or the following, found on LearnOpenCV:

```python
import math
import numpy as np

def eulerAnglesToRotationMatrix(theta):
    # Rotation about the X axis
    R_x = np.array([[1, 0,                   0                 ],
                    [0, math.cos(theta[0]), -math.sin(theta[0])],
                    [0, math.sin(theta[0]),  math.cos(theta[0])]])
    # Rotation about the Y axis
    R_y = np.array([[ math.cos(theta[1]), 0, math.sin(theta[1])],
                    [ 0,                  1, 0                 ],
                    [-math.sin(theta[1]), 0, math.cos(theta[1])]])
    # Rotation about the Z axis
    R_z = np.array([[math.cos(theta[2]), -math.sin(theta[2]), 0],
                    [math.sin(theta[2]),  math.cos(theta[2]), 0],
                    [0,                   0,                  1]])
    # Combined rotation: R = Rz * Ry * Rx
    R = np.dot(R_z, np.dot(R_y, R_x))
    return R
```

Then, I'd use `rvec, jacobian = cv2.Rodrigues(rot_mat)` to get the rotation vector, and project the axis points using `imgpts, jac = cv2.projectPoints(axis, rvec, tvec, camera_matrix, camera_distortion)` as in your examples, with tvec = [0, 0, 0] (although I'm not sure about this translation, and I also tried different values). The plot then has its origin in a different point: not the sellion you use, since we no longer have the facial landmarks, but just the center of the image or of the cropped face.

mpatacchiola commented 6 years ago

Hi @AVCarreiro

Sorry for the late reply, but this issue totally slipped my mind. The procedure you are following is correct; however, there are probably some errors in the coordinate systems. Converting between coordinate frames does not always make our life easier. A crucial point here is the following:

OpenCV uses a right-handed coordinate system. With respect to the optical axis of the camera, X goes right, Y goes downward, and Z goes forward.
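For illustration, the three unit axes in this frame can be written like so (just a sketch for clarity, not deepgaze code):

```python
import numpy as np

# Unit axes in OpenCV's right-handed camera frame:
#   X points right, Y points downward, Z points forward (into the scene)
axes_3d = np.float32([[1.0, 0.0, 0.0],   # X
                      [0.0, 1.0, 0.0],   # Y
                      [0.0, 0.0, 1.0]])  # Z
```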

Another important thing to consider is that you have to use radians instead of degrees. By default deepgaze returns degrees, and you have to explicitly ask for radians. For instance, to get the yaw angle in radians:

```python
yaw = my_head_pose_estimator.return_yaw(image, radians=True)
```

To simplify the problem I will show you how to draw the yaw on your image; you can extend it to the other axes easily. First of all you need the parameters of the camera; in my example I will estimate them directly from the image:

```python
import cv2
import numpy as np

file_name = "image.jpg"
file_save = "image_axes.jpg"
print("Processing image ..... " + file_name)
image = cv2.imread(file_name)
cam_w = image.shape[1]
cam_h = image.shape[0]
c_x = cam_w / 2
c_y = cam_h / 2
f_x = c_x / np.tan(60/2 * np.pi / 180)  # rough estimate assuming a 60-degree field of view
f_y = f_x

camera_matrix = np.float32([[f_x, 0.0, c_x],
                            [0.0, f_y, c_y],
                            [0.0, 0.0, 1.0]])

print("Estimated camera matrix: \n" + str(camera_matrix) + "\n")
```

Ok, now I will define the axis I want to draw, in my case only the one that represents the yaw. I also define a translation vector, where the axis is moved a little bit along the depth direction. Then I get the yaw angle through the deepgaze CNN, specifying that I need the angle in radians:

```python
axis = np.float32([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.5]])

tvec = np.array([0.0, 0.0, 1.0], np.float32)  # translation vector

yaw = my_head_pose_estimator.return_yaw(image, radians=True)  # Evaluate the yaw angle
```

Now you can use the function that you reported above. Here I have modified it to consider only the yaw angle:

```python
def yaw2rotmat(yaw):
    x = 0.0
    y = 0.0
    z = yaw
    ch = np.cos(z)
    sh = np.sin(z)
    ca = np.cos(y)
    sa = np.sin(y)
    cb = np.cos(x)
    sb = np.sin(x)
    rot = np.zeros((3,3), 'float32')
    rot[0][0] = ch * ca
    rot[0][1] = sh*sb - ch*sa*cb
    rot[0][2] = ch*sa*sb + sh*cb
    rot[1][0] = sa
    rot[1][1] = ca * cb
    rot[1][2] = -ca * sb
    rot[2][0] = -sh * ca
    rot[2][1] = sh*sa*cb + ch*sb
    rot[2][2] = -sh*sa*sb + ch*cb
    return rot
```

You can pass the yaw angle returned by deepgaze to your function, but you have to invert the sign, because the deepgaze convention for the yaw is different. After that you apply cv2.Rodrigues() and the projection (here I also define the distortion coefficients, assuming no lens distortion):

```python
camera_distortion = np.float32([0.0, 0.0, 0.0, 0.0, 0.0])  # assuming no lens distortion

rot_matrix = yaw2rotmat(-yaw[0,0,0])
rvec, jacobian = cv2.Rodrigues(rot_matrix)
imgpts, jac = cv2.projectPoints(axis, rvec, tvec, camera_matrix, camera_distortion)
```

Now, in order to make our life easier I define the two points we want to draw on the image. I take the center of the image as starting point, and the point returned by the projection as ending point:

```python
p_start = (int(c_x), int(c_y))
p_stop = (int(imgpts[2][0][0]), int(imgpts[2][0][1]))
```

Finally I draw the line using the two points (red line) and the origin (green dot), then I save the image:

```python
cv2.line(image, p_start, p_stop, (0,0,255), 3)  # Red
cv2.circle(image, p_start, 1, (0,255,0), 3)  # Green
cv2.imwrite(file_save, image)
```

This code works on my system and correctly draws a line representing the yaw. You should be able to extend it to the other angles following the same workflow.
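As a starting point, a possible (untested) sketch of the extension follows. It reuses the euler2rotmat(x, y, z) function you posted above, and assumes that roll and pitch are obtained with return_roll() and return_pitch() in radians; the sign conventions for those two angles may need to be flipped, just like the yaw:

```python
# Reusing euler2rotmat(x, y, z) from above,
# with x = roll, y = pitch, z = yaw (all in radians).
roll = my_head_pose_estimator.return_roll(image, radians=True)
pitch = my_head_pose_estimator.return_pitch(image, radians=True)

# Endpoints of the three axes (X, Y, Z), each 0.5 units long
axis = np.float32([[0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.0],
                   [0.0, 0.0, 0.5]])

rot_matrix = euler2rotmat(roll[0,0,0], pitch[0,0,0], -yaw[0,0,0])
rvec, jacobian = cv2.Rodrigues(rot_matrix)
imgpts, jac = cv2.projectPoints(axis, rvec, tvec, camera_matrix, camera_distortion)

# Draw the projected X, Y, Z axes in red, green, and blue (BGR)
for i, color in enumerate([(0, 0, 255), (0, 255, 0), (255, 0, 0)]):
    p_stop = (int(imgpts[i][0][0]), int(imgpts[i][0][1]))
    cv2.line(image, p_start, p_stop, color, 3)
```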

dhawan85 commented 6 years ago

Hi @mpatacchiola,

Could you please also give sample code for calculating roll and pitch? I am really stuck on calculating the roll, pitch, and yaw of an image. I am also using Python with OpenCV.

mpatacchiola commented 6 years ago

Hi @dhawan85

What do you mean exactly by "calculating roll, pitch"? Deepgaze already returns those values.
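For instance, something like this should print all three angles (assuming the estimator is set up as in the examples, with the roll, pitch, and yaw variables loaded):

```python
roll = my_head_pose_estimator.return_roll(image, radians=True)
pitch = my_head_pose_estimator.return_pitch(image, radians=True)
yaw = my_head_pose_estimator.return_yaw(image, radians=True)
print("Roll: %.4f, Pitch: %.4f, Yaw: %.4f" % (roll[0,0,0], pitch[0,0,0], yaw[0,0,0]))
```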

massvoice commented 6 years ago

Hi @mpatacchiola,

Could you please help me calculate the roll, pitch, and yaw values from a webcam stream? I tried feeding my webcam frames to ex_cnn_head_pose_estimation_images.py, but the results are not at all accurate or consistent, since I am not taking the focal length and face detection into account. It would be extremely helpful if you could help me modify ex_pnp_head_pose_estimation_webcam.py to calculate the roll, pitch, and yaw values in real time.

Thank you for the great work.

humphreyBristol commented 6 years ago

@AVCarreiro Sorry, could you please help me? When I run the pnp_head_pose_estimation code, it always says that the image type is wrong, even though I am actually using RGB pictures. Since you said you got the result and the code works fine, could you please give me some suggestions?

abhisheksaurabh1985 commented 6 years ago

Hi @mpatacchiola I'm working on a head pose detection system for car drivers. I've come up with a very basic system using dlib and OpenCV. After obtaining the yaw, pitch, and roll (YPR) values, I want to classify the direction in which the driver is looking (e.g. left, right, road, or instrument cluster) depending on the YPR values, roughly as in the sketch below. However, I find that the values change abruptly and there is no definite pattern. I also tried deepgaze for this purpose, but the problem still persists.

Could you please give me any suggestions on how to make the values more consistent? The pipeline is as follows: face detection, facial landmark detection, and head pose orientation detection.
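For reference, the kind of rule I am attempting looks roughly like this (the angle thresholds here are placeholders, not tuned values):

```python
def classify_gaze(yaw_deg, pitch_deg):
    # Hypothetical thresholds in degrees; the real values need tuning
    if yaw_deg < -20.0:
        return "left"
    elif yaw_deg > 20.0:
        return "right"
    elif pitch_deg < -15.0:
        return "instrument cluster"
    else:
        return "road"
```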

mpatacchiola commented 6 years ago

Hi @abhisheksaurabh1985

This problem can be solved using a filter, for instance a moving average or a Bayes filter. The idea is that the head pose at time t is strongly correlated with the pose at time t-1. In other words, if your yaw is at 65 degrees, it cannot suddenly jump to -85 degrees. When the returned measure is inconsistent, the filter will smooth it and the result will be more coherent.
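A minimal sketch of a moving average over the last N estimates could look like this (the window size is a parameter you have to tune; a bigger window gives a smoother but laggier signal):

```python
from collections import deque
import numpy as np

class MovingAverageFilter:
    def __init__(self, window_size=10):
        self.buffer = deque(maxlen=window_size)  # keeps only the last N values

    def update(self, value):
        self.buffer.append(value)
        return np.mean(self.buffer)  # smoothed estimate

# Usage: smooth the raw yaw once per frame before classifying the pose
yaw_filter = MovingAverageFilter(window_size=10)
# smooth_yaw = yaw_filter.update(yaw[0,0,0])
```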

mpatacchiola commented 6 years ago

Closed for inactivity

pavithran-s commented 4 years ago

Hi @mpatacchiola

I changed the code to plot the pitch as well, but it is not working. I have enclosed my code below.

```python
def yaw2rotmat_yaw(yaw):
    x = 0.0
    y = 0.0
    z = yaw
    ch = np.cos(z)
    sh = np.sin(z)
    ca = np.cos(y)
    sa = np.sin(y)
    cb = np.cos(x)
    sb = np.sin(x)
    rot = np.zeros((3,3), 'float32')
    rot[0][0] = ch * ca
    rot[0][1] = sh*sb - ch*sa*cb
    rot[0][2] = ch*sa*sb + sh*cb
    rot[1][0] = sa
    rot[1][1] = ca * cb
    rot[1][2] = -ca * sb
    rot[2][0] = -sh * ca
    rot[2][1] = sh*sa*cb + ch*sb
    rot[2][2] = -sh*sa*sb + ch*cb
    return rot

def yaw2rotmat_pitch(pitch):
    x = 0.0
    y = pitch
    z = 0.0
    ch = np.cos(z)
    sh = np.sin(z)
    ca = np.cos(y)
    sa = np.sin(y)
    cb = np.cos(x)
    sb = np.sin(x)
    rot = np.zeros((3,3), 'float32')
    rot[0][0] = ch * ca
    rot[0][1] = sh*sb - ch*sa*cb
    rot[0][2] = ch*sa*sb + sh*cb
    rot[1][0] = sa
    rot[1][1] = ca * cb
    rot[1][2] = -ca * sb
    rot[2][0] = -sh * ca
    rot[2][1] = sh*sa*cb + ch*sb
    rot[2][2] = -sh*sa*sb + ch*cb
    return rot

def detect(gray, frame):
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        roi_color = frame[y:y+h, x:x+w]
        return roi_color
```

```python
def check_pose(image, counter):
    if image.shape[0] < 64:
        a = 65
    else:
        a = image.shape[0]
    image = cv2.resize(image, (a, a))
    import time
    time1 = time.time()
    cam_w = image.shape[1]
    cam_h = image.shape[0]
    c_x = cam_w / 2
    c_y = cam_h / 2
    f_x = c_x / np.tan(60/2 * np.pi / 180)
    f_y = f_x
    camera_matrix = np.float32([[f_x, 0.0, c_x],
                                [0.0, f_y, c_y],
                                [0.0, 0.0, 1.0]])
    # Distortion coefficients
    camera_distortion = np.float32([0.0, 0.0, 0.0, 0.0, 0.0])
    # Defining the axes
    axis = np.float32([[0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.5]])
    pitch = my_head_pose_estimator.return_pitch(image, radians=True)  # Evaluate the pitch angle using a CNN
    yaw = my_head_pose_estimator.return_yaw(image, radians=True)  # Evaluate the yaw angle using a CNN
    rot_matrix_yaw = yaw2rotmat_yaw(-yaw[0,0,0])  # Deepgaze uses a different convention for the yaw, hence the minus sign
    rot_matrix_pitch = yaw2rotmat_pitch(-pitch[0,0,0])

    rvec_yaw, jacobian = cv2.Rodrigues(rot_matrix_yaw)
    rvec_pitch, jacobian = cv2.Rodrigues(rot_matrix_pitch)

    tvec = np.array([0.0, 0.0, 1.0], np.float32)  # translation vector

    imgpts_yaw, jac = cv2.projectPoints(axis, rvec_yaw, tvec, camera_matrix, camera_distortion)
    imgpts_pitch, jac = cv2.projectPoints(axis, rvec_pitch, tvec, camera_matrix, camera_distortion)

    p_start = (int(c_x), int(c_y))
    yaw_stop = (int(imgpts_yaw[2][0][0]), int(imgpts_yaw[2][0][1]))
    pitch_stop = (int(imgpts_pitch[2][0][0]), int(imgpts_pitch[2][0][1]))

    distance = abs(p_start[0] - yaw_stop[0])
    #print("time", (time.time() - time1))
    cv2.line(image, p_start, yaw_stop, (0,0,255), 5)  # Red
    cv2.line(image, p_start, pitch_stop, (255,0,0), 5)  # Blue
    cv2.circle(image, p_start, 1, (0,255,0), 5)  # Green
```