hysts / pytorch_mpiigaze

An unofficial PyTorch implementation of MPIIGaze and MPIIFaceGaze
MIT License

How to find face? #26

Closed Kelly-ZH closed 3 years ago

Kelly-ZH commented 3 years ago

Hello hysts, I have some questions about running the code. It would be a great pleasure if you could reply.

  1. Does it use dlib to detect the face, that is, is a face detected when the 68 face landmarks are found within the detection box?
  2. What does "normalized_camera_distance" in configs/demo_mpiigaze_resnet.yaml mean?

Thank you very much, I'm looking forward to your reply. Yours, Kelly

hysts commented 3 years ago

Hi, @Kelly-ZH

  1. I'm not entirely sure I understand your question, but, yes, dlib is used for face detection and landmark estimation in this repository. The face landmarks are used to compute the face orientation, which is used to normalize the face image. Here is the code for face detection. (A small illustrative dlib sketch follows this list.)

  2. It's dn in section 4.2 of this paper. It is defined to be 600mm, so I set it to 0.6m.
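For readers who want to see roughly what that looks like in practice, here is a minimal sketch, not this repository's actual code; the predictor file shape_predictor_68_face_landmarks.dat is the standard dlib model and is assumed to be available locally.

```python
# Minimal sketch of dlib face detection and 68-point landmark estimation.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

def detect_landmarks(image_rgb: np.ndarray):
    """Return one (68, 2) array of pixel coordinates per detected face."""
    landmarks = []
    for rect in detector(image_rgb, 0):        # 0 = no upsampling
        shape = predictor(image_rgb, rect)     # 68-point shape for this face box
        pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
        landmarks.append(pts)
    return landmarks
```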

Kelly-ZH commented 3 years ago

Thank you for your prompt reply, it helps a lot. About question 1, I got it, but I have more questions. In the reference paper, the authors established a generic face model from the dataset, matched the 6 detected landmarks in the input image against that model, and then computed the 3D head rotation and eye location t with the EPnP algorithm. Where can I find these parts in your code? Or how did you achieve it?

hysts commented 3 years ago

@Kelly-ZH

As I couldn't find the 6-point 3D face model they used, I decided to use another model. I think I made it using this dataset, but it was quite a long time ago and I forget the exact process. Model fitting is done here.
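For readers following along, the fitting step can be sketched with OpenCV's PnP solver as below. This is only an illustration under my own naming (landmarks_2d, landmarks_3d, camera_matrix, dist_coefficients), not the repository's exact implementation.

```python
# Rough sketch: fit a generic 3D face model to detected 2D landmarks to
# recover the head rotation and translation in camera coordinates.
import cv2
import numpy as np

def estimate_head_pose(landmarks_2d, landmarks_3d, camera_matrix, dist_coefficients):
    """landmarks_3d: (N, 3) model points, landmarks_2d: (N, 2) detected points."""
    ok, rvec, tvec = cv2.solvePnP(
        landmarks_3d.astype(np.float64),
        landmarks_2d.astype(np.float64),
        camera_matrix,
        dist_coefficients,
        flags=cv2.SOLVEPNP_ITERATIVE,  # the paper uses EPnP; cv2.SOLVEPNP_EPNP is another option
    )
    rot, _ = cv2.Rodrigues(rvec)       # 3x3 head rotation matrix
    return rot, tvec                   # tvec: model origin in camera coordinates
```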

Kelly-ZH commented 3 years ago

Thank you for your reply, it really helped me to understand the code better. My understanding is: first, detect the face and get the 2D face landmark coordinates with dlib; second, use the Multi-PIE dataset to create a 68-point 3D face model; finally, match the 68 2D coordinates against the corresponding 3D average face model with the EPnP algorithm, so the head rotation and eye location in the camera coordinate system are obtained. If there is any misunderstanding, please bear with me, I am just beginning to learn computer vision. My question is: how did you get the Multi-PIE 3D face model? I know you may have forgotten the exact process, but I ask anyway. Thank you for your patient answers.

hysts commented 3 years ago

@Kelly-ZH

I think you understood the code correctly. For the 3D face model, you can predict 68 3D face landmarks using this repo, so I think you can get a 3D model like this (a rough sketch follows the list):

  • Prepare some face images.
  • Predict 3D landmarks using the repo.
  • Align the obtained landmarks. (For example, by the two eye centers and the nose landmark.)
  • Average those landmarks.
  • Shift, rotate, and scale the landmarks so that the conditions described here are satisfied.
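As a rough sketch of the align-and-average steps, the snippet below uses a simple Kabsch/Procrustes-style similarity alignment over all 68 points rather than just the eye and nose landmarks; these choices are my own assumptions and are not necessarily how the model in this repository was actually built.

```python
# Align per-image (68, 3) landmark predictions to a common reference and average them.
import numpy as np

def align(points, reference):
    """Similarity-align one (68, 3) landmark set to a reference set."""
    p = points - points.mean(axis=0)
    r = reference - reference.mean(axis=0)
    p = p * (np.linalg.norm(r) / np.linalg.norm(p))  # match the overall scale
    u, _, vt = np.linalg.svd(p.T @ r)                # best-fit rotation (Kabsch)
    if np.linalg.det(u @ vt) < 0:                    # avoid a reflection
        u[:, -1] *= -1
    return p @ (u @ vt) + reference.mean(axis=0)

def build_mean_model(landmark_sets):
    """landmark_sets: list of (68, 3) arrays predicted from different face images."""
    reference = landmark_sets[0]
    aligned = np.stack([align(pts, reference) for pts in landmark_sets])
    return aligned.mean(axis=0)                      # (68, 3) average face model
```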

Kelly-ZH commented 3 years ago

Thank you for your answer. I will try it later. Now I am having trouble converting gaze vectors to a gaze point on the 2D computer screen. Have you done this, or do you have any good ideas? Thank you very much.

Kelly-ZH commented 3 years ago

Hello hysts, I have another question. Eye.center is defined here: https://github.com/hysts/pytorch_mpiigaze/blob/9943de9ed3f2217f0c423d28baf2585efc8b4c8a/gaze_estimation/gaze_estimator/common/face_model.py#L148 Is it the rough 3D location of the eye center in the camera coordinate system or in the head coordinate system? I am looking forward to your answer.

hysts commented 3 years ago

@Kelly-ZH

I think it's in the camera coordinate system.

Kelly-ZH commented 3 years ago

Hi hysts, I looked at the code carefully and have several new questions. I really hope you can answer them.

  1. Why is normalized_head_pose here multiplied by np.array([1, -1]), and why is the CNN model output normalized_gaze_angles here multiplied by np.array([1, -1]) again?
  2. What does this function do? Does it change the face model coordinate system to the real camera coordinate system?
  3. I find the face model coordinate system defined here different from the one defined in the paper. Is there any reference for the face model coordinate system?
  4. Why is self.normalized_gaze_vector multiplied by normalizing_rot here instead of by the inverse of normalizing_rot to denormalize the gaze vector? Thank you very much!

hysts commented 3 years ago

@Kelly-ZH

  1. The MPIIGaze model takes the left eye image and the head pose as input, so when you put a right eye image and a head pose into the model, you need to flip both of them and also need to flip back the resulting gaze vector.

  2. It just flips the X and Z axes of the coordinate system. It's only used to display the head pose angles, so you don't really need to care about it. FYI, the reason I used this function was that I thought it'd be unintuitive to have negative pitch when the head is looking up. But, maybe I shouldn't have done it.

  3. I'm confused. What's the difference? The origin? As we only need head rotation angles, as long as the definitions of the direction of the axes are the same, it doesn't matter.

  4. Here the gaze vector is a row vector and rotation matrices are orthogonal, so multiplying the rotation matrix from the right is equivalent to multiplying its inverse by the column gaze vector from the left. (See the small numpy check after this list.)
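To illustrate point 4, here is a tiny numpy check with made-up example values (not taken from the repository):

```python
# For an orthogonal rotation matrix R, a row vector times R from the right
# equals R^-1 (= R^T) times the column vector from the left.
import numpy as np

theta = np.deg2rad(25)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # rotation about the z-axis
g = np.array([0.1, -0.2, -0.97])                      # example gaze vector

assert np.allclose(g @ R, np.linalg.inv(R) @ g)
assert np.allclose(g @ R, R.T @ g)
```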

Kelly-ZH commented 3 years ago

Thank you for your detailed reply, I understand. About 3, I think you are right: they have the same definition of the X, Y, and Z axes except for the origin. Now I am confused about the pitch and yaw. Why is my gaze angle_pitch always negative?

hysts commented 3 years ago

@Kelly-ZH

Hmm... That's weird. As described in the paper, the distribution of gaze vectors in the MPIIGaze dataset is very biased due to the way the data are collected, so that may be the reason for the pitch. But I have no idea about what's going on with the yaw.

Kelly-ZH commented 3 years ago

Thank you for your reply, I will look into the MPIIGaze dataset carefully.