karanvivekbhargava / obamanet

ObamaNet : Photo-realistic lip-sync from audio (Unofficial port)
MIT License
235 stars 71 forks

how to normalize the keypoints in this paper? #4

Closed xiaoyun4 closed 6 years ago

xiaoyun4 commented 6 years ago

I am interested in your project and have read your paper. In the paper, you process the keypoints to be independent of face location, face size, and in-plane and out-of-plane face rotation. For example, you mean-normalize the 68 keypoints, project them onto a horizontal axis, and divide them by the norm of the 68 vectors. This processing is important, but the explanation in the paper is brief. Could you explain the process in detail, including the formulas and methods?

Thanks a lot

karanvivekbhargava commented 6 years ago

Hey @xiaoyun4 ,

Glad to answer the question. There are five steps happening in the preprocessing:

  1. Subtract the mean of the mouth keypoints from all the keypoints. This offsets the keypoints so that the mouth is always at the origin.
  2. Determine the angle of the face. Now that the keypoints are mouth-centered, we can use different methods to determine the tilt of the face. A quick-and-dirty way is to use the nose keypoints to get the vertical; this is the one I've used.
  3. Use the angle from above to remove the in-plane rotation of the face. This is a simple rotation of the points by that angle.
  4. Now the points are mouth-centered with the in-plane tilt removed. To make them independent of face size, we normalize the points by the norm of all the keypoints.
  5. PCA (I won't go into this part since @xiaoyun4 didn't ask about it).
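The steps above (minus PCA) can be sketched roughly as follows. This is not the repo's actual code, just a minimal NumPy illustration; it assumes dlib's 68-point landmark ordering (mouth = indices 48–67, eyes = 36–47) and estimates the tilt from the line joining the eye centers rather than the nose:

```python
import numpy as np

def normalize_keypoints(kp):
    """Sketch of the normalization steps: mouth-center, de-rotate, scale-normalize.

    kp: (68, 2) array of facial landmarks, assumed to follow dlib's
    68-point ordering (mouth = 48:68, eyes = 36:48).
    """
    kp = np.asarray(kp, dtype=float)

    # 1. Center on the mouth: subtract the mean of the mouth keypoints.
    kp = kp - kp[48:68].mean(axis=0)

    # 2. Estimate the in-plane tilt from the line joining the eye centers.
    left_eye = kp[36:42].mean(axis=0)
    right_eye = kp[42:48].mean(axis=0)
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)

    # 3. Rotate all points by -angle to remove the tilt.
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])
    kp = kp @ rot.T

    # 4. Make the points independent of face size: divide by the norm
    #    of all the keypoints.
    kp = kp / np.linalg.norm(kp)
    return kp
```

By construction, the output is unchanged if the input landmarks are translated, rotated in-plane, or uniformly scaled.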

I hope this gives you some details to work on.

hanezu commented 5 years ago

@karanvivekbhargava Thanks for your answer. From your code, I figured out that for step 2 you used the eye keypoints to get the horizontal line instead of the nose keypoints. Is that what you intended to say?

karanvivekbhargava commented 5 years ago

I had experimented with both, and it seems I used the eyes instead of the nose. @hanezu, that seems to be spot on.

chengdazhi commented 5 years ago

@hanezu @karanvivekbhargava Where in the code can I see the implementation of rotation normalization? I can only find that in run.py, it loads a pre-generated tilt angle and uses it to de-normalize the predicted landmarks...