penincillin / DREAM

This is the public repository for our accepted CVPR 2018 paper "Pose-Robust Face Recognition via Deep Residual Equivariant Mapping"
http://mmlab.ie.cuhk.edu.hk/projects/DREAM/
BSD 2-Clause "Simplified" License

Will you release your training dataset (the subset of MS-Celeb-1M consisting of 13,385 identities)? #3

Closed xiaoxingzeng closed 6 years ago

xiaoxingzeng commented 6 years ago

Will you publish your training dataset? And has the overlap with IJB-A already been removed from it? Thanks.

penincillin commented 6 years ago

Sorry, these images belong to SenseTime and I can't release them. If you want to train your own face verification model, I recommend you check out DeepInsight; they provide code for the whole pipeline, from face detection to recognition. They also provide downloads of the preprocessed MS-Celeb-1M and VGGFace2 datasets.

xiaoxingzeng commented 6 years ago

@penincillin Thanks, I have already cleaned MS-Celeb-1M and obtained the yaw of the images. For end-to-end training, should the yaw y (in radians) be transformed with sigmoid(4/pi * y - 1)?

penincillin commented 6 years ago

Actually, you can find the implementation of the yaw transformation in src/stiching/branch_util.py.
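
For readers who can't check that file, here is a minimal sketch of a sigmoid yaw gate of the form quoted in the question above; the constants simply follow the question's formula and are not necessarily the ones used in src/stiching/branch_util.py:

import math

def yaw_to_coeff(yaw_rad):
    # Map a yaw angle in radians to a soft-gate coefficient in (0, 1),
    # following the sigmoid(4/pi * y - 1) form quoted in this thread.
    x = 4.0 / math.pi * abs(yaw_rad) - 1.0
    return 1.0 / (1.0 + math.exp(-x))

# e.g. yaw_to_coeff(0.0) ~= 0.27 for a frontal face,
#      yaw_to_coeff(math.pi / 2) ~= 0.73 for a full profile.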

KangolHsu commented 6 years ago

@penincillin How did you get the yaw of images?

lyatdawn commented 6 years ago

@xiaoxingzeng How did you get the yaw of images?

jolinlinlin commented 5 years ago

@xiaoxingzeng Can you tell me how to get the yaw of the images? Thank you very much.

lyatdawn commented 5 years ago

@jolinlinlin You can use this code to get the yaw of images: https://www.learnopencv.com/?s=headpose

jolinlinlin commented 5 years ago

@lyatdawn I read the page you mentioned. When I use OpenCV and dlib to detect the 68 landmarks, if dlib detects zero faces I can't get the landmarks or compute the yaw angle. Can you tell me how to solve this problem? Thank you very much!

jolinlinlin commented 5 years ago

@lyatdawn I get the rotation vector by following https://www.learnopencv.com/?s=headpose as follows:

dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion
(success, rotation_vector, translation_vector) = cv2.solvePnP(
    model_points, image_points, camera_matrix, dist_coeffs,
    flags=cv2.SOLVEPNP_ITERATIVE)
rvec_matrix = cv2.Rodrigues(rotation_vector)[0]
proj_matrix = np.hstack((rvec_matrix, translation_vector))
eulerAngles = cv2.decomposeProjectionMatrix(proj_matrix)[6]
pitch, yaw, roll = eulerAngles[0], eulerAngles[1], eulerAngles[2]

But I don't know whether the result is correct, and I wonder whether there is a reasonable range for the angles, e.g. -90° to +90°. I would appreciate your reply.

lyatdawn commented 5 years ago

@jolinlinlin 1) If face detection fails, you can use another detector, such as MTCNN, to find the face and then feed its result into the yaw-estimation code. In my environment, I simply skip images whose faces are not detected (a minimal sketch of this pattern is at the end of this comment). 2) The yaw code looks like this:

import math
import cv2

# model_points, image_points, camera_matrix and dist_coeffs are prepared
# as in the learnopencv.com head-pose tutorial.
_, rotation_vector, translation_vector = cv2.solvePnP(model_points,
    image_points, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

# Use cv2.Rodrigues() to transform rotation_vector into a 3x3 rotation matrix.
rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

# Calculate the Euler angles from rotation_matrix
# (see https://blog.csdn.net/xuehuafeiwu123/article/details/74942989).
pitch = math.atan2(rotation_matrix[2, 1], rotation_matrix[2, 2])
yaw = math.atan2(-rotation_matrix[2, 0], math.sqrt(rotation_matrix[2, 1]**2 +
    rotation_matrix[2, 2]**2))
roll = math.atan2(rotation_matrix[1, 0], rotation_matrix[0, 0])
# Since the sqrt term is non-negative, this yaw always lies in [-90°, 90°].
yaw = yaw * 180.0 / math.pi  # the final yaw, in degrees

3) In my environment, I use the dataset organized by https://github.com/yxu0611/Tensorflow-implementation-of-LCNN; specifically, the 10K-identity subset of the MS-Celeb-1M (Aligned) dataset.
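
For point 1, a minimal sketch of the skip-on-failure pattern, assuming dlib's frontal detector and the standard 68-landmark predictor file (an MTCNN fallback would go where None is returned):

import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_or_none(image_path):
    # Return the 68 (x, y) landmarks of the first detected face,
    # or None when dlib finds no face (fall back to MTCNN or skip).
    img = cv2.imread(image_path)
    if img is None:
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])
    return [(p.x, p.y) for p in shape.parts()]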

jolinlinlin commented 5 years ago

@lyatdawn Thanks for your reply. 1. I find that face detection fails more often on profile faces; I will try detecting profile faces with MTCNN. 2. I computed the yaw angle with your code. Taking cfp-dataset/Data/Images/002/profile/01.jpg as an example, the result is:

pitch angle: -163.724121539
yaw angle: -42.4647411624
roll angle: -18.6841105951

I'm not sure whether the angles must lie in the range [-90°, 90°], and I doubt this result is correct, because the face in this picture is a complete profile, so I would expect the yaw angle to be close to 90°. I also find that the yaw angles in estimate_pose.txt, computed by the author's method, are rarely close to 90° even when the corresponding picture looks rotated 90° about the yaw axis.

  3. The dataset download link in https://github.com/yxu0611/Tensorflow-implementation-of-LCNN is invalid. Do you have this aligned dataset, and could you share it with me? Many thanks!
lyatdawn commented 5 years ago

@jolinlinlin

  1. In the method at https://www.learnopencv.com/?s=headpose, the 3D points are fixed generic coordinates rather than values computed from a 3D model of the actual face, so the estimated yaw is not accurate. However, if we fix the same 3D facial points for all images, every image's 3D model can be thought of as sitting in the same coordinate system; that is my understanding. If we could obtain the true 3D facial points for each image, I think the PnP method would do better. Alternatively, using a generic 3D face model to estimate a face's 3D points may be a good idea.
  2. You can search for MS-Celeb-1M (Aligned) on the internet; this dataset is public.
jolinlinlin commented 5 years ago

@lyatdawn Thank you very much! You must be a good guy! I'm going to try the 3D method as you suggest. Good luck to you!

jolinlinlin commented 5 years ago

@lyatdawn Sorry to bother you again. Do you know how to retrain the DREAM block after end-to-end training? I find that the best performance in the paper comes from end2end+retrain, but the author only released the end2end model.

lyatdawn commented 5 years ago

@jolinlinlin I do not know the authors' exact setting for "train the DREAM block separately with frontal-profile face pairs." In my own work I only use end-to-end training. Since the DREAM block is only a tiny network, I think training it should not be difficult; we only need to prepare a dataset of frontal-profile face pairs.
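
For context, a minimal PyTorch sketch of what such a tiny residual block could look like; the layer sizes and activations here are my assumption, not the authors' exact design, so check the DREAM repo for the real implementation:

import torch
import torch.nn as nn

class DreamBlock(nn.Module):
    # A DREAM-style residual branch: a small MLP whose output is added
    # back onto the input feature, scaled by the soft yaw gate.
    def __init__(self, feat_dim=256):
        super(DreamBlock, self).__init__()
        self.mapping = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, feature, yaw_coeff):
        # yaw_coeff has shape (batch, 1), values in [0, 1]:
        # ~0 for frontal faces (block nearly inactive),
        # ~1 for full profiles (full residual correction applied).
        return feature + yaw_coeff * self.mapping(feature)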

lyatdawn commented 5 years ago

@jolinlinlin Also, which generic 3D face model do you use? Can you share your code here or at my email lyatdawn@163.com? Thanks.

jolinlinlin commented 5 years ago

@lyatdawn I can't find a generic 3D face model that contains all 68 3D points. I just use the 6 3D points given in https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/. I also found more 3D points in https://github.com/mpatacchiola/deepgaze/blob/master/deepgaze/head_pose_estimation.py, but they seem different from the learnopencv.com ones.

jolinlinlin commented 5 years ago

@lyatdawn In addition, have you finished end-to-end training on the MS-Celeb-1M (Aligned) dataset? I am training the end-to-end model on CASIA-WebFace, which contains about 400,000 images, but the model converges very slowly and I don't know whether that is normal. My GPU is a TITAN X with 12 GB of memory, and top-1 accuracy rises to only around 10% after 7 epochs. Can you describe your training run so I can evaluate mine? I would appreciate your reply.

lyatdawn commented 5 years ago

@jolinlinlin I use the MS-Celeb-1M (Aligned) dataset to train the DREAM network; in my experiment I use 10K identities, selected following https://github.com/yxu0611/Tensorflow-implementation-of-LCNN (in that repo you can "Download MS-Celeb-1M cleaned image_list 10K or 70K"). I split this 10K-identity dataset into a training set and a testing set; the result on the testing set is:

lyatdawn commented 5 years ago

@jolinlinlin For the head-pose estimation method of https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/, we only need 6 3D points: nose tip, chin, left eye left corner, right eye right corner, left mouth corner, right mouth corner. If the 3D points of your face model are not these exact points, we can instead detect the 2D landmarks that match the 3D face model.

jolinlinlin commented 5 years ago

@lyatdawn Thank you very much! How long did training take, and how many epochs did you train to get this result? Also, the head-pose estimation method of https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ needs dlib to detect 68 2D landmarks, which takes too long on the whole MS-Celeb-1M dataset. Do you have any way to speed it up?

lyatdawn commented 5 years ago

@jolinlinlin The method of https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/ only needs 6 2D-3D point correspondences; although we detect 68 2D facial landmarks, we only use the following six points:

Nose tip, chin, left eye left corner, right eye right corner, left mouth corner, right mouth corner.
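
For reference, a sketch of these six 3D reference points as given in the learnopencv.com tutorial; the dlib landmark indices in the final comment are my assumed mapping for the 2D side, not something the tutorial fixes:

import numpy as np

# Generic 3D reference points from the learnopencv.com head-pose tutorial,
# expressed in an arbitrary model coordinate system.
model_points = np.array([
    (0.0, 0.0, 0.0),           # Nose tip
    (0.0, -330.0, -65.0),      # Chin
    (-225.0, 170.0, -135.0),   # Left eye, left corner
    (225.0, 170.0, -135.0),    # Right eye, right corner
    (-150.0, -150.0, -125.0),  # Left mouth corner
    (150.0, -150.0, -125.0),   # Right mouth corner
])

# Assumed dlib 68-landmark indices for the matching 2D points:
# 30 (nose tip), 8 (chin), 36 and 45 (eye corners), 48 and 54 (mouth corners).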

In my DREAM experiment, I train for 30 epochs; the other settings are nearly the same as the authors'. My GPU is a GTX 1080 Ti, which may be a bit slower than your TITAN X.

lyatdawn commented 5 years ago

@jolinlinlin Training the 10K-identity MS-Celeb-1M subset for 30 epochs takes me about 30 hours.

jolinlinlin commented 5 years ago

@lyatdawn How many images are there in total in your 10K MS-Celeb-1M dataset? Do you use any pretrained model, or do you train from scratch?

lyatdawn commented 5 years ago

@jolinlinlin The training set contains about 540 thousand images and the testing set about 60 thousand. I do not use any pretrained model; I train the DREAM network from scratch.

Wink-Xu commented 5 years ago

@lyatdawn @jolinlinlin Hi guys, have you found other ways to tackle large-pose face recognition?