yzhou359 / MakeItTalk


training data extract landmark use face_alignment.LandmarksType._3D or face_alignment.LandmarksType._2D #5

Closed — sicilyliu closed this issue 3 years ago

sicilyliu commented 3 years ago

Hi, thanks for sharing your work. When extracting landmarks for the training data, do you use face_alignment.LandmarksType._3D or face_alignment.LandmarksType._2D?

yzhou359 commented 3 years ago

We use 3D.

sicilyliu commented 3 years ago

But in Av2Flau_Convertor.py I see you use self.predictor = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cuda', flip_input=True), and then concatenate a column of ones as the third dimension: shape_3d = np.concatenate([shape_3d, np.ones(shape=(68, 1))], axis=1). Is this OK?
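For reference, the padding step quoted from Av2Flau_Convertor.py can be reproduced on a synthetic landmark array. The (68, 1) ones column and the concatenation come from the code above; the random input standing in for the predictor's output is illustrative only:

```python
import numpy as np

# Stand-in for the (68, 2) array a 2D face_alignment predictor would return
# (synthetic values here -- a real run would come from predictor.get_landmarks).
shape_3d = np.random.rand(68, 2)

# Append a constant z = 1 column, turning the 2D landmarks into a
# (68, 3) array, as done in Av2Flau_Convertor.py.
shape_3d = np.concatenate([shape_3d, np.ones(shape=(68, 1))], axis=1)

print(shape_3d.shape)                  # (68, 3)
print(bool((shape_3d[:, 2] == 1.0).all()))  # True
```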


yzhou359 commented 3 years ago

Thanks for your questions; let me clarify. My "3D" answer applied only to the earlier audio-to-landmark branches. The dataset-preparation code you mention hasn't been fully cleaned up yet. For Av2Flau_Convertor.py, which is used to train the image-to-image translation branch, we have two alternatives, using either 3D or 2D landmarks. For 2D, we append an all-ones column as the third coordinate of the extracted landmarks. Each alternative has its own advantages, and for different tasks we interchangeably use models trained on either 2D or 3D landmarks.

  • 3D landmarks connect seamlessly to the output of the previous branches and give a good 3D estimate when the head rotates frequently.
  • 2D landmarks provide a clean face edge when the head is in an extreme side view, which helps synthesize better side-view faces, but they may introduce distorted-jaw artifacts. You can choose 2D or 3D depending on your own task.

sicilyliu commented 3 years ago

Thanks, you are so nice. I got it!
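The two alternatives described above can be sketched as a single helper. The function name and the mode strings are hypothetical (not part of the MakeItTalk codebase); only the all-ones third column for the 2D case comes from the code discussed in this thread:

```python
import numpy as np

def to_landmarks_3col(shape, mode="3d"):
    """Return a (68, 3) landmark array for either alternative.

    mode="3d": keep the predictor's x, y, z as-is.
    mode="2d": keep only x, y and append a constant all-ones z column,
               as in the 2D alternative discussed above.
    (Hypothetical helper for illustration only.)
    """
    if mode == "3d":
        assert shape.shape == (68, 3)
        return shape
    if mode == "2d":
        xy = shape[:, :2]
        return np.concatenate([xy, np.ones((68, 1))], axis=1)
    raise ValueError(f"unknown mode: {mode}")

# Synthetic 3D landmarks standing in for FANet output.
lm = np.random.rand(68, 3)
lm2d = to_landmarks_3col(lm, mode="2d")
print(lm2d.shape)  # (68, 3)
```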

sicilyliu commented 3 years ago

Hi zhou, I have another question; I hope you can answer it. For the audio-to-landmark branches (training the content branch), I extract 3D landmarks with predictor = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, device='cuda', flip_input=True) and shapes = predictor.get_landmarks(img), then normalize them with shape_3d, scale, shift = util.norm_input_face(shape_3d). I trained for epoch=1001, but the result is not as good as yours; sometimes the lips come off the lower jaw.
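As a point of comparison for the normalization step, here is a minimal center-and-scale sketch on synthetic (68, 3) landmarks. This is a hypothetical illustration of what such a normalization typically does; the actual util.norm_input_face in MakeItTalk may use a different reference point and scale:

```python
import numpy as np

def norm_landmarks(shape_3d):
    """Hypothetical normalization: center landmarks at their mean and
    scale by the maximum absolute coordinate after centering.
    Illustrates the general center-and-scale idea only -- MakeItTalk's
    util.norm_input_face may differ in its choice of shift and scale."""
    shift = shape_3d.mean(axis=0, keepdims=True)
    centered = shape_3d - shift
    scale = np.abs(centered).max()
    return centered / scale, scale, shift

lm = np.random.rand(68, 3) * 100.0
norm, scale, shift = norm_landmarks(lm)
print(np.abs(norm).max())  # 1.0
```

Recovering the original coordinates is then just norm * scale + shift, which is a useful invariant to check when debugging a pipeline like this.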

yzhou359 commented 3 years ago

Do you mean the shapes extracted by the predictor are not good enough? That happens for some images if the frame is not clear, since we're relying on an off-the-shelf landmark alignment, i.e. FANet.

sicilyliu commented 3 years ago

I mean: how do you extract landmarks when training the audio-to-landmark branches? I extracted landmarks as described above when training those branches, replaced your ckpt_content_branch.pth with my .pth, and ran main_end2end.py, but the result is not good. I am not sure whether I extracted the landmarks correctly.

yzhou359 commented 3 years ago

You can check whether the original landmarks extracted by FANet are correct or not. Is the motion of the lips correct?
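One quick way to run this sanity check is to compute a per-frame mouth-opening signal from the extracted landmarks and see whether it tracks the speech, e.g. the distance between the inner-lip landmarks (indices 62 and 66 in the standard 0-indexed iBUG 68-point layout). The frame stack below is synthetic; on real data, each entry would be one FANet result per video frame:

```python
import numpy as np

# 0-indexed iBUG-68 layout: landmark 62 = top inner lip, 66 = bottom inner lip.
TOP_INNER, BOTTOM_INNER = 62, 66

def mouth_opening(shapes):
    """Per-frame mouth opening from a (T, 68, 2-or-3) landmark stack,
    using only the x, y coordinates."""
    shapes = np.asarray(shapes)
    return np.linalg.norm(
        shapes[:, TOP_INNER, :2] - shapes[:, BOTTOM_INNER, :2], axis=1
    )

# Synthetic "video": mouth opens then closes over 5 frames.
frames = np.zeros((5, 68, 2))
frames[:, TOP_INNER, 1] = 10.0
frames[:, BOTTOM_INNER, 1] = 10.0 + np.array([0.0, 2.0, 4.0, 2.0, 0.0])

opening = mouth_opening(frames)
print(opening)  # [0. 2. 4. 2. 0.]
```

A flat or jittery signal on frames where the speaker is clearly talking would point to bad FANet extractions rather than a training problem.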

luantunez commented 3 years ago

This discussion is really interesting! I was just wondering, would 3D landmarks help construct a 3D avatar from an image? Is there a way to implement that with the pretrained models? Thank you!