sstzal / DiffTalk

[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"

Can you provide the processed data or the related processing code? #7

Open Haoqing-Wang opened 1 year ago

Haoqing-Wang commented 1 year ago

Nice job! Can you release the dataset you used?

HUAFOR commented 1 year ago

+1

DavidKong96 commented 1 year ago

Hoping to know which face detection model was used.

lmpeng12 commented 1 year ago

+1

MengShen0709 commented 1 year ago

+1

MengShen0709 commented 1 year ago

@DavidKong96 I guess the authors might use dlib to extract the landmarks.

The partial landmarks are defined in their dataloader:

        # dlib 68-point layout: jaw 0-16, brows 17-26, nose 27-35, eyes 36-47, mouth 48-67
        landmarks_img = landmarks[13:48]   # upper jaw, eyebrows, nose, eyes
        landmarks_img2 = landmarks[0:4]    # first few jaw points
        landmarks_img = np.concatenate((landmarks_img2, landmarks_img))

I used dlib to extract the 68 points and, following the authors, kept only a subset of them. Doing so removes the bottom half of the landmarks. This landmark masking matches the image masking, so the model can learn to generate the lip movement.
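For what it's worth, here is a minimal sketch of how that extraction could look with dlib's 68-point predictor. The index selection follows the dataloader snippet above; the function name and everything else is my own assumption, not the authors' code:

    import cv2
    import dlib
    import numpy as np

    # assumes the standard "shape_predictor_68_face_landmarks.dat" model file is present
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def extract_partial_landmarks(image_bgr):
        """Detect one face and keep only the upper-face landmarks,
        mirroring the index selection in the dataloader snippet above."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        rects = detector(gray)
        if len(rects) == 0:
            return None  # no face detected in this frame
        shape = predictor(gray, rects[0])
        landmarks = np.array([[p.x, p.y] for p in shape.parts()])  # shape (68, 2)
        # mouth points (48-67) and part of the lower jaw (4-12) are dropped
        return np.concatenate((landmarks[0:4], landmarks[13:48]))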

Here is an illustration: [landmark image]

But I haven't reproduced the results, so it is just my guess.

Feel free to discuss.

Haoqing-Wang commented 1 year ago

Before obtaining landmarks, we need to detect the facial RoI in advance. But when the model cannot detect the face, how do we obtain the landmarks? We use dlib to obtain the landmarks:

    import cv2
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    rect = detector(gray)[0]
    shape = predictor(gray, rect)

`detector(gray)` can return an empty result when no face is found, so indexing it with `[0]` fails.
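One possible workaround (my own suggestion, not something from the authors) is to retry detection with dlib's optional upsampling argument and let the caller skip frames where no face is found, or reuse the previous frame's landmarks. A minimal sketch:

    import dlib

    detector = dlib.get_frontal_face_detector()

    def detect_face(gray, max_upsample=2):
        # dlib's detector takes an optional upsample count; upsampling the
        # image lets it find smaller faces at the cost of extra compute
        for upsample in range(max_upsample + 1):
            rects = detector(gray, upsample)
            if len(rects) > 0:
                return rects[0]
        return None  # caller can skip the frame or reuse the previous frame's landmarks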

jianmanLin commented 1 year ago

> @DavidKong96 I guess the authors might use dlib to extract the landmarks.
>
> The partial landmarks are defined in their dataloader:
>
>         landmarks_img = landmarks[13:48]
>         landmarks_img2 = landmarks[0:4]
>         landmarks_img = np.concatenate((landmarks_img2, landmarks_img))
>
> I used dlib to extract the 68 points and, following the authors, kept only a subset of them. Doing so removes the bottom half of the landmarks. This landmark masking matches the image masking, so the model can learn to generate the lip movement.
>
> Here is an illustration: [landmark image]
>
> But I haven't reproduced the results, so it is just my guess.
>
> Feel free to discuss.

Hello, did you successfully reproduce this paper? In my training, the inpainted region keeps shaking; the training loss drops rapidly at the beginning and then oscillates within a small range. I'm quite troubled by this.

jianmanLin commented 1 year ago

> Before obtaining landmarks, we need to detect the facial RoI in advance. But when the model cannot detect the face, how do we obtain the landmarks? We use dlib to obtain the landmarks:
>
>     detector = dlib.get_frontal_face_detector()
>     predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
>     gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>     rect = detector(gray)[0]
>     shape = predictor(gray, rect)
>
> `detector(gray)` can return an empty result when no face is found, so indexing it with `[0]` fails.

Hello, did you successfully reproduce this paper? In my training, the inpainted region keeps shaking; the training loss drops rapidly at the beginning and then oscillates within a small range. I'm quite troubled by this.

Utkarsh-shift commented 8 months ago

Same request.