[Closed] sunshineatnoon closed this issue 1 year ago
Hi, please check Section 4.2 for the details of Multi-View | Cross-Identity Reenactment. The Source column in Fig. 4 and Fig. 8 provides the identity feature. For Cross-Identity Reenactment, the WRA_EricCantor_000 video is used as the driving signal.

When I conducted the evaluation, I did not find any existing resource to compute CSIM, AED, APD and AKD, so I referred to previous methods and implemented them on my own. I suggest checking what StyleHEAT, IMavatar and PIRender did in their papers. If you run into any problems here, please let me know. My next release, as the first priority, will be the pre-processing script.
Thanks for the prompt and detailed response!
Hope you don't mind a few follow-up questions. Do you also plan to release the processed testing data for Multiface? And which views did you use for the multi-view evaluation?
BTW, HeadNeRF takes a very long time to run. Did you test on a subset of the frames or all of them?
As the Multiface dataset is too large, we only test our model on two subjects: 002643814 (male) and 5067077 (female). For 002643814, the source frames are the first frames of the video recorded by camera number 400015; for 5067077, the source frames are from camera number 400356.
Ideally, we would like to report results for camera poses from -90 degrees to 90 degrees. Unfortunately, the animation results degrade as the angle grows large (e.g., ±45 degrees). You may notice the problem in the EG3D demo: the background is treated as another part of the foreground. This is a common problem for triplane-based methods unless handled manually, e.g. as in PanoHead. So we only selected certain camera poses for the evaluation.
If you want to run your own experiments, you can choose whatever camera pose range you like. The only requirement, I suppose, is that all models are evaluated under the same configuration.
As for the slow preprocessing in HeadNeRF, that is the price of getting accurate camera poses :) . I tested it on all of the video frames.
Okay. Thanks for the explanation.
Hi @theEricMa, hope you don't mind me reopening this issue. I have one more question about AKD: did you use all keypoints or a subset of keypoints to compute it? I could not find relevant prior work.
We use all the keypoints extracted with the `face_alignment.FaceAlignment(face_alignment.LandmarksType._2D)` detector from the face_alignment package to calculate AKD.
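Concretely, AKD is just the mean Euclidean distance between matched landmarks of the driving and generated frames. A minimal sketch, assuming the 68 2D landmarks per frame have already been extracted with the detector mentioned above (the array shapes are assumptions):

```python
import numpy as np

def akd(kp_drv, kp_gen):
    """Average Keypoint Distance.

    kp_drv, kp_gen: (n_frames, 68, 2) arrays of 2D landmarks detected on
    the driving and generated frames, respectively. Returns the Euclidean
    distance per landmark, averaged over all landmarks and frames.
    """
    dists = np.linalg.norm(kp_drv - kp_gen, axis=-1)  # (n_frames, 68)
    return float(dists.mean())
```

With the face_alignment package, the per-frame landmarks would come from something like `fa.get_landmarks(frame)` on each frame, stacked into the arrays above.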
@theEricMa Thanks for the reply! I have another question, about the FID score for cross-reenactment: what images are used as the real images in this case, since we only have one driving video?
For the FID score, you don't need ground truth, only videos from the same data distribution. We use the driving video to represent the real distribution. My evaluation code follows this scheme: for each subject, extract all frames from both the driving video and the synthesized video and calculate one FID score; the final FID score is the per-video FID averaged over all the synthesized subjects.
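The per-video averaging described above is independent of how each individual FID is computed. For completeness, here is a self-contained sketch of the Fréchet distance between two sets of features plus the per-subject averaging; feature extraction (normally InceptionV3 activations on the extracted frames) is omitted and left as an assumption.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between two feature sets of shape (n_samples, dim):
    ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2})."""
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(c1 + c2 - 2.0 * covmean))

def average_fid(per_subject_pairs):
    """Scheme described above: one FID per subject (driving-frame features
    vs. synthesized-frame features), averaged over all synthesized subjects."""
    return float(np.mean([frechet_distance(r, f) for r, f in per_subject_pairs]))
```

In practice one would use an off-the-shelf FID implementation per subject and only keep the averaging step.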
@theEricMa Could you share the code for FID, AKD, AED, CSIM and APD? I checked repos like PIRenderer, FOMM and StyleHEAT, but only FOMM shares code for AED and AKD, which may be right. We sincerely hope you release your evaluation code for cross-identity reenactment.
@theEricMa, could you please share the evaluation code? That would be greatly appreciated.
The script for arbitrary images is similar to the inference code in `inference_refine_1D_cam.py`, but with additional code for preparing the data parameters, i.e., extracting the head pose and facial expression coefficients using tools from the OTAvatar_processing repository.
> For the FID score, you don't need ground truth, but some videos in the same data distribution. We use the driving video to represent the real distribution. My evaluation code is in the following scheme: for each subject, extract all frames from both the driving frames and synthesized frames, calculate one FID score; the final FID score is the averaged per-video FID score among all the synthesized subjects.
Sorry, but in the cross-identity setting, say identity A's video is the driving video and identity B provides the source image, generating an A-drives-B video: how is the FID calculated? Between A's video and the A-B result, or between B's ground-truth video and the A-B result?
Hi, thanks for open-sourcing this awesome work. Could you please let me know how to get the numbers in Table 1 of the paper? I couldn't find details about Multi-View Reenactment and Cross-Identity Reenactment.
Specifically, did you use the WRA_EricCantor_000 video (as here) to drive the first frame of each test video? Also, do you have any plan to release the script for CSIM, AED, APD and AKD computation, or could you please point me to the external code you used for these metrics?
Thanks in advance!