theEricMa / OTAvatar

This is the official repository for OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering [CVPR2023].

Quantitative Evaluation #11

Closed sunshineatnoon closed 1 year ago

sunshineatnoon commented 1 year ago

Hi, thanks for open-sourcing this awesome work. Could you please let me know how to get the numbers in Table 1 of the paper? I couldn't find details about Multi-View Reenactment and Cross-Identity Reenactment.

Specifically,

Also, do you have any plan to release the scripts for computing CSIM, AED, APD, and AKD, or could you please point me to the external code you used for these metrics?

Thanks in advance!

theEricMa commented 1 year ago

Hi, please check Section 4.2 for the details of Multi-View and Cross-Identity Reenactment.

When I was conducting the evaluation, I did not find any public resource to compute CSIM, AED, APD, and AKD, so I referred to previous methods and implemented them on my own. You can refer to StyleHEAT, IMavatar, and PIRender to check what they did in their papers. If you run into any problems here, please let me know. My top-priority next release will be the pre-processing script.

sunshineatnoon commented 1 year ago

Thanks for the prompt and detailed response!

Hope you don't mind that I have some follow-up questions. Do you also plan to release the processed testing data for Multiface? And which views did you use for the multi-view evaluation?

BTW, HeadNeRF takes a very long time to run. Did you test on a subset of the frames or on all of them?

theEricMa commented 1 year ago

As the Multiface dataset is very large, we only tested our model on two subjects: 002643814 (male) and 5067077 (female). For 002643814, the source frames are the first frames of the video recorded by camera number 400015; for 5067077, the source frames are from camera number 400356.

Ideally, we would like to evaluate camera poses from -90 degrees to 90 degrees. Unfortunately, the animation results degrade as the angle grows large (e.g. beyond ±45 degrees). You may notice the underlying problem in the EG3D demo: the background is treated as part of the foreground. This is a common problem for tri-plane-based methods unless it is handled explicitly, e.g. as in PanoHead. So we selected certain camera poses for the evaluation.

If you want to run your own experiments, you can choose whatever range of camera poses you like. The only requirement, I suppose, is that all models are evaluated under the same configuration.

As for the slow preprocessing in HeadNeRF, that is the cost of obtaining accurate camera poses :) . I tested it on all of the video frames.

sunshineatnoon commented 1 year ago

Okay. Thanks for the explanation.

sunshineatnoon commented 1 year ago

Hi @theEricMa, hope you don't mind me reopening this issue. I have one more question about AKD: did you use all keypoints or only a subset of keypoints to compute it? I couldn't find relevant details in prior work.

theEricMa commented 1 year ago

We use all the keypoints extracted by the face_alignment.FaceAlignment(face_alignment.LandmarksType._2D) detector from the face_alignment library to calculate AKD.
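For reference, a minimal sketch of how AKD could be computed with that detector is below. This is not the author's released code; the helper name, the per-frame averaging, and the handling of failed detections are my own assumptions.

```python
# AKD sketch (assumption: illustrative, not the official evaluation script).
# AKD = mean L2 distance between the 68 2D landmarks detected on each
# driving frame and on the corresponding synthesized frame.
import numpy as np
import face_alignment

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cuda')

def average_keypoint_distance(driving_frames, generated_frames):
    """driving_frames / generated_frames: lists of HxWx3 uint8 RGB images."""
    distances = []
    for drv, gen in zip(driving_frames, generated_frames):
        drv_lms = fa.get_landmarks(drv)  # list of (68, 2) arrays, or None
        gen_lms = fa.get_landmarks(gen)
        if not drv_lms or not gen_lms:
            continue  # skip frames where detection fails (assumption)
        # use the first detected face in each frame
        d = np.linalg.norm(drv_lms[0] - gen_lms[0], axis=1).mean()
        distances.append(d)
    return float(np.mean(distances))
```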

sunshineatnoon commented 1 year ago

@theEricMa Thanks for the reply! I have another question about the FID score for cross-identity reenactment. What images are used as real images in this case, since we only have one driving video?

theEricMa commented 1 year ago

For the FID score, you don't need ground truth, only videos from the same data distribution. We use the driving video to represent the real distribution. My evaluation code follows this scheme: for each subject, extract all frames from both the driving video and the synthesized video and compute one FID score; the final FID score is the average of these per-video FID scores over all synthesized subjects.
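A rough sketch of that per-subject scheme is below, assuming the frames have already been dumped into per-subject folders and using the pytorch-fid package; the folder layout and the function itself are illustrative assumptions, not the released evaluation code.

```python
# Per-subject FID sketch (assumption: illustrative, not the official script).
# For each subject, FID is computed between the driving frames and the
# synthesized frames; the final score is the mean over all subjects.
import numpy as np
from pytorch_fid.fid_score import calculate_fid_given_paths

def average_fid(subjects, driving_root, synth_root, device='cuda'):
    """subjects: list of subject IDs, each with a driving and a synthesized frame folder."""
    scores = []
    for subj in subjects:
        fid = calculate_fid_given_paths(
            [f'{driving_root}/{subj}', f'{synth_root}/{subj}'],
            batch_size=50, device=device, dims=2048)
        scores.append(fid)
    return float(np.mean(scores))
```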

Harold0530-zhang commented 9 months ago

@theEricMa Could you share the code for FID, AKD, AED, CSIM, and APD? I checked repos like PIRenderer, FOMM, and StyleHeat, but only FOMM shares code for AED and AKD, which may be correct. We sincerely hope you will release your evaluation code for the cross-identity reenactment metrics.

szh-bash commented 6 months ago

@theEricMa, could you please share the evaluation code? That would be greatly appreciated.

theEricMa commented 6 months ago

The script for arbitrary images is similar to the inference code in inference_refine_1D_cam.py, but it needs additional code to prepare the data parameters. This includes extracting the head pose and facial expression coefficients using tools from the OTAvatar_processing repository.

Un1Lee commented 5 months ago

> For the FID score, you don't need ground truth, only videos from the same data distribution. We use the driving video to represent the real distribution. My evaluation code follows this scheme: for each subject, extract all frames from both the driving video and the synthesized video and compute one FID score; the final FID score is the average of these per-video FID scores over all synthesized subjects.

Sorry, but in the cross-identity setting, say identity A's video is the driving video and identity B provides the source image, producing an A-B video: how is the FID computed? Is it A against A-B, or B (B's ground-truth video) against A-B?