Thank you for your attention. We have released five audio clips and the poses extracted from videos in our demo, and we have uploaded the test set of MEAD, which contains more audio data. New audio first needs to be preprocessed with the deep speech model from AD-NeRF; we will make the audio preprocessing code public in two or three days. As for the pose, we control it either with the ground truth (which has the same number of frames as the audio) or with other videos (whose poses need to be aligned or padded). Do you need to control the pose? That may take more time to release. :blush:
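For the align/pad step, a minimal sketch of what we mean, assuming poses are stored as a `(T, D)` NumPy array (the function name and layout here are hypothetical, not our actual format):

```python
import numpy as np

def align_pose_to_audio(pose_seq: np.ndarray, n_audio_frames: int) -> np.ndarray:
    if len(pose_seq) >= n_audio_frames:
        # Driving video is longer than the audio: truncate.
        return pose_seq[:n_audio_frames]
    # Driving video is shorter: pad by repeating the last pose.
    pad = np.repeat(pose_seq[-1:], n_audio_frames - len(pose_seq), axis=0)
    return np.concatenate([pose_seq, pad], axis=0)
```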
Thanks for your reply. So, if I want to animate a new portrait, it seems that I have to estimate at least the pose of the given portrait. Then I can animate the portrait either with a static pose (of the given portrait) or with dynamic poses (estimated from another video). I want to try both, so the pose estimation script seems essential for my needs.
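To make sure I understand the static case: I imagine it reduces to repeating the portrait's single estimated pose for every audio frame, something like this hypothetical sketch:

```python
import numpy as np

# Hypothetical shapes: one (D,) pose estimated from the portrait,
# tiled across every audio frame to give a static pose sequence.
portrait_pose = np.zeros(6)   # placeholder: e.g. head rotation + translation
n_audio_frames = 100          # placeholder: length of the driving audio in frames
static_pose_seq = np.tile(portrait_pose, (n_audio_frames, 1))  # (100, 6)
```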
If you just want to animate new portraits with the same audio and pose sequences, just place them in `./demo/imgs/`, or place the cropped images (256x256) in `./demo/imgs_cropped/`. The latent keypoints of the source images are extracted automatically by our code. I have updated the README.md. The pose sequences come from the driving videos, and we are cleaning up the preprocessing code for that part.
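In case you want to produce the cropped inputs yourself, here is a minimal sketch assuming a simple center crop (our own preprocessing may crop around the detected face instead, so treat this only as a placeholder):

```python
import os
from PIL import Image

def crop_to_256(src_path: str, dst_path: str) -> None:
    # Center-crop to a square, then resize to the 256x256 input size
    # expected under ./demo/imgs_cropped/.
    img = Image.open(src_path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img.resize((256, 256), Image.LANCZOS).save(dst_path)

os.makedirs("./demo/imgs_cropped", exist_ok=True)
crop_to_256("portrait.jpg", "./demo/imgs_cropped/portrait.png")
```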
I want to animate an image with an audio clip. How should I do that? I appreciate your great work.