Closed 1 year ago
Hi, how long is your video? Training a good renderer may need 3-5 minutes of footage. By the way, the learning curve of your head_nerf may help me find out the problem.
Thanks, the video is over 4 minutes long, and the learning curve is provided as follows.
Thanks in advance!
Hi, it seems that the adversarial training of the postnet has failed (the discriminator's confidence on the generated samples is only around 0.3, whereas ideally it should be 0.5). I have two suggestions:
1) Try lm3d_vae_sync_pitch and lm3d_postnet_sync_pitch, because we found pitch is a useful hint for the postnet to successfully perform the domain adaptation.
2) Reduce the number of layers in the discriminator, which may ease the adversarial training.
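The two diagnostics above (discriminator confidence on generated samples, and discriminator depth) can be sketched as follows. This is a hypothetical PyTorch illustration, not the actual GeneFace code: `LandmarkDiscriminator` and `fake_confidence` are made-up names, and the input dimension of 204 (68 3D landmarks) is an assumption.

```python
# Hypothetical sketch, not the GeneFace implementation: a landmark-domain
# discriminator with a configurable depth, so suggestion 2) (fewer layers)
# can be tried easily, plus a diagnostic for suggestion-motivating symptom
# (confidence on generated samples stuck near 0.3 instead of ~0.5).
import torch
import torch.nn as nn


class LandmarkDiscriminator(nn.Module):
    # in_dim=204 assumes 68 3D landmarks flattened; adjust to your data.
    def __init__(self, in_dim=204, hidden=256, num_layers=3):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden), nn.LeakyReLU(0.2)]
        # Reducing num_layers shrinks this loop, weakening the
        # discriminator and easing adversarial training.
        for _ in range(num_layers - 1):
            layers += [nn.Linear(hidden, hidden), nn.LeakyReLU(0.2)]
        layers += [nn.Linear(hidden, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


def fake_confidence(disc, fake_landmarks):
    # Mean probability the discriminator assigns to generated samples:
    # ~0.5 means the generator fools it; ~0.3 means it is winning and
    # the adversarial game has likely collapsed.
    with torch.no_grad():
        return disc(fake_landmarks).mean().item()
```

Monitoring `fake_confidence` during training would show whether lowering `num_layers` actually moves the value back toward 0.5.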
Thanks, I will try these modifications and see if they work.
Hi, I noticed that you closed this issue as completed. Could you let us know whether your problem was solved, and which method helped you out?
Yes, the results improved after applying the first method, but I'm still trying the second one to see if it does even better. Thanks for the advice.
Hello! Thanks for such great work, and congratulations on being accepted to ICLR 2023.
I've run your code on a customized dataset and found that the lips do not match the audio as well as in the demo video. I visualized the landmarks and found that they match the audio, so I think it is a renderer issue. I'm currently using the radnerf renderer; do you have any suggestions for solving this issue?
Thanks in advance!