yuangan / EAT_code

Official code for ICCV 2023 paper: "Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation".

Generated Video Background Is Not Static #27

Closed: hrWong closed this issue 5 months ago

hrWong commented 5 months ago

Thank you for your excellent work. I've noticed that the backgrounds in the generated videos are not static; they seem to move or change over time. I was expecting the background to be a fixed image or scene.

Could you please provide some insight into this behavior? Is this an intended feature, or is it possibly a bug or an issue with my usage or settings?

Any information or guidance you can provide would be greatly appreciated. If there's a way to make the background static, please let me know how to adjust the settings or if there's a workaround.

Thank you for your time and assistance.

Best regards,

yuangan commented 5 months ago

Thank you for your attention.

In my opinion, this is one limitation of EAT. When using RePos-Net to generate talking heads and backgrounds, controlling only the head with latent keypoints proves challenging. Here are some potential solutions to address this issue:

  1. Use a segmentation network, like MODNet, to isolate the talking head and then composite it with the background (a minimal sketch follows this list).
  2. Fine-tune the OSFV network so that the latent keypoints influence only the head. This may seem difficult, but it has been demonstrated in Nvidia's demo.
  3. Generate the talking head without the background and composite the two afterward.
  4. Incorporate emotional expressions into other projects, such as Real3DPortrait, which generate the head and background separately.
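
For options 1 and 3, the per-frame compositing step might look like the minimal Python sketch below. Note that none of this is part of the EAT codebase: `predict_matte` is a hypothetical hook you would wire up to your matting model of choice (e.g., MODNet), and all paths are placeholders.

```python
# Minimal sketch: replace the moving background of a generated video
# with one static image, assuming a portrait-matting model is available.
import cv2
import numpy as np

def predict_matte(frame_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical hook: run your matting network here (e.g., MODNet)
    and return an HxW float32 alpha matte in [0, 1] (1 = foreground)."""
    raise NotImplementedError

def composite_static_background(video_path: str, background_path: str, out_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Resize the static background once so it matches the video resolution.
    bg = cv2.resize(cv2.imread(background_path), (w, h)).astype(np.float32)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        alpha = predict_matte(frame)[..., None]  # HxWx1 foreground weight
        # Standard alpha blending: head from the generated frame, rest from bg.
        fused = alpha * frame.astype(np.float32) + (1.0 - alpha) * bg
        writer.write(np.clip(fused, 0, 255).astype(np.uint8))
    cap.release()
    writer.release()

# Usage (placeholder paths):
# composite_static_background("eat_result.mp4", "background.png", "static_bg.mp4")
```

The same blending covers option 3 as well: once the head is generated without a background, composite it over the fixed image in the same way.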
hrWong commented 5 months ago

Thank you for your prompt response. I will try out the suggestions you provided.

yuangan commented 5 months ago

Good Luck~