sstzal / DiffTalk

[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"

Inference question #15

Open Bebaam opened 12 months ago

Bebaam commented 12 months ago

When running inference, I only get an incomplete image showing the landmarks and the mask. What do I need to do to get a clean image? (Attached image: 0000_0000)

jianmanLin commented 12 months ago

I ran into this problem too. It happens because the model parameters released by the author only include the encoder-decoder. The complete model is too large: the checkpoint I saved after training is 8.2 GB.
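One way to check which parts of a model a checkpoint actually contains is to group its parameter names by their leading module prefix. The snippet below is only a minimal sketch of that idea: the helper `summarize_checkpoint_keys` and the example key names are hypothetical and are not taken from the DiffTalk repository, so the real prefixes in a given checkpoint may differ.

```python
from collections import Counter


def summarize_checkpoint_keys(keys, depth=1):
    """Count parameter names per module prefix.

    `keys` is any iterable of dotted parameter names, such as the keys of a
    PyTorch state dict. `depth` is how many dotted components form the prefix.
    """
    counts = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


# Hypothetical key names, for illustration only (an assumption, not the
# actual contents of the released checkpoint).
example_keys = [
    "first_stage_model.encoder.conv_in.weight",
    "first_stage_model.encoder.conv_in.bias",
    "first_stage_model.decoder.conv_out.weight",
]

for prefix, count in summarize_checkpoint_keys(example_keys).items():
    print(f"{prefix}: {count} tensors")
```

With a real checkpoint, the keys would typically come from the loaded state dict (for example `torch.load(path, map_location="cpu")`, and then its `"state_dict"` entry if the file is wrapped that way, which is also an assumption here). If only autoencoder-style prefixes show up and nothing for the denoising network, that would be consistent with the explanation above.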

Bebaam commented 12 months ago

Okay, that is unfortunate. Thank you for the insight.

rjc7011855 commented 11 months ago

Hello, may I ask how the signal features of your audio are extracted?