tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.
2.03k stars 250 forks

Question on training strategy: Is there audio conditional drop out? #45

Open fredkingdom opened 1 week ago

fredkingdom commented 1 week ago

Thanks for open-sourcing this! I noticed that in v_express_pipeline.py you apply classifier-free guidance to the audio embeddings, but the technical report doesn't seem to mention audio embedding dropout. Do you drop the audio embeddings during training, and if so, what is the dropout rate?

tiankuan93 commented 1 week ago

For classifier-free guidance, we drop all conditions together during training, with a drop rate of 10%. The strategies that drop strong conditions (e.g., dropping kps) are independent of the classifier-free guidance strategy; the drop settings described in the technical report refer to dropping strong conditions, not to classifier-free guidance.
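For readers following along, here is a rough sketch of what "drop all conditions with a 10% rate" and the matching inference-time guidance could look like. It is not taken from the V-Express code: the names `drop_all_conditions` and `cfg_combine` are hypothetical, and using zero vectors as the "null" condition is an assumption (the real training code may use a different null representation). The guidance formula is the standard classifier-free guidance combination.

```python
import random
from typing import Dict, List, Optional

Condition = List[float]


def drop_all_conditions(
    conditions: Dict[str, Condition],
    drop_rate: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Dict[str, Condition]:
    """With probability `drop_rate`, replace EVERY condition with a null value.

    Assumption: the null condition is an all-zero vector of the same length.
    All conditions are dropped together (one random draw per sample), which is
    separate from any per-condition dropping of strong conditions such as kps.
    """
    if rng is None:
        rng = random.Random()
    if rng.random() < drop_rate:
        return {name: [0.0] * len(value) for name, value in conditions.items()}
    return {name: list(value) for name, value in conditions.items()}


def cfg_combine(
    uncond_pred: List[float],
    cond_pred: List[float],
    guidance_scale: float,
) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond_pred, cond_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = {"audio": [0.3, -0.1], "kps": [1.0, 2.0], "reference": [0.5]}
    dropped = sum(
        all(v == 0.0 for vec in drop_all_conditions(sample, 0.1, rng).values() for v in vec)
        for _ in range(10_000)
    )
    print(f"fraction of samples with all conditions dropped: {dropped / 10_000:.3f}")
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0))
```

Because the model sees the fully unconditional case during training, the unconditional prediction needed by `cfg_combine` at inference is in-distribution. A guidance scale of 1.0 reduces to the plain conditional prediction.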