tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.

What are the values of reference_attention_weight and audio_attention_weight in training? #12

Closed saeedfirouzi closed 5 months ago

saeedfirouzi commented 5 months ago

Thanks for your great work. What are the values of reference_attention_weight and audio_attention_weight in training? Are they static parameters, or random numbers drawn from a specific range?

Tenvence commented 5 months ago

The parameters reference_attention_weight and audio_attention_weight are only considered during inference.

For training, we disregard the variation of these parameters and set both to 1.0.
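To illustrate the distinction, here is a minimal sketch that is not taken from the V-Express code. It assumes the two weights simply scale the reference and audio attention outputs before they are added back to the hidden states. The function `combine_attention` and the toy values are hypothetical. With both weights at 1.0 (the training setting described above) the outputs are added unscaled; at inference the weights can be changed to rebalance the two control signals.

```python
# Hypothetical sketch: how per-signal attention weights might scale
# the reference and audio attention outputs. Not the V-Express implementation.

def combine_attention(hidden, reference_out, audio_out,
                      reference_attention_weight=1.0,
                      audio_attention_weight=1.0):
    """Add weighted reference and audio attention outputs to the hidden states."""
    return [
        h + reference_attention_weight * r + audio_attention_weight * a
        for h, r, a in zip(hidden, reference_out, audio_out)
    ]

# Toy values standing in for one token's features (assumed for illustration).
hidden = [1.0, 2.0]
reference_out = [0.5, 0.5]
audio_out = [0.25, -0.25]

# Training-like setting: both weights fixed at 1.0 (the defaults).
train_like = combine_attention(hidden, reference_out, audio_out)

# Inference-like setting: weaken the reference signal, strengthen the audio signal.
infer_like = combine_attention(
    hidden, reference_out, audio_out,
    reference_attention_weight=0.5,
    audio_attention_weight=2.0,
)

print(train_like)
print(infer_like)
```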

saeedfirouzi commented 5 months ago

So can you explain the "progressive drop operations" method? The description says: "a simple method that balances different control signals through a series of progressive drop operations."

saeedfirouzi commented 5 months ago

Yeah, after reading the article, I found the answer.