tencent-ailab / V-Express

V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.
2.03k stars, 250 forks

What are the values of reference_attention_weight and audio_attention_weight during training? #12

Closed: saeedfirouzi closed this issue 1 month ago

saeedfirouzi commented 1 month ago

Thanks for your great work. What are the values of reference_attention_weight and audio_attention_weight during training? Are they fixed, or are they sampled randomly from a specific range?

Tenvence commented 1 month ago

The parameters reference_attention_weight and audio_attention_weight are only considered during inference.

During training, we do not vary these parameters; both are fixed at 1.0.
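To make the training/inference split concrete, here is a minimal sketch. It assumes each weight multiplies the residual output of its own attention branch. That wiring, along with the `AttentionWeights` container and the `combine` helper, is hypothetical and is not taken from the V-Express code. Training uses the default of 1.0 for both weights, as the reply above states. At inference time you can pass other values to strengthen or weaken each control signal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttentionWeights:
    # Hypothetical container; field names mirror the parameters discussed in this issue.
    reference_attention_weight: float = 1.0
    audio_attention_weight: float = 1.0


def combine(hidden: List[float], ref_out: List[float], audio_out: List[float],
            weights: AttentionWeights) -> List[float]:
    # Assumption: each weight scales the residual contribution of its attention branch.
    return [
        h + weights.reference_attention_weight * r + weights.audio_attention_weight * a
        for h, r, a in zip(hidden, ref_out, audio_out)
    ]


# Training: both weights stay at 1.0, per the maintainer's reply.
TRAINING_WEIGHTS = AttentionWeights()

# Inference: illustrative values that weaken the reference branch and strengthen the audio branch.
inference_weights = AttentionWeights(reference_attention_weight=0.5, audio_attention_weight=2.0)
```

Because the weights are always 1.0 during training, the model never sees other values there. Changing them at inference is therefore a post-hoc rebalancing knob, not a trained behavior.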

saeedfirouzi commented 1 month ago

So could you explain the "progressive drop operations" method, described as "a simple method that balances different control signals through a series of progressive drop operations"?
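This thread does not spell out the paper's actual procedure. For orientation, here is a generic sketch of conditional dropout of control signals during training, combined with a drop rate that ramps up over training steps. The function names, the linear schedule, and the choice to drop only the reference and V-Kps signals are all hypothetical illustrations, not V-Express's method. Consult the paper for the real details.

```python
import random
from typing import Dict, Optional


def progressive_drop_prob(step: int, total_steps: int, max_prob: float) -> float:
    """Hypothetical schedule: drop probability ramps linearly from 0 to max_prob."""
    if total_steps <= 0:
        return max_prob
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_prob * frac


def drop_conditions(conditions: Dict[str, object],
                    drop_probs: Dict[str, float],
                    rng: random.Random) -> Dict[str, Optional[object]]:
    """Replace each control signal with None (a null condition) with its own probability."""
    # rng.random() is in [0, 1), so a probability of 1.0 always drops and 0.0 never drops.
    return {
        name: (None if rng.random() < drop_probs.get(name, 0.0) else value)
        for name, value in conditions.items()
    }


rng = random.Random(0)
conds = {"reference": "ref_latent", "audio": "audio_emb", "vkps": "kps_latent"}
p = progressive_drop_prob(step=500, total_steps=1000, max_prob=0.5)  # 0.25
# Hypothetical choice: only the stronger signals are dropped, so the weaker audio signal is always kept.
batch = drop_conditions(conds, {"reference": p, "vkps": p}, rng)
```

The intuition behind this kind of scheme is that randomly removing dominant signals, such as the reference image or keypoints, forces the model to rely on a weaker one, such as audio, during those steps.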

saeedfirouzi commented 1 month ago

Yeah, after reading the paper I found the answer.