huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Two questions about DiffVC #31

Open huangf79 opened 10 months ago

huangf79 commented 10 months ago

Hello, thank you for sharing this excellent work. After briefly browsing the code, I have two questions:

(1) What is x_ref used for? During training it seems to be a different fragment of the same mel-spectrogram as x. Which part of the paper does it correspond to?

(2) Why do we need to take a weighted sum of mean and x? Does this mean that the reverse diffusion during inference starts from the weighted mean_x?

I'm new to diffusion models and don't fully understand the theory in the paper, so apologies if these questions are basic.
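To make question (2) concrete, here is a minimal sketch of the kind of weighted sum being asked about. It assumes a forward diffusion with a linear noise schedule whose mean drifts from the clean input x toward mean as t goes from 0 to 1. The function names and the schedule constants are hypothetical, chosen only for illustration, and are not copied from the repository.

```python
import math

# Hypothetical schedule endpoints, assumed for illustration only.
BETA_MIN, BETA_MAX = 0.05, 20.0


def x0_weight(t: float) -> float:
    """Weight on the clean input x at time t in [0, 1].

    Assumes beta(s) = BETA_MIN + (BETA_MAX - BETA_MIN) * s, so the weight is
    exp(-0.5 * integral of beta from 0 to t).
    """
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t
    return math.exp(-0.5 * integral)


def diffused_mean(x0, mean, t):
    """Noise-free part of the diffused sample: a weighted sum of x0 and mean."""
    w = x0_weight(t)
    return [w * a + (1.0 - w) * b for a, b in zip(x0, mean)]


if __name__ == "__main__":
    x0 = [1.0, 2.0]
    mean = [0.0, 0.0]
    for t in (0.0, 0.5, 1.0):
        print(t, diffused_mean(x0, mean, t))
```

Under these assumptions the weight on x is 1 at t = 0 and close to 0 at t = 1, so the forward process ends near mean plus noise. That would be consistent with the reverse diffusion starting from a noisy version of the mean, but whether the repository does exactly this is the question being asked.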