Hello, thank you for sharing this excellent work. After briefly browsing the code, I have two questions:
(1) What is the use of x_ref? During training, it seems to be a different fragment of the same mel-spectrogram as x. Which part of the paper does it correspond to?
(2) Why do we need to perform a weighted summation of mean and x? Does this mean that the reverse diffusion during inference starts from the weighted mean_x?
I'm new to diffusion models and don't fully understand the theory in the paper, so sorry if these are basic questions.