Open yinguanchun opened 1 year ago
@yinguanchun I am also confused about this scaling factor. Have you figured it out?
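In my opinion, the authors define L_{vlb} = L_0 + ... + L_T as the sum over all timesteps, not as a single term L_t. Since each training example only evaluates one sampled timestep t, they probably multiply that term by T (self.num_timesteps) to estimate the full sum.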
@yhy258 Thank you for your answer. So does that mean we use L_t * T (self.num_timesteps) to approximate L_{vlb}?
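To make the approximation concrete, here is a minimal sketch, assuming t is drawn uniformly from the T timesteps. The per-step values in `losses` are made up for illustration, and `full_vlb` and `sampled_vlb_estimate` are hypothetical helpers, not functions from the repository. Under that assumption, the average of T * L_t over many sampled t converges to the full sum.

```python
import random


def full_vlb(per_step_losses):
    """Exact sum of all per-timestep terms."""
    return sum(per_step_losses)


def sampled_vlb_estimate(per_step_losses, num_samples, seed=0):
    """Average of T * L_t with t drawn uniformly from the T available terms."""
    rng = random.Random(seed)
    T = len(per_step_losses)
    total = 0.0
    for _ in range(num_samples):
        t = rng.randrange(T)              # uniform timestep, one per "training example"
        total += T * per_step_losses[t]   # scale the single sampled term by T
    return total / num_samples


# Made-up per-step losses, purely illustrative.
losses = [0.5, 0.2, 0.1, 0.05]
exact = full_vlb(losses)
estimate = sampled_vlb_estimate(losses, num_samples=200_000)
print(exact, estimate)
```

The two printed numbers should be close, because with uniform sampling the expectation of T * L_t equals the sum of all terms. If t is sampled non-uniformly, this simple scaling no longer applies as written.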
In the paper, λ is 0.001. The code sets learn_sigma to True and rescale_learned_sigmas to False, so the loss type will be gd.LossType.MSE, and with this loss type L_{vlb} is not multiplied by 0.001. Even when the loss type is gd.LossType.RESCALED_MSE, the code only does terms["vb"] *= self.num_timesteps / 1000.0. What is self.num_timesteps, and what is its effect? Thank you.
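One possible reading of the quoted line, offered as an assumption rather than a statement of the authors' intent: if terms["vb"] holds a single sampled term L_t and self.num_timesteps is T, then multiplying by T / 1000 is arithmetically the same as multiplying the estimate T * L_t by λ = 0.001. The sketch below only checks that identity; `rescaled_vb` and `lambda_times_vlb_estimate` are hypothetical names, and the value of `vb_t` is made up.

```python
LAMBDA = 0.001  # weight quoted from the paper in this thread


def rescaled_vb(vb_t, num_timesteps):
    """Mirrors the quoted line: vb *= num_timesteps / 1000.0."""
    return vb_t * num_timesteps / 1000.0


def lambda_times_vlb_estimate(vb_t, num_timesteps):
    """lambda * (T * L_t): the weight applied to the T-scaled single-step term."""
    return LAMBDA * (num_timesteps * vb_t)


vb_t = 0.3  # made-up single-timestep value
for T in (1000, 4000):
    print(T, rescaled_vb(vb_t, T), lambda_times_vlb_estimate(vb_t, T))
```

For T = 1000 the factor is 1, so the term is left unchanged; for larger T the term is scaled up proportionally.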