Open thorinf opened 1 year ago
Hi,
I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.
There are some differences between the paper and the code, and I was hoping to know which is the better approach.
- [Solved?] The Karras rho schedule in the paper is slightly different from the one in the code, which matches the EDM implementation. I think this one is explainable, since the two schedules are just the reverse of each other; this may be why `t_n` and `t_{n+1}` are switched in the code.
- The input to the denoiser is scaled in the code, but not in the paper, i.e. `c_in`.
- The rescaled time is multiplied by a factor of 1000. My first thought is that the temporal embedding in the model may prefer larger floats, e.g. a sinusoidal PE, but this is completely unconfirmed.
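For readers unfamiliar with the last two points, here is a minimal sketch of what the input scaling and the time rescaling look like. It assumes the code follows EDM-style preconditioning, with `c_in = 1 / sqrt(sigma^2 + sigma_data^2)` and a noise conditioning of `ln(sigma) / 4`. The function names and `SIGMA_DATA = 0.5` are illustrative assumptions, not taken from the repository; only the factor of 1000 comes from this issue.

```python
import math

SIGMA_DATA = 0.5  # assumed value (the EDM default); not stated in this thread


def c_in(sigma: float, sigma_data: float = SIGMA_DATA) -> float:
    """EDM-style input scaling: keeps the network input at roughly unit variance."""
    return 1.0 / math.sqrt(sigma**2 + sigma_data**2)


def rescaled_time(sigma: float, factor: float = 1000.0) -> float:
    """EDM's c_noise = ln(sigma) / 4, times the extra factor discussed in this issue."""
    return factor * 0.25 * math.log(sigma)


def network_inputs(x: float, sigma: float):
    """Return the (scaled input, time conditioning) pair that would be fed to the network."""
    return c_in(sigma) * x, rescaled_time(sigma)
```

Under these assumptions, a noisy sample with standard deviation `sqrt(sigma^2 + sigma_data^2)` is mapped to roughly unit scale before it reaches the network, which is the part the paper's notation leaves out.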
Any advice would be really appreciated, thank you.
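To illustrate the first point, the sketch below builds the rho schedule in both orders and checks that one is the reverse of the other. The function names are hypothetical, and the defaults (`0.002`, `80.0`, `rho = 7`) are assumed from the usual EDM settings rather than read from the repository.

```python
import math


def paper_schedule(n, eps=0.002, t_max=80.0, rho=7.0):
    """Increasing order, as in the paper: t_1 = eps, ..., t_N = t_max."""
    lo, hi = eps ** (1 / rho), t_max ** (1 / rho)
    return [(lo + i / (n - 1) * (hi - lo)) ** rho for i in range(n)]


def edm_schedule(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing order, as in EDM: sigma_0 = sigma_max, ..., sigma_{N-1} = sigma_min."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def is_reverse(a, b, rel_tol=1e-9):
    """True if list a equals list b reversed, up to floating-point error."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, reversed(b))
    )
```

With this indexing, step `i` of the decreasing schedule corresponds to step `N - 1 - i` of the increasing one, so a "next" timestep in one convention is a "previous" timestep in the other.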
I found some differences between the paper and the code as well:
1. Equation (6) in the paper computes x_tn from x_tn+1, but the implementation in karras_diffusion.py does not. Exchanging x_t2 and x_t may be the correct fix.
2. Also in karras_diffusion.py, the euler_solver does not use the score function that is mentioned in the paper.
Hi @sharkDDD,
The first question is covered in https://github.com/openai/consistency_models/issues/12#issue-1670110771. Indeed, the t schedule is computed in reverse order (https://github.com/openai/consistency_models/blob/main/cm/karras_diffusion.py#L178) compared to the paper, which makes it work out correctly in the end.
For question 2, you need to know the relation between the score function and a denoiser (e.g. https://twitter.com/iScienceLuvr/status/1592860080151891969).
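As a rough illustration of that relation: for Gaussian noise of scale `t`, the score can be written as `(D(x, t) - x) / t^2`, where `D` is the ideal denoiser, so an Euler step on the probability flow ODE `dx/dt = -t * score(x, t)` can be expressed with the denoiser alone. The sketch below checks this on a toy 1-D Gaussian data distribution where both quantities are known in closed form. All names and the value of `SIGMA_DATA` are assumptions made for this example, not the repository's code.

```python
import math

SIGMA_DATA = 0.5  # assumed std of the toy 1-D Gaussian data distribution


def denoiser(x, t):
    """Ideal denoiser E[x_0 | x_t] when x_0 ~ N(0, SIGMA_DATA^2)."""
    return x * SIGMA_DATA**2 / (SIGMA_DATA**2 + t**2)


def score(x, t):
    """Exact score of the noisy marginal p_t = N(0, SIGMA_DATA^2 + t^2)."""
    return -x / (SIGMA_DATA**2 + t**2)


def euler_step_denoiser(x, t, next_t):
    """Euler step written with the denoiser: dx/dt = (x - D(x, t)) / t."""
    d = (x - denoiser(x, t)) / t
    return x + d * (next_t - t)


def euler_step_score(x, t, next_t):
    """The same Euler step written with the score: dx/dt = -t * score(x, t)."""
    d = -t * score(x, t)
    return x + d * (next_t - t)
```

Both step functions return the same value for any `x`, `t` and `next_t`, which is why a solver can look as if it ignores the score while still integrating the same ODE.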