Differences between code and paper

thorinf commented 1 year ago

Hi,

I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.

There are some differences between the paper and the code, and I was hoping to know which is the better approach.

1. [Solved?] The Karras rho schedule in the paper differs slightly from the code, which matches the EDM implementation. I think this one is explainable, since the two schedules are just the reverse of each other; that may be why t_n and t_{n+1} are switched in the code.
2. The input to the denoiser is scaled in the code (by c_in), but not in the paper.
3. The time input is rescaled by a factor of 1000. My first thought is that the temporal embedding in the model may prefer larger floats, e.g. sinusoidal PE, but this is unconfirmed.
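To make the second and third differences listed above concrete, here is a minimal sketch. It assumes the code follows the EDM preconditioning convention, i.e. c_in = 1 / sqrt(sigma^2 + sigma_data^2) and a noise embedding input of 0.25 * ln(sigma), with the extra factor of 1000 applied on top. The constants (sigma_data = 0.5, sigma range 0.002 to 80) and the function names are illustrative assumptions, not taken from the repository.

```python
import math

# Assumed EDM-style defaults; not confirmed against the repository.
SIGMA_DATA = 0.5
SIGMA_MIN, SIGMA_MAX = 0.002, 80.0


def c_in(sigma, sigma_data=SIGMA_DATA):
    """EDM-style input scaling: brings the noisy input back to unit variance."""
    return 1.0 / math.sqrt(sigma**2 + sigma_data**2)


def rescaled_time(sigma, factor=1000.0):
    """EDM-style noise conditioning 0.25 * ln(sigma), times an extra factor."""
    return factor * 0.25 * math.log(sigma)


for sigma in (SIGMA_MIN, 1.0, SIGMA_MAX):
    # Std of x + sigma * z when the data has std SIGMA_DATA and z ~ N(0, 1).
    noisy_std = math.sqrt(sigma**2 + SIGMA_DATA**2)
    print(
        f"sigma={sigma:>7}: noisy std={noisy_std:8.3f}, "
        f"scaled std={c_in(sigma) * noisy_std:.3f}, "
        f"0.25*ln(sigma)={0.25 * math.log(sigma):7.3f}, "
        f"x1000={rescaled_time(sigma):9.1f}"
    )
```

Under these assumptions, c_in only normalizes the magnitude of the network input, and the factor of 1000 stretches a time input of roughly -1.6 to 1.1 to a range of a similar order to the integer timesteps (0 to 1000) that sinusoidal embeddings were originally used with. That is consistent with the guess above, but it does not confirm it.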

Any advice would be really appreciated, thank you.

sharkDDD commented 1 year ago

I found some differences between the paper and the code, as follows:

1. Eq. (6) in the paper computes x_{t_n} from x_{t_{n+1}}, but the implementation in karras_diffusion.py does not. Swapping x_t2 and x_t may be the correct fix.
2. Also in karras_diffusion.py, euler_solver does not use the score function mentioned in the paper.
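One way to check the ordering question in point 1 is to build both schedules and compare them. The sketch below assumes the paper's discretization runs from eps up to T and the EDM-style one runs from sigma_max down to sigma_min, with eps = 0.002, T = 80 and rho = 7. The function names and the number of steps are hypothetical and chosen only for illustration.

```python
import math


def paper_schedule(n, eps=0.002, T=80.0, rho=7.0):
    """Increasing schedule: t_1 = eps < ... < t_N = T."""
    a, b = eps ** (1 / rho), T ** (1 / rho)
    return [(a + (i - 1) / (n - 1) * (b - a)) ** rho for i in range(1, n + 1)]


def edm_schedule(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing schedule: sigma_0 = sigma_max > ... > sigma_{N-1} = sigma_min."""
    a, b = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(b + k / (n - 1) * (a - b)) ** rho for k in range(n)]


N = 18  # arbitrary number of discretization steps
paper = paper_schedule(N)
edm = edm_schedule(N)

# The two lists hold the same values in opposite order.
is_reversed = all(
    math.isclose(p, e, rel_tol=1e-9) for p, e in zip(paper, reversed(edm))
)
print("EDM schedule is the paper schedule reversed:", is_reversed)

# In the decreasing ordering, index k is the noisier time and k + 1 the cleaner one.
k = 5
t, t2 = edm[k], edm[k + 1]
print("t > t2:", t > t2)
```

If the code indexes a decreasing schedule like this, then `t` at index k plays the role of the paper's t_{n+1} and `t2` at index k + 1 plays the role of t_n, so stepping from x_t to x_t2 would still move from the noisier sample to the cleaner one.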

yuanzhi-zhu commented 1 year ago

Hi @sharkDDD,

The first question is covered in https://github.com/openai/consistency_models/issues/12#issue-1670110771. Indeed, the t schedule is computed in reverse order (https://github.com/openai/consistency_models/blob/main/cm/karras_diffusion.py#L178) compared to the paper, so it works out correctly in the end.

For question 2, you need the relation between the score function and a denoiser (e.g. https://twitter.com/iScienceLuvr/status/1592860080151891969).

Best,
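The score/denoiser relation mentioned for question 2 can be checked on a toy problem. The sketch below assumes the standard identity score(x, t) = (D(x, t) - x) / t^2 and the EDM-style probability flow ODE dx/dt = -t * score(x, t). It uses 1-D Gaussian data, for which both the denoiser and the score are known in closed form. All names are hypothetical; this is not the repository's euler_solver.

```python
import math

S = 0.5  # std of the toy data distribution N(0, S^2); illustrative value


def denoiser(x, t, s=S):
    """Ideal denoiser E[x0 | x_t = x] for x0 ~ N(0, s^2), x_t = x0 + t * z."""
    return x * s**2 / (s**2 + t**2)


def score(x, t, s=S):
    """Score of the noisy marginal N(0, s^2 + t^2)."""
    return -x / (s**2 + t**2)


def euler_step_with_score(x, t, t2):
    """Euler step of dx/dt = -t * score(x, t), from time t to time t2."""
    return x + (t2 - t) * (-t * score(x, t))


def euler_step_with_denoiser(x, t, t2):
    """The same step written with the denoiser: dx/dt = (x - D(x, t)) / t."""
    return x + (t2 - t) * (x - denoiser(x, t)) / t


x, t, t2 = 1.7, 2.0, 1.5
step_score = euler_step_with_score(x, t, t2)
step_denoiser = euler_step_with_denoiser(x, t, t2)
print("score from denoiser:", (denoiser(x, t) - x) / t**2, "direct:", score(x, t))
print("steps agree:", math.isclose(step_score, step_denoiser))
```

On this toy case the two Euler steps coincide, which illustrates why a solver written only in terms of a denoiser output can still implement the score-based update described in the paper.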
