Open realfolkcode opened 1 week ago
Great points. So how does it influence the generation performance specifically?
@xmhGit I don't really know, as I haven't conducted any experiments (no resources for training, sadly). My guess is that it wouldn't significantly impact performance either way: timestep embeddings are actually redundant, since the model is likely to learn the signal-to-noise ratio from the image itself (how pixels are correlated with each other, etc.) without relying on the time conditioning. I am really fascinated by this ability.
@realfolkcode Thank you for your detailed and insightful reply! I really appreciate it and learned a lot.
Hi! First, I would like to thank the authors for this awesome paper! Stochastic interpolants deserve to gain more attention :)
My question is related to the implementation of timestep embeddings. In SiT, the default time horizon is [0, 1], unlike the discrete timesteps {1, 2, 3, ...} in DDPM-like models. As in DiT, the timestep embeddings are first encoded via fixed sinusoidal functions. The implementation is derived from the GLIDE repository (which is DDPM-like) and remains essentially intact. However, the quality of such embeddings might be subpar due to the inadequate default value of the `max_period` parameter (10000). When $t$ is bounded, high values of `max_period` result in poor coverage of the frequency spectrum, so I think `max_period` should be carefully adjusted in the case of continuous timesteps. Here is the colab notebook that visualizes the issue.
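To make the concern concrete, here is a minimal NumPy sketch of a GLIDE-style sinusoidal embedding (the name `timestep_embedding` and its signature follow the common convention; treat this as an illustration under that assumption, not the exact repo code). With `max_period=10000` and $t$ restricted to [0, 1], most embedding dimensions barely move over the whole horizon, whereas a smaller `max_period` spreads variation across many more dimensions:

```python
import math
import numpy as np

def timestep_embedding(t, dim, max_period=10000):
    # Sinusoidal embedding as in GLIDE/DiT-style code: half cosine, half sine,
    # with geometrically spaced frequencies from 1 down to ~1/max_period.
    half = dim // 2
    freqs = np.exp(-math.log(max_period) * np.arange(half) / half)
    args = t[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)

# Continuous time horizon [0, 1], as in SiT.
t = np.linspace(0.0, 1.0, 200)
for mp in (10000, 10):
    emb = timestep_embedding(t, 256, max_period=mp)
    # Count dimensions whose value actually changes noticeably across t.
    active = int((emb.std(axis=0) > 0.05).sum())
    print(f"max_period={mp}: {active}/256 dims vary noticeably over [0, 1]")
```

With `max_period=10000`, the lowest frequencies complete only a tiny fraction of a period on [0, 1], so those dimensions are effectively constant inputs to the MLP that follows; shrinking `max_period` (here 10, chosen arbitrarily for illustration) is one way to restore spectrum coverage.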