willisma / SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
https://scalable-interpolant.github.io/

Regarding timestep embeddings #24

Open realfolkcode opened 1 week ago

realfolkcode commented 1 week ago

Hi! First, I would like to thank the authors for this awesome paper! Stochastic interpolants deserve to gain more attention :)

My question concerns the implementation of the timestep embeddings. In SiT, the default time horizon is $[0, 1]$, unlike the discrete timesteps $\{1, 2, 3, \ldots\}$ in DDPM-like models. As in DiT, the timesteps are first encoded via fixed sinusoidal functions. The implementation is derived from the GLIDE repository (which is DDPM-like) and remains essentially intact. However, the quality of such embeddings may be subpar due to an inadequate default value of the max_period parameter (10000). The sinusoid frequencies decay geometrically from 1 down to 1/max_period, so when $t$ is bounded in $[0, 1]$, the low-frequency components barely move away from their initial phase and many embedding dimensions stay nearly constant, i.e., the spectrum is poorly covered. I think max_period should be carefully adjusted for continuous timesteps. Here is the colab notebook that visualizes the issue.
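For reference, here is a condensed sketch of the GLIDE/DiT-style sinusoidal embedding (the real implementation also zero-pads odd dims, which I omit), together with a quick check of how many embedding dimensions actually vary over $t \in [0, 1]$. The 0.01 threshold is an arbitrary choice for illustration:

```python
import math
import torch

def timestep_embedding(t, dim, max_period=10000):
    """Sinusoidal embedding in the style of GLIDE/DiT.

    Frequencies decay geometrically from 1 to ~1/max_period, so for
    t in [0, 1] the low-frequency sinusoids are almost constant.
    """
    half = dim // 2
    freqs = torch.exp(
        -math.log(max_period) * torch.arange(half, dtype=torch.float32) / half
    )
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

# Count embedding dimensions that change appreciably over the horizon.
t = torch.linspace(0, 1, 1000)
for max_period in (10000, 10):
    emb = timestep_embedding(t, 256, max_period)
    span = emb.max(dim=0).values - emb.min(dim=0).values
    active = (span > 0.01).sum().item()  # 0.01 is an arbitrary threshold
    print(f"max_period={max_period}: {active}/256 dims vary over [0, 1]")
```

With max_period=10000 a large fraction of the 256 dimensions stays nearly constant over the whole horizon, while a smaller max_period activates most of them.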

xmhGit commented 1 week ago

Great points. So how, specifically, does it influence generation performance?

realfolkcode commented 6 days ago

@xmhGit I don't really know, as I haven't conducted any experiments (no resources for training, sadly). But my guess would be that it wouldn't significantly impact performance either way. The reason is that timestep embeddings are largely redundant: the model can likely infer the signal-to-noise ratio from the noisy image itself (e.g., from how neighboring pixels are correlated) without relying on the time conditioning. I am really fascinated by this ability. A toy illustration of this idea is sketched below.
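To make that intuition concrete, here is a toy sketch (not from the SiT codebase; `estimate_sigma` is a hypothetical helper) that estimates the noise level of a single image from local pixel statistics. It assumes a linear interpolant $x_t = (1 - t)\,x_0 + t\,\varepsilon$ with unit-variance Gaussian noise and ignores residual signal content, so it is only a rough estimator:

```python
import torch

def estimate_sigma(x):
    """Crude single-image noise-level estimate.

    Neighboring pixels of natural images are strongly correlated, so the
    residual d = x[i, j] - 0.5 * (x[i + 1, j] + x[i, j + 1]) is dominated
    by the noise. For i.i.d. noise with std sigma, Var(d) = 1.5 * sigma^2.
    """
    d = x[..., :-1, :-1] - 0.5 * (x[..., 1:, :-1] + x[..., :-1, 1:])
    return d.flatten(1).std(dim=1) / 1.5 ** 0.5

# Demo: under x_t = (1 - t) * x0 + t * eps, sigma(t) = t, so the estimate
# should roughly track t with no time conditioning at all.
x0 = torch.linspace(-1, 1, 64).expand(1, 3, 64, 64)  # smooth stand-in image
for t in (0.1, 0.5, 0.9):
    xt = (1 - t) * x0 + t * torch.randn_like(x0)
    print(f"t={t}: estimated sigma = {estimate_sigma(xt).item():.3f}")
```

On a smooth image the printed estimates land close to $t$, which is exactly the information the timestep embedding is supposed to provide.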

xmhGit commented 6 days ago

@realfolkcode Thank you for your detailed and insightful reply! I really appreciate it and learned a lot.