Why does the EMA model not require gradients only when pre-trained weights are loaded?

willisma / SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

https://scalable-interpolant.github.io/

MIT License

662 stars, 35 forks

LinB203 closed this issue 8 months ago

LinB203 commented 8 months ago

Why does the EMA model not require gradients only when pre-trained weights are loaded?

In DiT, the EMA model never requires gradients.

willisma commented 8 months ago

Hi,

Thank you for spotting the issue. The EMA model indeed does not require gradients at any time. The code has been updated accordingly.
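The fix amounts to freezing the EMA copy unconditionally rather than only in the checkpoint-loading path. Below is a minimal PyTorch sketch of that pattern. It assumes a training loop in the style of DiT's training script: `requires_grad` and `update_ema` are modeled on helpers of those names there. `setup_ema` and the checkpoint key `"ema"` are hypothetical names for illustration. This is not SiT's actual code.

```python
from copy import deepcopy
from collections import OrderedDict
from typing import Optional

import torch
import torch.nn as nn


def requires_grad(model: nn.Module, flag: bool = True) -> None:
    """Set requires_grad on every parameter of `model`."""
    for p in model.parameters():
        p.requires_grad = flag


@torch.no_grad()
def update_ema(ema_model: nn.Module, model: nn.Module, decay: float = 0.9999) -> None:
    """In-place EMA update: ema = decay * ema + (1 - decay) * model."""
    ema_params = OrderedDict(ema_model.named_parameters())
    model_params = OrderedDict(model.named_parameters())
    for name, param in model_params.items():
        ema_params[name].mul_(decay).add_(param.data, alpha=1 - decay)


def setup_ema(model: nn.Module, ckpt: Optional[dict] = None) -> nn.Module:
    """Hypothetical helper: build the EMA copy, optionally restoring it from a checkpoint."""
    # The EMA copy starts from the online model's weights.
    ema = deepcopy(model)
    if ckpt is not None:
        # Hypothetical checkpoint layout: EMA weights stored under "ema".
        ema.load_state_dict(ckpt["ema"])
    # Freeze in every case, not only when a checkpoint was loaded.
    requires_grad(ema, False)
    return ema
```

Because the EMA weights change only through `update_ema` (which runs under `torch.no_grad()`), the EMA copy never needs `.grad` buffers. Freezing it also keeps autograd from tracking its parameters when it is used for sampling or evaluation.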