willisma / SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
https://scalable-interpolant.github.io/
MIT License

Why do ema models not need gradients only when there are pre-trained weights? #6

Closed LinB203 closed 8 months ago

LinB203 commented 8 months ago

Why are gradients disabled for the EMA model only when pre-trained weights are loaded?

https://github.com/willisma/SiT/blob/main/train.py#L165

In DiT, gradients are always disabled for the EMA model.

https://github.com/facebookresearch/DiT/blob/main/train.py#L148

willisma commented 8 months ago

Hi,

Thank you for spotting the issue. The EMA model indeed never requires gradients, whether or not pre-trained weights are loaded. The code has been updated accordingly.
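For readers landing on this thread, here is a minimal sketch of the pattern being discussed: gradients are disabled on the EMA copy unconditionally, right after it is created, and the copy is only ever changed through an in-place averaging step. This is not the repository's exact code. The tiny `nn.Linear` model, the learning rate, and the `decay=0.9` value are placeholder assumptions, and the helper names `requires_grad` and `update_ema` are chosen for illustration.

```python
from copy import deepcopy

import torch
import torch.nn as nn


def requires_grad(model: nn.Module, flag: bool = True) -> None:
    """Set requires_grad on every parameter of the model."""
    for p in model.parameters():
        p.requires_grad = flag


@torch.no_grad()
def update_ema(ema_model: nn.Module, model: nn.Module, decay: float = 0.9999) -> None:
    """Move each EMA parameter toward the matching training parameter."""
    ema_params = dict(ema_model.named_parameters())
    for name, param in model.named_parameters():
        # ema = decay * ema + (1 - decay) * param, done in place
        ema_params[name].mul_(decay).add_(param.detach(), alpha=1 - decay)


model = nn.Linear(4, 2)  # placeholder for the real network (assumption)
ema = deepcopy(model)
# Disable gradients on the EMA copy regardless of whether a checkpoint was loaded.
requires_grad(ema, False)

# One illustrative training step followed by an EMA update.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()
update_ema(ema, model, decay=0.9)
```

Because the EMA weights are never passed to the optimizer and are only written under `torch.no_grad()`, leaving `requires_grad` enabled on them would serve no purpose, which is why the flag does not need to depend on the checkpoint-loading branch.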