harubaru / waifu-diffusion

stable diffusion finetuned on weeb stuff

clip_grad_norm applied to scaled gradients #64

Open fpgaminer opened 1 year ago

fpgaminer commented 1 year ago

On this line, grad clipping occurs:

https://github.com/harubaru/waifu-diffusion/blob/27d301c5b96834536166cc2f12e7a9bb4079fb96/trainer/diffusers_trainer.py#L931

However, if fp16 is enabled, the clipping is applied to the scaled gradients because of GradScaler.

According to PyTorch documentation (https://pytorch.org/docs/master/notes/amp_examples.html#gradient-clipping), the gradients should be unscaled before clipping.
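For reference, a minimal sketch of the pattern the PyTorch AMP docs recommend, with unscaling before clipping. The model, optimizer, and data here are toy placeholders, not the trainer's actual objects:

```python
import torch

# Toy setup -- placeholders, not the trainer's actual objects.
model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()

    # Backward pass produces scaled gradients.
    scaler.scale(loss).backward()

    # Unscale in place first, so the clip threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    # scaler.step() knows the gradients were already unscaled and skips
    # the optimizer step if any of them are inf/NaN.
    scaler.step(optimizer)
    scaler.update()
```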

So, this appears to be a bug, and it could cause fp16 training to perform worse than it otherwise would.