openai / guided-diffusion


gradient scaling in fp16 training #44

Closed luguansong closed 2 years ago

luguansong commented 2 years ago

Hi,

I would like to first thank you for open-sourcing your code for the community.

While using the code for fp16 (or amp) training, I found something confusing at https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/fp16_util.py#L202 . Why do you only unscale the gradients of the scalar parameters? What about the gradients of the matrix parameters?

Sincerely looking forward to your reply.
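
For concreteness, here is a minimal standalone sketch of the pattern I am asking about. The tensors are hypothetical stand-ins; in `fp16_util.py`, `master_params` holds a scalar/vector group at index 0 and a flattened matrix group at index 1:

```python
import torch

lg_loss_scale = 20  # log2 of the loss scale

# Hypothetical stand-ins for the two master-param groups:
# index 0 = scalar/vector params, index 1 = matrix params.
master_params = [torch.zeros(16), torch.zeros(16, 16)]
for p in master_params:
    p.grad = torch.ones_like(p) * 2**lg_loss_scale  # loss-scaled gradients

# The line I linked appears to unscale only the first group:
master_params[0].grad.mul_(1.0 / 2**lg_loss_scale)

print(master_params[0].grad.mean().item())  # 1.0 -- unscaled
print(master_params[1].grad.mean().item())  # 1048576.0 -- still scaled
```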

unixpickle commented 2 years ago

This actually does look like a bug. I don't know if this was a bug introduced when porting our code over to this public repo, or if it was a bug that impacted our own experiments. Will have to look into it.
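
If the bug is real, the fix should presumably be to unscale every master-param group before the optimizer step rather than only the first. In the standalone sketch above, that would be:

```python
# Divide every group's gradient by the loss scale, not just group 0:
for p in master_params:
    p.grad.mul_(1.0 / 2**lg_loss_scale)
```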