Closed · luguansong closed this issue 2 years ago
Hi,
I would like to first thank you for open-sourcing your code for the community.
While using the code for fp16 (or AMP) training, I found something confusing at https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/fp16_util.py#L202 . Why do you scale the gradient only for the scalar parameters? What about the gradients of the matrix parameters?
Sincerely looking forward to your reply.
This actually does look like a bug. I don't know if this was a bug introduced when porting our code over to this public repo, or if it was a bug that impacted our own experiments. Will have to look into it.
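For context, the expected behavior in a dynamic loss-scaling setup is that *every* master-gradient group is divided by the loss scale before the optimizer step, whether the group holds flattened matrix parameters or scalar/vector parameters. The sketch below illustrates that invariant with plain Python lists standing in for fp32 master-gradient tensors; `unscale_master_grads` and its arguments are illustrative names, not the repo's actual API:

```python
def unscale_master_grads(master_grads, lg_loss_scale):
    """Divide every gradient group by the loss scale.

    master_grads: list of gradient groups (plain lists of floats here;
    in the real trainer these would be fp32 master-parameter .grad tensors).
    lg_loss_scale: log2 of the loss scale used when the loss was multiplied.

    The point relevant to the bug: iterate over ALL groups, rather than
    unscaling only one of them.
    """
    inv_scale = 1.0 / (2 ** lg_loss_scale)
    return [[g * inv_scale for g in group] for group in master_grads]


# With lg_loss_scale = 1 (loss scale 2), every group is halved:
grads = [[2.0, 4.0], [8.0]]
print(unscale_master_grads(grads, 1))  # → [[1.0, 2.0], [4.0]]
```

If only one group were unscaled, the remaining groups would effectively be trained with a learning rate inflated by the loss scale, which is why a selective unscale like the one at the linked line would be a real correctness issue rather than a cosmetic one.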