Closed: santodante closed this issue 2 years ago.
First of all, thanks for an extraordinary paper - so many interesting details!! Also, thanks for open sourcing the code.
I have a few ideas I want to test and I'm trying to understand all the parts of the code. Most of it is clear and well commented, but I can't figure out the reasoning behind the zero_module you use in a few places in guided-diffusion/guided_diffusion/unet.py:
```python
def zero_module(module):
    """
    Zero out the parameters of a module and return it.
    """
    for p in module.parameters():
        p.detach().zero_()
    return module
```
I couldn't find anything in the paper or online to explain why this is used.
I'm also curious why you used a custom mixed precision training instead of using PyTorch's mixed precision training (torch.cuda.amp.autocast)?
In my opinion, this may be a weight-initialization trick, since I notice the module is usually applied right before the skip connection.
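To make that concrete, here is a simplified sketch of how the pattern shows up around the residual blocks (the class and layer names below are illustrative, not the actual code in unet.py): the last convolution of the block is wrapped in zero_module, so at initialization the block's output reduces to its skip connection and the network starts out close to an identity mapping.

```python
import torch
import torch.nn as nn

def zero_module(module):
    """Zero out the parameters of a module and return it (same helper as above)."""
    for p in module.parameters():
        p.detach().zero_()
    return module

class ResBlockSketch(nn.Module):
    """Illustrative stand-in for a residual block; not the real ResBlock class."""
    def __init__(self, channels):
        super().__init__()
        self.in_layers = nn.Sequential(
            nn.GroupNorm(8, channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.out_layers = nn.Sequential(
            nn.GroupNorm(8, channels),
            nn.SiLU(),
            # Zero-initialized output conv: h below is all zeros at init.
            zero_module(nn.Conv2d(channels, channels, 3, padding=1)),
        )

    def forward(self, x):
        h = self.out_layers(self.in_layers(x))
        # At initialization this is x + 0, i.e. the block is an identity map.
        return x + h

x = torch.randn(1, 64, 8, 8)
assert torch.allclose(ResBlockSketch(64)(x), x)  # identity at initialization
```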
Without .detach(), you will get an error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
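A minimal standalone repro of why the .detach() is there (just an illustration, not code from the repo):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 1)
w = conv.weight               # a leaf tensor with requires_grad=True

# w.zero_()                   # would raise: RuntimeError: a leaf Variable that
                              # requires grad is being used in an in-place operation.

# detach() returns a view of the same storage with requires_grad=False,
# so the in-place zero_() rewrites the weights without involving autograd.
w.detach().zero_()
print(conv.weight.abs().sum())  # 0 -- the actual parameters were modified

# Wrapping the in-place op in torch.no_grad() would work just as well.
with torch.no_grad():
    conv.bias.zero_()
```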
The zero_module() function is used to initialize certain modules to zero. I believe this initialization scheme was also used in the Denoising Diffusion Probabilistic Models (2020) paper (e.g. here).
torch.cuda.amp wasn't in a stable state when we started working on this project, but it is likely suitable for a project like this by now. We don't want to change the code more than necessary, since this is mostly an archival research codebase.
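For anyone reading this later and wanting to try the native route, a typical torch.cuda.amp training step looks roughly like the sketch below. The toy model, batch shapes, and loss are placeholders, not a drop-in replacement for the fp16 utilities in this repo.

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(                      # toy stand-in for the UNet
    nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 3, 3, padding=1),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(8, 3, 64, 64, device=device)   # fake batch
    target = torch.randn_like(x)                    # fake regression target
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                 # forward + loss in mixed precision
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()                   # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                          # unscales gradients, skips the step on inf/NaN
    scaler.update()                                 # adjust the loss scale for the next iteration
```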
Hello, I discovered this technique being used in ControlNet as well, where it is called a "zero convolution", a trick to improve the results.
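For what it's worth, my understanding of the ControlNet usage is roughly the sketch below (the class and argument names are made up for illustration): the control branch is connected to the frozen model through 1x1 convolutions built with the same zero_module idea, so at the start of training the added branch contributes nothing and the pretrained model's behavior is unchanged.

```python
import torch
import torch.nn as nn

def zero_module(module):
    for p in module.parameters():
        p.detach().zero_()
    return module

class ZeroConvInjection(nn.Module):
    """Illustrative "zero convolution": a zero-initialized 1x1 conv that adds
    a control signal onto a frozen backbone feature map."""
    def __init__(self, channels):
        super().__init__()
        self.zero_conv = zero_module(nn.Conv2d(channels, channels, 1))

    def forward(self, backbone_feat, control_feat):
        # At init the zero conv outputs all zeros, so the frozen model's
        # output is exactly preserved; gradients still flow into zero_conv.
        return backbone_feat + self.zero_conv(control_feat)

feat = torch.randn(1, 32, 16, 16)
ctrl = torch.randn(1, 32, 16, 16)
inject = ZeroConvInjection(32)
assert torch.allclose(inject(feat, ctrl), feat)  # no effect at initialization
```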
So why is .detach() needed?