Open · andrewli77 opened 3 years ago
Hi @andrewli77,
Thank you for your PR! How would you define the value of `requires_grad`?
Hi Lucas, sorry for the late reply. `requires_grad` is an attribute of every parameter of an `nn.Module` and defaults to `True`. My code sets it by subclassing `ACModel` and running:
```python
for param in self.rnn.parameters():
    param.requires_grad = False
```
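For illustration, here is a minimal self-contained version of that pattern, using a plain `nn.Module` with made-up layer sizes as a stand-in for the actual `ACModel` subclass:

```python
import torch.nn as nn

class TransferACModel(nn.Module):  # illustrative stand-in for the real ACModel subclass
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(64, 64)      # pretrained recurrent core to reuse
        self.actor = nn.Linear(64, 7)   # heads that stay trainable
        self.critic = nn.Linear(64, 1)
        # Freeze the recurrent core so the optimizer never updates it.
        for param in self.rnn.parameters():
            param.requires_grad = False
```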
Hi @andrewli77,
I completely forgot about your PR. I am wondering whether your modifications actually change anything.
Your first modification:
```python
grad_norm = sum(p.grad.data.norm(2).item() ** 2 for p in self.acmodel.parameters() if p.requires_grad) ** 0.5
```
If `p.requires_grad` is `False`, then `p.grad.data` would be a tensor of zeros, right? In that case, removing `if p.requires_grad` would not change anything, right?
Your second modification:
```python
torch.nn.utils.clip_grad_norm_([p for p in self.acmodel.parameters() if p.requires_grad], self.max_grad_norm)
```
If `p.requires_grad` is `False`, then `p.grad.data` would be a tensor of zeros, and therefore clipping would change nothing.
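A quick check with a toy model (an illustrative stand-in, not the repo's actual `ACModel`) shows what `p.grad` actually holds for a frozen parameter after a backward pass:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False  # freeze the first layer

model(torch.randn(3, 4)).sum().backward()

for name, p in model.named_parameters():
    print(name, p.requires_grad, p.grad is None)
# 0.weight False True   <- frozen parameters never receive a .grad tensor
# 0.bias   False True
# 1.weight True  False
# 1.bias   True  False
```

So for a frozen parameter `p.grad` stays `None` rather than a tensor of zeros, and `p.grad.data` raises an `AttributeError`; that is why the first filter matters. `clip_grad_norm_` itself skips parameters whose `.grad` is `None` (at least in recent PyTorch versions), so the second filter mainly keeps the two computations consistent.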
Hi Lucas, thanks for open-sourcing your code for this project. I think it would be nice to be able to manually freeze some network parameters (I was doing this in the context of transfer learning). Currently, the gradient norm is computed over parameters with `requires_grad=False` as well, and for those parameters `p.grad` is `None`.
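Concretely, with any model that contains frozen parameters (a minimal sketch; the model here is hypothetical), the unfiltered norm computation crashes while the filtered version from this PR runs:

```python
import torch
import torch.nn as nn

acmodel = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
for p in acmodel[0].parameters():
    p.requires_grad = False  # manually frozen, e.g. for transfer learning

acmodel(torch.randn(5, 8)).sum().backward()

# Original computation: fails because frozen parameters have p.grad == None.
try:
    sum(p.grad.data.norm(2).item() ** 2 for p in acmodel.parameters()) ** 0.5
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'data'

# With this PR's filter, only trainable parameters contribute to the norm.
grad_norm = sum(p.grad.data.norm(2).item() ** 2
                for p in acmodel.parameters() if p.requires_grad) ** 0.5
print(grad_norm)
```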