Hi @xiayandi
As explained in the paper, gradient flow is stopped at the sampling step between the generator and the discriminator. More precisely, even without `torch.no_grad`, gradient computation will be interrupted at the `argmax` in `self.sample`.
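As a minimal illustration (standalone tensors, not the repository's code), the snippet below shows that `argmax` returns an integer tensor that is detached from the autograd graph, so gradients cannot flow through it:

```python
import torch

# logits stands in for the generator's output; requires_grad mimics
# a tensor produced by trainable parameters.
logits = torch.randn(2, 5, requires_grad=True)
sampled_ids = logits.argmax(dim=-1)  # integer indices, detached from the graph

print(sampled_ids.requires_grad)  # False: gradient flow stops here
print(sampled_ids.grad_fn)        # None: no backward node was recorded
```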
I use `torch.no_grad` for two reasons:

1. To make it explicit that no gradient flows from the discriminator back to the generator through the sampling step.
2. To save computation (for the operations in `self.sample` before `argmax`) by not doing their gradient calculation, b/c eventually gradient calculation will be interrupted at `argmax` anyway.

Please tag me if there is anything else I can help you with.
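A quick sketch of the second point, again with made-up tensors rather than the pretraining code: even for differentiable ops like `softmax`, running them under `torch.no_grad` skips recording the backward graph, so no memory is spent on gradient bookkeeping for them:

```python
import torch

logits = torch.randn(2, 5, requires_grad=True)

with torch.no_grad():
    probs = torch.softmax(logits, dim=-1)  # differentiable op, but no graph is recorded

print(probs.requires_grad)  # False: no backward pass is possible through probs
```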
Hi Richard,
Thank you so much for the great work on this replication. It has helped me a lot in understanding ELECTRA in detail.
I have one question regarding your use of `with no_grad` in: https://github.com/richarddwang/electra_pytorch/blob/master/pretrain.py#L294
In my understanding, all the tensors under that scope don't require grads. Is there any specific reason you use `with no_grad` here?
Thanks!
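For what it's worth, this is the quick check behind my understanding (illustrative module and tensors, not the code in `pretrain.py`): outputs computed under `torch.no_grad` never require grad, even when the parameters that produced them do.

```python
import torch
from torch import nn

layer = nn.Linear(4, 4)  # its weight/bias require grad, like a real model
x = torch.randn(1, 4)

with torch.no_grad():
    y = layer(x)  # forward pass without autograd bookkeeping

print(layer.weight.requires_grad)  # True: parameters themselves are unaffected
print(y.requires_grad)             # False: outputs under the scope carry no history
```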