richarddwang / electra_pytorch

Pretrain and finetune ELECTRA with fastai and huggingface. (Results of the paper replicated!)

Question about the use of with no grad #12

Closed · xiayandi closed this issue 3 years ago

xiayandi commented 3 years ago

Hi Richard,

Thank you so much for the great work on this replication. It has helped me a lot in understanding ELECTRA in detail.

I have one question regarding your use of `with torch.no_grad` in: https://github.com/richarddwang/electra_pytorch/blob/master/pretrain.py#L294

In my understanding, none of the tensors inside that scope require gradients anyway. Is there a specific reason you use torch.no_grad here?

Thanks!

richarddwang commented 3 years ago

Hi @xiayandi

  1. As explained in the paper, gradient flow is stopped at the sampling step between the generator and the discriminator. More precisely, even without torch.no_grad, gradient computation would already be interrupted at the argmax in self.sample.

  2. I use torch.no_grad for two reasons: it makes explicit to readers that no gradient flows through this part of the forward pass, and it skips building the autograd graph for these operations, which saves memory and computation (see the sketch below).
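
For anyone reading along, here is a minimal sketch of the two points above. It is not the repository's actual code; the argmax/multinomial calls stand in for what self.sample does in pretrain.py.

```python
import torch

# Pretend generator output over a vocabulary of 10 tokens for 4 positions.
gen_logits = torch.randn(4, 10, requires_grad=True)

# 1. argmax is a discrete, non-differentiable operation, so the autograd chain
#    from the discriminator's inputs back to the generator is already broken here.
sampled_ids = gen_logits.argmax(dim=-1)       # integer tensor, requires_grad == False
print(sampled_ids.requires_grad)              # False

# 2. Wrapping the same kind of ops in torch.no_grad() additionally skips recording
#    the autograd graph for everything inside, saving memory and computation and
#    making the "no gradient here" intent explicit to readers.
with torch.no_grad():
    probs = torch.softmax(gen_logits, dim=-1)   # no graph is recorded for this op
    sampled_ids2 = torch.multinomial(probs, 1)  # sampling is likewise non-differentiable
print(probs.requires_grad)                      # False, even though gen_logits requires grad
```
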

Please tag me if there is anything else I can help with.