Hi, I see that your BERT model config has a non-zero dropout value, and the momentum operation only uses torch.no_grad. As far as I know, torch.no_grad only turns off gradient tracking; it does not disable dropout, which stays active unless the model is in eval mode. Will this affect the results?
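To illustrate what I mean, here is a minimal sketch. The small `momentum_encoder` below is a hypothetical stand-in I made up for the example, not your actual momentum model, and the dropout value of 0.1 is only an assumed typical BERT setting. It runs the same input twice under torch.no_grad, once in train mode and once in eval mode, and checks whether the two outputs agree.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a momentum encoder (assumption, not the repo's model).
momentum_encoder = nn.Sequential(nn.Linear(64, 64), nn.Dropout(p=0.1))
x = torch.randn(64, 64)


def outputs_match(module, inputs):
    """Run two forward passes under no_grad and report whether they are identical."""
    with torch.no_grad():
        first = module(inputs)
        second = module(inputs)
    return torch.equal(first, second)


# Train mode: dropout samples a fresh random mask on every forward pass,
# even though no_grad is active.
momentum_encoder.train()
train_mode_match = outputs_match(momentum_encoder, x)

# Eval mode: dropout is a no-op, so repeated passes are deterministic.
momentum_encoder.eval()
eval_mode_match = outputs_match(momentum_encoder, x)

print("train mode, outputs identical:", train_mode_match)
print("eval mode, outputs identical:", eval_mode_match)
```

If the momentum branch is meant to be deterministic, I would expect it to need an explicit .eval() call (or dropout set to 0 in its config) in addition to torch.no_grad, but I may be missing a reason why it is left in train mode.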